Stowage/go - Remotebranch.eu

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Youlin Feng	cc571dab91	cmd/compile: deduplicate instructions when rewrite func results After CL 628075, do not rely on the memory arg of an OpLocalAddr. Fixes #74788 Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d Reviewed-on: https://go-review.googlesource.com/c/go/+/691275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Mark Freeman <mark@golang.org>	2025-07-30 09:38:10 -07:00
Cuong Manh Le	bd94ae8903	cmd/compile: use unsigned power-of-two detector for unsigned mod Same as CL 689815, but for modulus instead of division. Updates #74485 Change-Id: I73000231c886a987a1093669ff207fd9117a8160 Reviewed-on: https://go-review.googlesource.com/c/go/+/689895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-07-29 16:22:40 -07:00
Cuong Manh Le	f3582fc80e	cmd/compile: add unsigned power-of-two detector Fixes #74485 Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca Reviewed-on: https://go-review.googlesource.com/c/go/+/689815 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2025-07-29 16:22:37 -07:00
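As a rough illustration of the two commits above (a sketch, not the compiler's actual rewrite rules): when an unsigned divisor c is a power of two, x / c is a right shift and x % c is a mask. The helper names below are hypothetical. The power-of-two test has to look at the unsigned value, since a constant such as 1<<63 is negative when reinterpreted as int64 and a check that requires a positive signed value would reject it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isUnsignedPowerOfTwo reports whether c has exactly one bit set.
// Hypothetical helper name, for illustration only.
func isUnsignedPowerOfTwo(c uint64) bool {
	return c != 0 && c&(c-1) == 0
}

// divPow2 computes x / c as a shift; c must be a power of two.
func divPow2(x, c uint64) uint64 {
	return x >> uint(bits.TrailingZeros64(c))
}

// modPow2 computes x % c as a mask; c must be a power of two.
func modPow2(x, c uint64) uint64 {
	return x & (c - 1)
}

func main() {
	x := uint64(0xF000000000001234)
	for _, c := range []uint64{1, 8, 1 << 63} {
		fmt.Println(isUnsignedPowerOfTwo(c), divPow2(x, c) == x/c, modPow2(x, c) == x%c)
	}
}
```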
Michael Munday	46b5839231	test/codegen: fix failing condmove wasm tests These recently added tests failed when using the -all_codegen flag. Fixes #74770 Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df Reviewed-on: https://go-review.googlesource.com/c/go/+/690715 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Run-TryBot: Michael Munday <mikemndy@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> TryBot-Bypass: Michael Knyszek <mknyszek@google.com>	2025-07-28 11:01:53 -07:00
Jorropo	ce05ad448f	cmd/compile: rewrite condselects into doublings and halvings For performance see CL 685676. This allows something like: if y { x *= 2 } To be compiled to: SHLXQ BX, AX, AX Instead of: MOVQ AX, CX SHLQ $1, CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there are 2 doublings and 4 halvings. Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444 Reviewed-on: https://go-review.googlesource.com/c/go/+/685695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-07-24 14:42:15 -07:00
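The equivalence behind this rewrite can be sketched in plain Go: a conditional doubling is a left shift by 0 or 1. This is only a source-level illustration on unsigned values, not the SSA rule itself; `b2i` and the function names are hypothetical.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. Hypothetical helper; Go has no direct conversion.
func b2i(b bool) uint {
	if b {
		return 1
	}
	return 0
}

// doubleBranch is the source form quoted in the commit message.
func doubleBranch(x uint64, y bool) uint64 {
	if y {
		x *= 2
	}
	return x
}

// doubleShift is the branch-free form: shift left by 0 or 1.
func doubleShift(x uint64, y bool) uint64 {
	return x << b2i(y)
}

// halveShift is the analogous form for an unsigned conditional halving.
func halveShift(x uint64, y bool) uint64 {
	return x >> b2i(y)
}

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(y, doubleBranch(21, y), doubleShift(21, y), halveShift(42, y))
	}
}
```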
Jorropo	fcd28070fe	cmd/compile: add opt branchelim to rewrite some CondSelect into math This allows something like: if y { x++ } To be compiled to: MOVBLZX BX, CX ADDQ CX, AX Instead of: LEAQ 1(AX), CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there are 100 additions and 75 subtractions. See benchmark here: https://go.dev/play/p/DJf5COjwhd_s Either it's a performance no-op or it is faster: goos: linux goarch: amd64 cpu: AMD Ryzen 5 3600 6-Core Processor │ /tmp/old.logs │ /tmp/new.logs │ │ sec/op │ sec/op vs base │ CmovInlineConditionAddLatency-12 0.5443n ± 5% 0.5339n ± 3% -1.90% (p=0.004 n=10) CmovInlineConditionAddThroughputBy6-12 1.492n ± 1% 1.494n ± 1% ~ (p=0.955 n=10) CmovInlineConditionSubLatency-12 0.5419n ± 3% 0.5282n ± 3% -2.52% (p=0.019 n=10) CmovInlineConditionSubThroughputBy6-12 1.587n ± 1% 1.584n ± 2% ~ (p=0.492 n=10) CmovOutlineConditionAddLatency-12 0.5223n ± 1% 0.2639n ± 4% -49.47% (p=0.000 n=10) CmovOutlineConditionAddThroughputBy6-12 1.159n ± 1% 1.097n ± 2% -5.35% (p=0.000 n=10) CmovOutlineConditionSubLatency-12 0.5271n ± 3% 0.2654n ± 2% -49.66% (p=0.000 n=10) CmovOutlineConditionSubThroughputBy6-12 1.053n ± 1% 1.050n ± 1% ~ (p=1.000 n=10) geomean There are other benefits not tested by this benchmark: - the math form is usually a couple bytes shorter (ICACHE) - the math form is usually 0~2 uops shorter (UCACHE) - the math form usually has less register pressure* - the math form can sometimes be optimized further *regalloc rarely finds how it can use fewer registers As far as pass ordering goes there are many possible options, I've decided to reorder branchelim before late opt since: - unlike running exclusively the CondSelect rules after branchelim, some extra optimizations might trigger on the adds or subs. - I don't want to maintain a second generic.rules file of only the stuff that can trigger after branchelim. - rerunning all of opt a third time increases compilation time for little gains. 
By elimination moving branchelim seems fine. Change-Id: I869adf57e4d109948ee157cfc47144445146bafd Reviewed-on: https://go-review.googlesource.com/c/go/+/685676 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-07-24 14:42:10 -07:00
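A source-level sketch of the "math form" this commit describes: the condition is widened to 0 or 1 and added or subtracted directly, so no conditional move is needed. The names below are hypothetical and the code only demonstrates the equivalence; it does not show the compiler's CondSelect rules.

```go
package main

import "fmt"

// b2u widens a bool to 0 or 1. Hypothetical helper for illustration.
func b2u(b bool) uint64 {
	if b {
		return 1
	}
	return 0
}

// incBranch is the source form quoted in the commit message.
func incBranch(x uint64, y bool) uint64 {
	if y {
		x++
	}
	return x
}

// incMath adds the widened condition instead of selecting between x and x+1.
func incMath(x uint64, y bool) uint64 {
	return x + b2u(y)
}

// decMath is the subtraction counterpart.
func decMath(x uint64, y bool) uint64 {
	return x - b2u(y)
}

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(y, incBranch(41, y), incMath(41, y), decMath(43, y))
	}
}
```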
Alexander Musman	bd80f74bc1	cmd/compile: fold shift through AND for slice operations Fold a shift through AND when the AND gets a zero-or-one operand (e.g. from arithmetic shift by 63 of a 64-bit value) for a common case with slice operations: ASR $63, R2, R2 AND R3<<3, R2, R2 ADD R2, R0, R2 As the operands are 64-bit, we can transform it to: AND R2->63, R3, R2 ADD R2<<3, R0, R2 Code size improvement: compile: .text: 9088004 -> 9086292 (-0.02%) etcd: .text: 10500276 -> 10498964 (-0.01%) Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/679895 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-07-24 13:47:20 -07:00
Alexander Musman	dcb479c2f9	cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons Optimize ARM64 code generation for slice bounds checking by recognizing patterns where comparisons to zero involve SUB or SUBconst operations. This change adds SSA opt rules to simplify: (CMPconst [0] (SUB x y)) => (CMP x y) The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling more efficient bounds checking for slice operations. Code size improvement: compile: .text: 9088004 -> 9065988 (-0.24%) etcd: .text: 10500276 -> 10497092 (-0.03%) Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8 Reviewed-on: https://go-review.googlesource.com/c/go/+/679915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-07-24 12:39:53 -07:00
Paul Murphy	ee7bfbdbcc	cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...) CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add a modified version of isPPC64WordRotateMask which validates that the mask is contiguous and fits inside a uint32. I don't think this is possible when merging SRDconst, the first check should always reject such combines. But, be extra careful and do it there too. Fixes #73153 Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435 Reviewed-on: https://go-review.googlesource.com/c/go/+/679775 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-06-09 20:33:27 -07:00
Jake Bailey	27ff0f249c	cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make unique.Make always copies strings passed into it, so it's safe to not copy byte slices converted to strings either. Handle this just like map accesses with string(b) as keys. This CL only handles unique.Make(string(b)), not nested cases like unique.Make([2]string{string(b1), string(b2)}); this could be done in a followup CL but the map lookup code in walk is sufficiently different than the call handling code that I didn't attempt it. (SSA is much easier). Fixes #71926 Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85 Reviewed-on: https://go-review.googlesource.com/c/go/+/672135 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-21 20:20:31 -07:00
thepudds	f4de2ecffb	cmd/compile/internal/walk: convert composite literals to interfaces without allocating Today, this interface conversion causes the struct literal to be heap allocated: var sink any func example1() { sink = S{1, 1} } For basic literals like integers that are directly used in an interface conversion that would otherwise allocate, the compiler is able to use read-only global storage (see #18704). This CL extends that to struct and array literals as well by creating read-only global storage that is able to represent for example S{1, 1}, and then using a pointer to that storage in the interface when the interface conversion happens. A more challenging example is: func example2() { v := S{1, 1} sink = v } In this case, the struct literal is not directly part of the interface conversion, but is instead assigned to a local variable. To still avoid heap allocation in cases like this, in walk we construct a cache that maps from expressions used in interface conversions to earlier expressions that can be used to represent the same value (via ir.ReassignOracle.StaticValue). This is somewhat analogous to how we avoided heap allocation for basic literals in CL 649077 earlier in our stack, though here we also need to do a little more work to create the read-only global. CL 649076 (also earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. See the writeup in #71359 for details. Fixes #71359 Fixes #71323 Updates #62653 Updates #53465 Updates #8618 Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90 Reviewed-on: https://go-review.googlesource.com/c/go/+/649555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-05-21 12:23:26 -07:00
Junyang Shao	d6c29c7156	cmd/compile: fix offset calculation error in memcombine Fixes #73812 Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196 Reviewed-on: https://go-review.googlesource.com/c/go/+/675175 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-21 12:17:08 -07:00
Xiaolin Zhao	4ce1c8e9e1	cmd/compile: add rules about ORN and ANDN Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279880 279776 -104 -0.0372% asm 556638 556410 -228 -0.0410% buildid 272272 272072 -200 -0.0735% cgo 481522 481318 -204 -0.0424% compile 2457788 2457580 -208 -0.0085% covdata 323384 323280 -104 -0.0322% cover 518450 518234 -216 -0.0417% dist 340790 340686 -104 -0.0305% distpack 282456 282252 -204 -0.0722% doc 789932 789688 -244 -0.0309% fix 324332 324228 -104 -0.0321% link 704622 704390 -232 -0.0329% nm 277132 277028 -104 -0.0375% objdump 507862 507758 -104 -0.0205% pack 221774 221674 -100 -0.0451% pprof 1469816 1469552 -264 -0.0180% test2json 254836 254732 -104 -0.0408% trace 1100002 1099738 -264 -0.0240% vet 781078 780874 -204 -0.0261% go 1529116 1528848 -268 -0.0175% gofmt 318556 318448 -108 -0.0339% total 13792238 13788566 -3672 -0.0266% Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef Reviewed-on: https://go-review.googlesource.com/c/go/+/674335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>	2025-05-21 08:28:37 -07:00
Xiaolin Zhao	d37a1bdd48	cmd/compile: fix the implementation of NORconst on loong64 In the loong64 instruction set, there is no NORI instruction, so the immediate value in NORconst needs to be stored in a register and then used with the three-register NOR instruction. Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139 Reviewed-on: https://go-review.googlesource.com/c/go/+/673836 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 20:24:09 -07:00
Junyang Shao	113b25774e	cmd/compile: memcombine different size stores This CL implements the TODO in combineStores to allow combining stores of different sizes, as long as the total size aligns to 2, 4, 8. Fixes #72832. Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/661855 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 13:00:16 -07:00
Julian Zhu	dfebef1c04	cmd/compile: fold negation into addition/subtraction on arm64 Fold negation into addition/subtraction and avoid double negation. platform: linux/arm64 file before after Δ % addr2line 3628108 3628116 +8 +0.000% asm 6208353 6207857 -496 -0.008% buildid 3460682 3460418 -264 -0.008% cgo 5572988 5572492 -496 -0.009% compile 26042159 26041039 -1120 -0.004% cover 6304328 6303472 -856 -0.014% dist 4139330 4139098 -232 -0.006% doc 9429305 9428065 -1240 -0.013% fix 3997189 3996733 -456 -0.011% link 8212128 8210280 -1848 -0.023% nm 3620056 3619696 -360 -0.010% objdump 5920289 5919233 -1056 -0.018% pack 2892250 2891778 -472 -0.016% pprof 17094569 17092745 -1824 -0.011% test2json 3335825 3335529 -296 -0.009% trace 15842080 15841456 -624 -0.004% vet 9472194 9471106 -1088 -0.011% go 19081541 19081509 -32 -0.000% total 154253374 154240622 -12752 -0.008% platform: darwin/arm64 file before after Δ % compile 27152002 27135490 -16512 -0.061% link 8372914 8356402 -16512 -0.197% go 19154802 19154778 -24 -0.000% total 157734180 157701132 -33048 -0.021% Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d Reviewed-on: https://go-review.googlesource.com/c/go/+/673715 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 11:08:28 -07:00
Keith Randall	3baf53aec6	cmd/compile: derive bounds on signed %N for N a power of 2 -N+1 <= x % N <= N-1 This is useful for cases like: func setBit(b []byte, i int) { b[i/8] |= 1<<(i%8) } The shift does not need protection against larger-than-7 cases. (It does still need protection against <0 cases.) Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2 Reviewed-on: https://go-review.googlesource.com/c/go/+/672995 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-19 15:21:54 -07:00
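The bound quoted in this commit, -N+1 <= x % N <= N-1, follows from Go's truncated signed remainder. A small sketch that checks it for N = 8 and exercises the `setBit` function from the commit message with a non-negative index (the `remInBounds` helper is hypothetical):

```go
package main

import "fmt"

// setBit is the function quoted in the commit message.
// A negative i would still panic at run time.
func setBit(b []byte, i int) {
	b[i/8] |= 1 << (i % 8)
}

// remInBounds checks -N+1 <= x % N <= N-1 for N = 8.
func remInBounds(x int) bool {
	r := x % 8
	return -7 <= r && r <= 7
}

func main() {
	ok := true
	for x := -1000; x <= 1000; x++ {
		ok = ok && remInBounds(x)
	}
	b := make([]byte, 2)
	setBit(b, 10) // bit 2 of byte 1
	fmt.Println(ok, b)
}
```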
Julian Zhu	d52679006c	cmd/compile: fold negation into addition/subtraction on mipsx Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 3742022 3741986 -36 -0.001% asm 6668616 6668628 +12 +0.000% buildid 3583786 3583630 -156 -0.004% cgo 6020370 6019634 -736 -0.012% compile 29416016 29417336 +1320 +0.004% cover 6801903 6801675 -228 -0.003% dist 4485916 4485816 -100 -0.002% doc 10652787 10652251 -536 -0.005% fix 4115988 4115560 -428 -0.010% link 9002328 9001616 -712 -0.008% nm 3733148 3732780 -368 -0.010% objdump 6163292 6163068 -224 -0.004% pack 2944768 2944604 -164 -0.006% pprof 18909973 18908773 -1200 -0.006% test2json 3394662 3394778 +116 +0.003% trace 17350911 17349751 -1160 -0.007% vet 10077727 10077527 -200 -0.002% go 19118769 19118609 -160 -0.001% total 166182982 166178022 -4960 -0.003% Change-Id: Id55698800fd70f3cb2ff48393584456b87208921 Reviewed-on: https://go-review.googlesource.com/c/go/+/673556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2025-05-19 11:27:35 -07:00
Julian Zhu	8097cf14d2	cmd/compile: fold negation into addition/subtraction on mips64x Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 4007310 4007470 +160 +0.004% asm 7007636 7007436 -200 -0.003% buildid 3839268 3838972 -296 -0.008% cgo 6353466 6352738 -728 -0.011% compile 30426920 30426896 -24 -0.000% cover 7005408 7004744 -664 -0.009% dist 4651192 4650872 -320 -0.007% doc 10606050 10606034 -16 -0.000% fix 4446414 4446390 -24 -0.001% link 9237736 9237024 -712 -0.008% nm 3999107 3999323 +216 +0.005% objdump 6762424 6762144 -280 -0.004% pack 3270757 3270493 -264 -0.008% pprof 19428299 19361939 -66360 -0.342% test2json 3717345 3717217 -128 -0.003% trace 17382273 17381657 -616 -0.004% vet 10689481 10688985 -496 -0.005% go 19118769 19118609 -160 -0.001% total 171949855 171878943 -70912 -0.041% Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/673555 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-16 11:06:06 -07:00
Keith Randall	d681270714	cmd/compile: allow load-op merging in additional situations x += p We want to do this with a single load+add operation on amd64. The tricky part is that we don't want to combine if there are other uses of x after this instruction. Implement a simple detector that seems to capture a common situation - x += p is in a loop, and the other use of x is after loop exit. In that case, it does not hurt to do the load+add combo. Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c Reviewed-on: https://go-review.googlesource.com/c/go/+/672957 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-15 15:21:36 -07:00
Keith Randall	19f05770b0	cmd/compile: schedule induction variable increments late for ..; ..; i++ { ... } We want to schedule the i++ late in the block, so that all other uses of i in the block are scheduled first. That way, i++ can happen in place in a register instead of requiring a temporary register. Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df Reviewed-on: https://go-review.googlesource.com/c/go/+/672998 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-05-15 14:06:41 -07:00
Xiaolin Zhao	c31a5c571f	cmd/compile: fold negation into addition/subtraction on loong64 This change also avoids double negation, and adds loong64 codegen for arithmetic tests. Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279972 279896 -76 -0.0271% asm 556390 556310 -80 -0.0144% buildid 272376 272300 -76 -0.0279% cgo 481534 481550 +16 +0.0033% compile 2457992 2457396 -596 -0.0242% covdata 323488 323404 -84 -0.0260% cover 518630 518490 -140 -0.0270% dist 340894 340814 -80 -0.0235% distpack 282568 282484 -84 -0.0297% doc 790224 789984 -240 -0.0304% fix 324408 324348 -60 -0.0185% link 704910 704666 -244 -0.0346% nm 277220 277144 -76 -0.0274% objdump 508026 507878 -148 -0.0291% pack 221810 221786 -24 -0.0108% pprof 1470284 1469880 -404 -0.0275% test2json 254896 254852 -44 -0.0173% trace 1100390 1100074 -316 -0.0287% vet 781398 781142 -256 -0.0328% go 1529668 1529128 -540 -0.0353% gofmt 318668 318568 -100 -0.0314% total 13795746 13792094 -3652 -0.0265% Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca Reviewed-on: https://go-review.googlesource.com/c/go/+/672555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-14 17:46:58 -07:00
Jakub Ciolek	c9d0fad5cb	cmd/compile: add 2 phiopt cases Add 2 more cases: if a { x = value } else { x = a } => x = a && value if a { x = a } else { x = value } => x = a || value AND case goes from: 00006 (8) TESTB AX, AX 00007 (8) JNE 9 00008 (13) MOVL AX, BX 00009 (13) MOVL BX, AX 00010 (13) RET to: 00006 (13) ANDL BX, AX 00007 (13) RET OR goes from: 00006 (19) TESTB AX, AX 00007 (19) JNE 9 00008 (24) MOVL BX, AX 00009 (24) RET to: 00006 (24) ORL BX, AX 00007 (24) RET compilecmp linux/amd64: runtime runtime.lock2 847 -> 869 (+2.60%) runtime.addspecial 542 -> 517 (-4.61%) runtime.tracebackPCs changed runtime.scanstack changed runtime.mallocinit changed runtime.traceback2 2238 -> 2206 (-1.43%) runtime [cmd/compile] runtime.lock2 860 -> 882 (+2.56%) runtime.scanstack changed runtime.addspecial 542 -> 517 (-4.61%) runtime.traceback2 2238 -> 2206 (-1.43%) runtime.lockWithRank 870 -> 890 (+2.30%) runtime.tracebackPCs changed runtime.mallocinit changed strconv strconv.ryuFtoaFixed32 changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.readFloat changed strconv.ryuFtoaShortest changed strings strings.(Replacer).build changed strconv [cmd/compile] strconv.readFloat changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.ryuFtoaFixed32 changed strconv.ryuFtoaShortest changed strings [cmd/compile] strings.(Replacer).build changed regexp regexp.makeOnePass.func1 changed regexp [cmd/compile] regexp.makeOnePass.func1 changed encoding/json encoding/json.indirect changed database/sql database/sql.driverArgsConnLocked changed vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.Form.transform changed go/doc/comment go/doc/comment.parseSpans changed internal/diff internal/diff.tgs changed log/slog log/slog.(handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%) testing/fstest testing/fstest.(fsTester).checkGlob changed runtime/pprof runtime/pprof.(profileBuilder).build changed cmd/internal/dwarf cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 
(-3.94%) go/printer go/printer.keepTypeColumn 302 -> 270 (-10.60%) go/printer.(printer).binaryExpr changed cmd/compile/internal/syntax cmd/compile/internal/syntax.(scanner).rune changed cmd/compile/internal/syntax.(scanner).number 2137 -> 2153 (+0.75%) Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73 Reviewed-on: https://go-review.googlesource.com/c/go/+/666835 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-08 10:18:37 -07:00
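The two phiopt cases in this commit can be checked exhaustively at the source level, since there are only four input combinations. A minimal sketch with hypothetical function names:

```go
package main

import "fmt"

// andBranch: if a { x = value } else { x = a }, which equals a && value.
func andBranch(a, value bool) bool {
	var x bool
	if a {
		x = value
	} else {
		x = a
	}
	return x
}

// orBranch: if a { x = a } else { x = value }, which equals a || value.
func orBranch(a, value bool) bool {
	var x bool
	if a {
		x = a
	} else {
		x = value
	}
	return x
}

func main() {
	for _, a := range []bool{false, true} {
		for _, v := range []bool{false, true} {
			fmt.Println(a, v, andBranch(a, v) == (a && v), orBranch(a, v) == (a || v))
		}
	}
}
```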
Keith Randall	12110c3f7e	cmd/compile: improve multiplication strength reduction Use an automatic algorithm to generate strength reduction code. You give it all the linear combination (ax+by) instructions in your architecture, it figures out the rest. Just amd64 and arm64 for now. Fixes #67575 Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/626998 Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-01 09:33:31 -07:00
Joel Sing	4d10d4ad84	cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case. 
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-01 05:57:41 -07:00
Joel Sing	90e8b8cdae	cmd/compile: intrinsify math/bits.Bswap on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap using the REV8 machine instruction. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ rb.1 │ rb.2 │ │ sec/op │ sec/op vs base │ ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10) ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10) ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10) ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10) geomean 13.11n 4.649n -64.54% Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece Reviewed-on: https://go-review.googlesource.com/c/go/+/660855 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>	2025-05-01 05:57:13 -07:00
Keith Randall	7d0cb2a2ad	cmd/compile: constant fold 128-bit multiplies The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-04-22 10:24:18 -07:00
Keith Randall	8af32240c6	cmd/compile: don't evaluate side effects of range over array If the thing we're ranging over is an array or ptr to array, and it doesn't have a function call or channel receive in it, then we shouldn't evaluate it. Typecheck the ranged-over value as a constant in that case. That makes the unified exporter replace the range expression with a constant int. Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554 Reviewed-on: https://go-review.googlesource.com/c/go/+/659317 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>	2025-04-21 15:50:43 -07:00
limeidan	09d76e59d2	cmd/compile: set unalignedOK to make memcombine work properly on loong64 goos: linux goarch: loong64 pkg: unicode/utf8 cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ ValidTenASCIIChars 7.604n ± 0% 6.805n ± 0% -10.51% (p=0.000 n=10) Valid100KASCIIChars 37.41µ ± 0% 16.58µ ± 0% -55.67% (p=0.000 n=10) ValidTenJapaneseChars 60.84n ± 0% 58.62n ± 0% -3.64% (p=0.000 n=10) ValidLongMostlyASCII 113.5µ ± 0% 113.5µ ± 0% ~ (p=0.303 n=10) ValidLongJapanese 204.6µ ± 0% 206.8µ ± 0% +1.07% (p=0.000 n=10) ValidStringTenASCIIChars 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=10) ValidString100KASCIIChars 38.05µ ± 0% 17.14µ ± 0% -54.97% (p=0.000 n=10) ValidStringTenJapaneseChars 60.58n ± 0% 59.48n ± 0% -1.82% (p=0.000 n=10) ValidStringLongMostlyASCII 113.5µ ± 0% 113.4µ ± 0% -0.10% (p=0.000 n=10) ValidStringLongJapanese 205.9µ ± 0% 207.3µ ± 0% +0.67% (p=0.000 n=10) geomean 3.324µ 2.756µ -17.08% Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/662495 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com>	2025-04-09 09:18:20 -07:00
Alexander Musman	16a6b71f18	cmd/compile: improve store-to-load forwarding with compatible types Improve the compiler's store-to-load forwarding optimization by relaxing the type comparison condition. Instead of requiring exact type equality (CMPeq), we now use copyCompatibleType which allows forwarding between compatible types where safe. Fix several size comparison bugs in the nested store patterns. Previously, we were comparing the size of the outer store with the load type, rather than comparing with the size of the actual store being forwarded from. Skip OpConvert in dead store elimination to help get rid of dead stores such as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory. This optimization is particularly beneficial for code that creates slices with computed pointers, such as the runtime's heapBitsSlice function, where intermediate calculations were previously causing the compiler to miss store-to-load forwarding opportunities. Local sweet run result on an x86_64 laptop: │ Orig.res │ Hopt.res │ │ sec/op │ sec/op vs base │ BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10) BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10) BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10) EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10) EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10) GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10) GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10) GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10) GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10) GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10) GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10) GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10) MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10) Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10) geomean 1.336 1.333 -0.22% Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f Reviewed-on: 
https://go-review.googlesource.com/c/go/+/662075 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-04-04 08:25:47 -07:00
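As an illustration of the store-to-load forwarding described above, the sketch below stores a value and reads the same memory back, first with the identical type and then with a same-sized but different type. The function names are hypothetical, and whether the compiler forwards the second case is an assumption that depends on what it treats as a compatible type.

```go
package main

import (
	"fmt"
	"unsafe"
)

// storeThenLoad writes v through p and immediately reads it back.
// With store-to-load forwarding the compiler can return v directly
// instead of reloading it from memory.
func storeThenLoad(p *uint32, v uint32) uint32 {
	*p = v
	return *p
}

// storeThenLoadSigned stores a uint32 and reloads the same four bytes
// as an int32. The store and load types differ but have the same size,
// which is the kind of case a relaxed type comparison could cover
// (an assumption, not a statement about the exact rule).
func storeThenLoadSigned(p *uint32, v uint32) int32 {
	*p = v
	return *(*int32)(unsafe.Pointer(p))
}

func main() {
	var x uint32
	fmt.Println(storeThenLoad(&x, 7))                // prints 7
	fmt.Println(storeThenLoadSigned(&x, 0xffffffff)) // prints -1
}
```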
Joel Sing	e6c2e12c63	cmd/compile/internal/ssa: optimise more branches with zero on riscv64 Optimise more branches with zero on riscv64. In particular, BLTU with zero occurs with IsInBounds checks for index zero. This currently results in two instructions and requires an additional register: li t2, 0 bltu t2, t1, 0x174b4 This is equivalent to checking if the bounds is not equal to zero. With this change: bnez t1, 0x174c0 This removes more than 500 instructions from the Go binary on riscv64. Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644 Reviewed-on: https://go-review.googlesource.com/c/go/+/606715 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2025-03-28 01:27:22 -07:00
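The equivalence behind this rewrite can be checked in plain Go: for an unsigned value n, the comparison 0 < n holds exactly when n != 0. A minimal sketch follows, with hypothetical function names; it models the comparison only, not the generated instructions.

```go
package main

import "fmt"

// zeroLessThanUnsigned models the unsigned comparison BLTU performs
// when its first operand is zero.
func zeroLessThanUnsigned(n uint64) bool { return 0 < n }

// nonZero models the BNEZ form of the same check.
func nonZero(n uint64) bool { return n != 0 }

// firstElem indexes element zero, the case where the bounds check
// reduces to "length is not zero".
func firstElem(s []int) int { return s[0] }

func main() {
	for _, n := range []uint64{0, 1, 2, 1 << 63, ^uint64(0)} {
		fmt.Println(n, zeroLessThanUnsigned(n) == nonZero(n))
	}
	fmt.Println(firstElem([]int{42}))
}
```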
Mark Freeman	6722c008c1	cmd/compile: rename some test packages in codegen All other files here use the codegen package. Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b Reviewed-on: https://go-review.googlesource.com/c/go/+/661135 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-03-27 13:54:37 -07:00
Joel Sing	6bf95d40bb	test/codegen: add combined conversion and shift tests This adds tests for type conversion and shifts, detailing various instances of poor code generation that currently exist for riscv64. This will be addressed in future CLs. Change-Id: Ie1d366dfe878832df691600f8500ef383da92848 Reviewed-on: https://go-review.googlesource.com/c/go/+/615678 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>	2025-03-25 06:53:49 -07:00
Joel Sing	b70244ff7a	cmd/compile: intrinsify math/bits.Len on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the CLZ/CLZW machine instructions. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ clz.b.1 │ clz.b.2 │ │ sec/op │ sec/op vs base │ LeadingZeros-4 28.89n ± 0% 12.08n ± 0% -58.19% (p=0.000 n=10) LeadingZeros8-4 18.79n ± 0% 14.76n ± 0% -21.45% (p=0.000 n=10) LeadingZeros16-4 25.27n ± 0% 14.76n ± 0% -41.59% (p=0.000 n=10) LeadingZeros32-4 25.12n ± 0% 12.08n ± 0% -51.92% (p=0.000 n=10) LeadingZeros64-4 25.89n ± 0% 12.08n ± 0% -53.35% (p=0.000 n=10) geomean 24.55n 13.09n -46.70% Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694 Reviewed-on: https://go-review.googlesource.com/c/go/+/652321 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2025-03-21 18:21:44 -07:00
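The relationship that makes a count-leading-zeros instruction sufficient here is that the bit length of x equals the word size minus the number of leading zeros. A minimal sketch using `math/bits`, with hypothetical helper names and a loop-based reference for comparison:

```go
package main

import (
	"fmt"
	"math/bits"
)

// lenViaCLZ computes the bit length of x as 64 minus its leading zeros,
// which is the shape a CLZ-based lowering can take.
func lenViaCLZ(x uint64) int { return 64 - bits.LeadingZeros64(x) }

// lenReference counts bits with a plain loop, for comparison.
func lenReference(x uint64) int {
	n := 0
	for ; x != 0; x >>= 1 {
		n++
	}
	return n
}

func main() {
	for _, x := range []uint64{0, 1, 255, 256, 1 << 63} {
		fmt.Println(x, lenViaCLZ(x), lenReference(x), bits.Len64(x))
	}
}
```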
Joel Sing	6fb7bdc96d	cmd/compile: intrinsify math/bits.TrailingZeros on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros using the CTZ/CTZW machine instructions. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ ctz.b.1 │ ctz.b.2 │ │ sec/op │ sec/op vs base │ TrailingZeros-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) TrailingZeros8-4 14.76n ± 0% 10.74n ± 0% -27.24% (p=0.000 n=10) TrailingZeros16-4 26.84n ± 0% 10.74n ± 0% -59.99% (p=0.000 n=10) TrailingZeros32-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) TrailingZeros64-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) geomean 23.09n 9.035n -60.88% Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e Reviewed-on: https://go-review.googlesource.com/c/go/+/652320 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>	2025-03-15 19:07:53 -07:00
Joel Sing	21417518a9	cmd/compile: combine negation and word sign extension on riscv64 Use NEGW to produce a negated and sign extended word, rather than doing the same via two instructions: neg t0, t0 sext.w a0, t0 Becomes: negw t0, t0 Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212 Reviewed-on: https://go-review.googlesource.com/c/go/+/652319 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2025-03-15 06:05:16 -07:00
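The two sequences agree because truncation to 32 bits commutes with negation in two's complement arithmetic. The sketch below models both forms in Go; the function names are hypothetical and the code illustrates the arithmetic, not the compiler rule itself.

```go
package main

import "fmt"

// negThenExtend models the two-instruction form: negate the full
// 64-bit value, then sign extend its low 32 bits.
func negThenExtend(x int64) int64 { return int64(int32(-x)) }

// negWord models the single-instruction form: negate the low 32 bits
// as a word and sign extend the result.
func negWord(x int64) int64 { return int64(-int32(x)) }

func main() {
	for _, x := range []int64{0, 1, -1, 5, 1 << 31, 1<<40 + 3} {
		fmt.Println(x, negThenExtend(x), negWord(x))
	}
}
```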
Joel Sing	10d070668c	cmd/compile/internal/ssa: remove double negation with addition on riscv64 On riscv64, subtraction from a constant is typically implemented as an ADDI with the negative constant, followed by a negation. However this can lead to multiple NEG/ADDI/NEG sequences that can be optimised out. For example, runtime.(*_panic).nextDefer currently contains: lbu t0, 0(t0) addi t0, t0, -8 neg t0, t0 addi t0, t0, -7 neg t0, t0 Which is now optimised to: lbu t0, 0(t0) addi t0, t0, -1 Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80 Reviewed-on: https://go-review.googlesource.com/c/go/+/652318 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com>	2025-03-15 06:04:28 -07:00
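The instruction sequence quoted above can be followed step by step in Go to confirm that it reduces to a single addition of -1. The function names are hypothetical, and the source-level expression 7 - (8 - x) is an assumed example of code that would produce such a sequence.

```go
package main

import "fmt"

// viaNegations mirrors the original sequence:
// addi -8, neg, addi -7, neg.
func viaNegations(x int64) int64 {
	t := x + (-8) // x - 8
	t = -t        // 8 - x
	t = t + (-7)  // 1 - x
	t = -t        // x - 1
	return t
}

// simplified mirrors the optimised sequence: addi -1.
func simplified(x int64) int64 { return x + (-1) }

// fromSource is an assumed source-level form, 7 - (8 - x).
func fromSource(x int64) int64 { return 7 - (8 - x) }

func main() {
	for _, x := range []int64{0, 1, 8, 200, -3} {
		fmt.Println(x, viaNegations(x), simplified(x), fromSource(x))
	}
}
```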
Joel Sing	a8f2e63f2f	test/codegen: add a test for negation and conversion to int32 Codify the current code generation used on riscv64 in this case. Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae Reviewed-on: https://go-review.googlesource.com/c/go/+/652317 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-03-15 06:02:57 -07:00
Joel Sing	e1f9013a58	test/codegen: add riscv64 codegen for arithmetic tests Codify the current riscv64 code generation for various subtract from constant and addition/subtraction tests. Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729 Reviewed-on: https://go-review.googlesource.com/c/go/+/652316 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com>	2025-03-15 06:02:27 -07:00
Joel Sing	c01fa0cc21	test/codegen: add riscv64/rva23u64 specifiers to existing tests Tests that exist for riscv64/rva22u64 should also be applied to riscv64/rva23u64. Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842 Reviewed-on: https://go-review.googlesource.com/c/go/+/652315 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com>	2025-03-15 05:58:43 -07:00
Joel Sing	c1c7e5902f	test/codegen: tighten the TrailingZeros64 test on 386 Make the TrailingZeros64 code generation check more specific for 386. Just checking for BSFL will match both the generic 64 bit decomposition and the custom 386 lowering. Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/656996 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-03-14 15:04:38 -07:00
Joel Sing	af92bb594d	test/codegen: remove plan9/amd64 specific array zeroing/copying tests The compiler previously avoided the use of MOVUPS on plan9/amd64. This was changed in CL 655875, however the codegen tests were not updated and now fail (seemingly the full codegen tests do not run anywhere, not even on the longtest builders). Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/656997 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Auto-Submit: Keith Randall <khr@google.com>	2025-03-13 05:19:13 -07:00
Xiaolin Zhao	b143c98169	cmd/compile: simplify bounded shift on loong64 Use the shiftIsBounded function to generate more efficient shift instructions. This change also optimizes shift ops when the shift value is v&63 or v&31. goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000-HV @ 2500.00MHz \| CL 627855 \| this CL \| \| sec/op \| sec/op vs base \| LeadingZeros 1.1005n ± 0% 0.8425n ± 1% -23.44% (p=0.000 n=10) LeadingZeros8 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.001 n=10) LeadingZeros16 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.000 n=10) LeadingZeros32 0.9511n ± 0% 0.8050n ± 0% -15.36% (p=0.000 n=10) LeadingZeros64 1.1195n ± 0% 0.8423n ± 0% -24.76% (p=0.000 n=10) TrailingZeros 0.8086n ± 0% 0.8005n ± 0% -1.00% (p=0.000 n=10) TrailingZeros8 1.031n ± 1% 1.035n ± 1% ~ (p=0.136 n=10) TrailingZeros16 0.8114n ± 0% 0.8254n ± 1% +1.73% (p=0.000 n=10) TrailingZeros32 0.8090n ± 0% 0.8005n ± 0% -1.05% (p=0.000 n=10) TrailingZeros64 0.8089n ± 1% 0.8005n ± 0% -1.04% (p=0.000 n=10) OnesCount 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10) OnesCount8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10) OnesCount16 0.9344n ± 0% 1.2010n ± 0% +28.53% (p=0.000 n=10) OnesCount32 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10) OnesCount64 1.2010n ± 0% 0.8671n ± 0% -27.80% (p=0.000 n=10) RotateLeft 0.8009n ± 0% 0.6671n ± 0% -16.71% (p=0.000 n=10) RotateLeft8 1.202n ± 0% 1.327n ± 0% +10.40% (p=0.000 n=10) RotateLeft16 0.8036n ± 0% 0.8218n ± 0% +2.26% (p=0.000 n=10) RotateLeft32 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10) RotateLeft64 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10) Reverse 0.4067n ± 1% 0.4122n ± 1% +1.38% (p=0.001 n=10) Reverse8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10) Reverse16 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10) Reverse32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.001 n=10) Reverse64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.008 n=10) ReverseBytes 0.4057n ± 1% 0.4133n ± 1% +1.90% (p=0.000 n=10) ReverseBytes16 0.8009n ± 0% 0.8004n ± 0% -0.07% (p=0.000 n=10) 
ReverseBytes32 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10) ReverseBytes64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10) Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10) Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Add64multiple 1.832n ± 0% 1.828n ± 0% -0.22% (p=0.001 n=10) Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Sub32 1.602n ± 0% 1.601n ± 0% -0.06% (p=0.000 n=10) Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10) Sub64multiple 2.402n ± 0% 2.400n ± 0% -0.10% (p=0.000 n=10) Mul 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10) Mul32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10) Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10) Div 9.083n ± 0% 7.638n ± 0% -15.91% (p=0.000 n=10) Div32 4.011n ± 0% 4.009n ± 0% -0.05% (p=0.000 n=10) Div64 9.711n ± 0% 8.204n ± 0% -15.51% (p=0.000 n=10) geomean 1.083n 1.078n -0.40% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz \| CL 627855 \| this CL \| \| sec/op \| sec/op vs base \| LeadingZeros 1.341n ± 4% 1.331n ± 2% -0.71% (p=0.008 n=10) LeadingZeros8 1.781n ± 0% 1.766n ± 1% -0.84% (p=0.011 n=10) LeadingZeros16 1.782n ± 0% 1.767n ± 0% -0.79% (p=0.001 n=10) LeadingZeros32 1.341n ± 1% 1.333n ± 0% -0.52% (p=0.001 n=10) LeadingZeros64 1.338n ± 0% 1.333n ± 0% -0.37% (p=0.008 n=10) TrailingZeros 0.9025n ± 0% 0.8077n ± 0% -10.50% (p=0.000 n=10) TrailingZeros8 1.056n ± 0% 1.089n ± 1% +3.17% (p=0.001 n=10) TrailingZeros16 1.101n ± 0% 1.102n ± 0% +0.09% (p=0.011 n=10) TrailingZeros32 0.9024n ± 1% 0.8083n ± 0% -10.43% (p=0.000 n=10) TrailingZeros64 0.9028n ± 1% 0.8087n ± 0% -10.43% (p=0.000 n=10) OnesCount 1.482n ± 1% 1.302n ± 0% -12.15% (p=0.000 n=10) OnesCount8 1.206n ± 0% 1.207n ± 2% +0.12% (p=0.000 n=10) OnesCount16 1.534n ± 0% 1.402n ± 0% -8.58% (p=0.000 n=10) OnesCount32 1.531n ± 1% 1.302n ± 0% -14.99% (p=0.000 n=10) OnesCount64 1.302n ± 0% 1.538n ± 1% +18.16% (p=0.000 n=10) RotateLeft 0.8083n ± 0% 0.8087n ± 1% ~ (p=0.579 n=10) RotateLeft8 1.310n ± 0% 
1.323n ± 0% +0.95% (p=0.001 n=10) RotateLeft16 1.149n ± 0% 1.165n ± 1% +1.35% (p=0.001 n=10) RotateLeft32 0.8093n ± 0% 0.8105n ± 0% ~ (p=0.393 n=10) RotateLeft64 0.8088n ± 0% 0.8090n ± 0% ~ (p=0.739 n=10) Reverse 0.5109n ± 0% 0.5172n ± 1% +1.25% (p=0.000 n=10) Reverse8 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10) Reverse16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.002 n=10) Reverse32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10) Reverse64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10) ReverseBytes 0.5122n ± 2% 0.5182n ± 1% ~ (p=0.060 n=10) ReverseBytes16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10) ReverseBytes32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10) ReverseBytes64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.001 n=10) Add 1.201n ± 4% 1.202n ± 0% +0.08% (p=0.028 n=10) Add32 1.201n ± 0% 1.202n ± 2% +0.08% (p=0.014 n=10) Add64 1.201n ± 1% 1.202n ± 0% +0.08% (p=0.025 n=10) Add64multiple 1.902n ± 0% 1.913n ± 0% +0.55% (p=0.004 n=10) Sub 1.201n ± 0% 1.202n ± 3% +0.08% (p=0.001 n=10) Sub32 1.654n ± 0% 1.656n ± 1% ~ (p=0.117 n=10) Sub64 1.201n ± 0% 1.202n ± 0% +0.08% (p=0.001 n=10) Sub64multiple 2.180n ± 4% 2.159n ± 1% -0.96% (p=0.006 n=10) Mul 0.9345n ± 0% 0.9346n ± 0% +0.01% (p=0.000 n=10) Mul32 1.030n ± 0% 1.050n ± 1% +1.94% (p=0.000 n=10) Mul64 0.9345n ± 0% 0.9346n ± 1% +0.01% (p=0.000 n=10) Div 11.57n ± 1% 11.12n ± 0% -3.85% (p=0.000 n=10) Div32 4.337n ± 1% 4.341n ± 1% ~ (p=0.286 n=10) Div64 12.76n ± 0% 12.02n ± 3% -5.80% (p=0.000 n=10) geomean 1.252n 1.235n -1.32% Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04 Reviewed-on: https://go-review.googlesource.com/c/go/+/628575 Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: David Chase <drchase@google.com>	2025-03-12 18:18:03 -07:00
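To see why a bounded shift amount matters: Go defines shifts by the full width or more to yield zero, so a variable shift generally needs extra handling unless the compiler can prove the amount is in range. Masking with 63 gives that proof for 64-bit operands. A minimal sketch with hypothetical function names follows; it shows the language semantics only and makes no claim about the exact instructions emitted on loong64.

```go
package main

import "fmt"

// shiftUnbounded shifts by an arbitrary amount. Go requires the result
// to be 0 when s >= 64, so the compiler must account for that case.
func shiftUnbounded(x uint64, s uint) uint64 { return x << s }

// shiftMasked bounds the shift amount to 0..63, so no out-of-range
// handling is needed.
func shiftMasked(x uint64, s uint) uint64 { return x << (s & 63) }

func main() {
	fmt.Println(shiftUnbounded(1, 64)) // prints 0
	fmt.Println(shiftMasked(1, 64))    // prints 1
	fmt.Println(shiftMasked(1, 3))     // prints 8
}
```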
Xiaolin Zhao	2a772a2fe7	cmd/compile: optimize shifts of int32 and uint32 on loong64 goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000-HV @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| LeadingZeros 1.100n ± 1% 1.101n ± 0% ~ (p=0.566 n=10) LeadingZeros8 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10) LeadingZeros16 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10) LeadingZeros32 1.2010n ± 0% 0.9511n ± 0% -20.81% (p=0.000 n=10) LeadingZeros64 1.104n ± 1% 1.119n ± 0% +1.40% (p=0.000 n=10) TrailingZeros 0.8137n ± 0% 0.8086n ± 0% -0.63% (p=0.001 n=10) TrailingZeros8 1.031n ± 1% 1.031n ± 1% ~ (p=0.956 n=10) TrailingZeros16 0.8204n ± 1% 0.8114n ± 0% -1.11% (p=0.000 n=10) TrailingZeros32 0.8145n ± 0% 0.8090n ± 0% -0.68% (p=0.000 n=10) TrailingZeros64 0.8159n ± 0% 0.8089n ± 1% -0.86% (p=0.000 n=10) OnesCount 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10) OnesCount8 0.8005n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10) OnesCount16 0.9339n ± 0% 0.9344n ± 0% +0.05% (p=0.000 n=10) OnesCount32 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10) OnesCount64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10) RotateLeft 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) RotateLeft8 1.202n ± 0% 1.202n ± 0% ~ (p=0.210 n=10) RotateLeft16 0.8050n ± 0% 0.8036n ± 0% -0.17% (p=0.002 n=10) RotateLeft32 0.6674n ± 0% 0.6674n ± 0% ~ (p=1.000 n=10) RotateLeft64 0.6673n ± 0% 0.6674n ± 0% ~ (p=0.072 n=10) Reverse 0.4123n ± 0% 0.4067n ± 1% -1.37% (p=0.000 n=10) Reverse8 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) Reverse16 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10) Reverse32 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10) Reverse64 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.001 n=10) ReverseBytes 0.4100n ± 1% 0.4057n ± 1% -1.06% (p=0.002 n=10) ReverseBytes16 0.8004n ± 0% 0.8009n ± 0% +0.07% (p=0.000 n=10) ReverseBytes32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) ReverseBytes64 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Add32 
1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10) Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Add64multiple 1.831n ± 0% 1.832n ± 0% ~ (p=1.000 n=10) Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Sub32 1.601n ± 0% 1.602n ± 0% +0.06% (p=0.000 n=10) Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10) Sub64multiple 2.400n ± 0% 2.402n ± 0% +0.10% (p=0.000 n=10) Mul 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) Mul32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10) Mul64 0.8004n ± 0% 0.8008n ± 0% +0.05% (p=0.000 n=10) Div 9.107n ± 0% 9.083n ± 0% ~ (p=0.255 n=10) Div32 4.009n ± 0% 4.011n ± 0% +0.05% (p=0.000 n=10) Div64 9.705n ± 0% 9.711n ± 0% +0.06% (p=0.000 n=10) geomean 1.089n 1.083n -0.62% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| LeadingZeros 1.352n ± 0% 1.341n ± 4% -0.81% (p=0.024 n=10) LeadingZeros8 1.766n ± 0% 1.781n ± 0% +0.88% (p=0.000 n=10) LeadingZeros16 1.766n ± 0% 1.782n ± 0% +0.88% (p=0.000 n=10) LeadingZeros32 1.536n ± 0% 1.341n ± 1% -12.73% (p=0.000 n=10) LeadingZeros64 1.351n ± 1% 1.338n ± 0% -0.96% (p=0.000 n=10) TrailingZeros 0.9037n ± 0% 0.9025n ± 0% -0.12% (p=0.020 n=10) TrailingZeros8 1.087n ± 3% 1.056n ± 0% ~ (p=0.060 n=10) TrailingZeros16 1.101n ± 0% 1.101n ± 0% ~ (p=0.211 n=10) TrailingZeros32 0.9040n ± 0% 0.9024n ± 1% -0.18% (p=0.017 n=10) TrailingZeros64 0.9043n ± 0% 0.9028n ± 1% ~ (p=0.118 n=10) OnesCount 1.503n ± 2% 1.482n ± 1% -1.43% (p=0.001 n=10) OnesCount8 1.207n ± 0% 1.206n ± 0% -0.12% (p=0.000 n=10) OnesCount16 1.501n ± 0% 1.534n ± 0% +2.13% (p=0.000 n=10) OnesCount32 1.483n ± 1% 1.531n ± 1% +3.27% (p=0.000 n=10) OnesCount64 1.301n ± 0% 1.302n ± 0% +0.08% (p=0.000 n=10) RotateLeft 0.8136n ± 4% 0.8083n ± 0% -0.66% (p=0.002 n=10) RotateLeft8 1.311n ± 0% 1.310n ± 0% ~ (p=0.786 n=10) RotateLeft16 1.165n ± 0% 1.149n ± 0% -1.33% (p=0.001 n=10) RotateLeft32 0.8138n ± 1% 0.8093n ± 0% -0.57% (p=0.017 n=10) RotateLeft64 0.8149n ± 1% 0.8088n ± 0% -0.74% (p=0.000 
n=10) Reverse 0.5195n ± 1% 0.5109n ± 0% -1.67% (p=0.000 n=10) Reverse8 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10) Reverse16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10) Reverse32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.012 n=10) Reverse64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.010 n=10) ReverseBytes 0.5120n ± 1% 0.5122n ± 2% ~ (p=0.306 n=10) ReverseBytes16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10) ReverseBytes32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10) ReverseBytes64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10) Add 1.201n ± 0% 1.201n ± 4% ~ (p=0.334 n=10) Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.563 n=10) Add64 1.201n ± 0% 1.201n ± 1% ~ (p=0.652 n=10) Add64multiple 1.909n ± 0% 1.902n ± 0% ~ (p=0.126 n=10) Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Sub32 1.655n ± 0% 1.654n ± 0% ~ (p=0.589 n=10) Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10) Sub64multiple 2.150n ± 0% 2.180n ± 4% +1.37% (p=0.000 n=10) Mul 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.011 n=10) Mul32 1.053n ± 0% 1.030n ± 0% -2.23% (p=0.000 n=10) Mul64 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.018 n=10) Div 11.59n ± 0% 11.57n ± 1% ~ (p=0.091 n=10) Div32 4.337n ± 0% 4.337n ± 1% ~ (p=0.783 n=10) Div64 12.81n ± 0% 12.76n ± 0% -0.39% (p=0.001 n=10) geomean 1.257n 1.252n -0.46% Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b Reviewed-on: https://go-review.googlesource.com/c/go/+/627855 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Junyang Shao <shaojunyang@google.com>	2025-03-10 17:55:10 -07:00
Joel Sing	927fdb7843	cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8 Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64 and S390X, rather than having a custom intrinsic. Note that for PPC64 this actually allows the existing Ctz16 and Ctz8 rules to be used. Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/651816 Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2025-02-27 03:45:44 -08:00
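One common way to decompose a narrow trailing-zero count is to set a guard bit just above the operand's width and then use the wider operation, so that a zero input yields the operand width. The sketch below checks that identity exhaustively against `math/bits`. The helper names are hypothetical, and the exact form of the SSA rules in this change is not shown in the commit message.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ctz16ViaCtz64 sets bit 16 as a guard so that x == 0 yields 16.
func ctz16ViaCtz64(x uint16) int { return bits.TrailingZeros64(uint64(x) | 1<<16) }

// ctz8ViaCtz64 sets bit 8 as a guard so that x == 0 yields 8.
func ctz8ViaCtz64(x uint8) int { return bits.TrailingZeros64(uint64(x) | 1<<8) }

// allMatch compares both decompositions with the library functions
// for every possible input.
func allMatch() bool {
	for i := 0; i < 1<<16; i++ {
		if ctz16ViaCtz64(uint16(i)) != bits.TrailingZeros16(uint16(i)) {
			return false
		}
	}
	for i := 0; i < 1<<8; i++ {
		if ctz8ViaCtz64(uint8(i)) != bits.TrailingZeros8(uint8(i)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allMatch()) // prints true
}
```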
Mateusz Poliwczak	43e6525986	cmd/compile: load properly constant values from itabs While looking at the SSA of the following code, I noticed that these rules do not work properly, and the types are loaded indirectly through an itab, instead of statically. type M interface{ M() } type A interface{ A() } type Impl struct{} func (Impl) M() {} func (Impl) A() {} func main() { var a M = &Impl{} a.(A).A() } Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c GitHub-Last-Rev: `4bfc901917` GitHub-Pull-Request: golang/go#71784 Reviewed-on: https://go-review.googlesource.com/c/go/+/649500 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2025-02-19 13:39:00 -08:00
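The snippet in the commit message above is flattened onto one line; laid out as a runnable program it reads as follows. The `assertA` helper and the printed output are additions for illustration and are not part of the original example.

```go
package main

import "fmt"

type M interface{ M() }
type A interface{ A() }

type Impl struct{}

func (Impl) M() {}
func (Impl) A() {}

// assertA is a hypothetical helper that performs the same
// interface-to-interface assertion in checked form.
func assertA(m M) (A, bool) {
	a, ok := m.(A)
	return a, ok
}

func main() {
	var a M = &Impl{}
	a.(A).A() // the assertion from the commit message

	_, ok := assertA(a)
	fmt.Println(ok) // prints true
}
```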
Jakub Ciolek	d524e1eccd	cmd/compile: on AMD64, turn x < 128 into x <= 127 x < 128 -> x <= 127 x >= 128 -> x > 127 This allows for shorter encoding as 127 fits into a single-byte immediate. archive/tar benchmark (Alder Lake 12600K) name old time/op new time/op delta /Writer/USTAR-16 1.46µs ± 0% 1.32µs ± 0% -9.43% (p=0.008 n=5+5) /Writer/GNU-16 1.85µs ± 1% 1.79µs ± 1% -3.47% (p=0.008 n=5+5) /Writer/PAX-16 3.21µs ± 0% 3.11µs ± 2% -2.96% (p=0.008 n=5+5) /Reader/USTAR-16 1.38µs ± 1% 1.37µs ± 0% ~ (p=0.127 n=5+4) /Reader/GNU-16 798ns ± 1% 800ns ± 2% ~ (p=0.548 n=5+5) /Reader/PAX-16 3.07µs ± 1% 3.00µs ± 0% -2.35% (p=0.008 n=5+5) [Geo mean] 1.76µs 1.70µs -3.15% compilecmp: hash/maphash hash/maphash.(Hash).Write 517 -> 510 (-1.35%) runtime runtime.traceReadCPU 1626 -> 1615 (-0.68%) runtime [cmd/compile] runtime.traceReadCPU 1626 -> 1615 (-0.68%) math/rand/v2 type:.eq.[128]float32 65 -> 59 (-9.23%) bytes bytes.trimLeftUnicode 378 -> 373 (-1.32%) bytes.IndexAny 1189 -> 1157 (-2.69%) bytes.LastIndexAny 1256 -> 1239 (-1.35%) bytes.lastIndexFunc 263 -> 261 (-0.76%) strings strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%) strings.EqualFold 625 -> 624 (-0.16%) strings.trimLeftUnicode 248 -> 231 (-6.85%) math/rand type:.eq.[128]float32 65 -> 59 (-9.23%) bytes [cmd/compile] bytes.LastIndexAny 1256 -> 1239 (-1.35%) bytes.lastIndexFunc 263 -> 261 (-0.76%) bytes.trimLeftUnicode 378 -> 373 (-1.32%) bytes.IndexAny 1189 -> 1157 (-2.69%) regexp/syntax regexp/syntax.(parser).parseEscape 1113 -> 1102 (-0.99%) math/rand/v2 [cmd/compile] type:.eq.[128]float32 65 -> 59 (-9.23%) strings [cmd/compile] strings.EqualFold 625 -> 624 (-0.16%) strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%) strings.trimLeftUnicode 248 -> 231 (-6.85%) math/rand [cmd/compile] type:.eq.[128]float32 65 -> 59 (-9.23%) regexp regexp.(inputString).context 198 -> 197 (-0.51%) regexp.(inputBytes).context 221 -> 212 (-4.07%) image/jpeg image/jpeg.(decoder).processDQT 500 -> 491 (-1.80%) regexp/syntax [cmd/compile] 
regexp/syntax.(parser).parseEscape 1113 -> 1102 (-0.99%) regexp [cmd/compile] regexp.(inputString).context 198 -> 197 (-0.51%) regexp.(inputBytes).context 221 -> 212 (-4.07%) encoding/csv encoding/csv.(Writer).fieldNeedsQuotes 269 -> 266 (-1.12%) cmd/vendor/golang.org/x/sys/unix type:.eq.[131]struct 855 -> 823 (-3.74%) vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%) vendor/golang.org/x/text/unicode/norm.(Iter).returnSlice 281 -> 275 (-2.14%) vendor/golang.org/x/text/secure/bidirule vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83 (-2.35%) go/scanner go/scanner.isDigit 100 -> 98 (-2.00%) go/scanner.(Scanner).next 431 -> 422 (-2.09%) go/scanner.isLetter 142 -> 124 (-12.68%) encoding/asn1 encoding/asn1.parseTagAndLength 1189 -> 1182 (-0.59%) encoding/asn1.makeField 3481 -> 3463 (-0.52%) text/scanner text/scanner.(Scanner).next 1242 -> 1236 (-0.48%) archive/tar archive/tar.isASCII 133 -> 127 (-4.51%) archive/tar.(Writer).writeRawFile 1206 -> 1198 (-0.66%) archive/tar.(Reader).readHeader.func1 9 -> 7 (-22.22%) archive/tar.toASCII 393 -> 383 (-2.54%) archive/tar.splitUSTARPath 405 -> 396 (-2.22%) archive/tar.(Writer).writePAXHeader.func1 627 -> 620 (-1.12%) text/template text/template.jsIsSpecial 59 -> 57 (-3.39%) go/doc go/doc.assumedPackageName 714 -> 701 (-1.82%) vendor/golang.org/x/net/http/httpguts vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952 (-1.35%) vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269 (-3.93%) vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26 (-7.14%) net/mail net/mail.isVchar 26 -> 24 (-7.69%) net/mail.isAtext 106 -> 104 (-1.89%) net/mail.(Address).String 1084 -> 1052 (-2.95%) net/mail.isQtext 39 -> 37 (-5.13%) net/mail.isMultibyte 9 -> 7 (-22.22%) net/mail.isDtext 45 -> 43 (-4.44%) net/mail.(addrParser).consumeQuotedString 1050 -> 1029 (-2.00%) net/mail.quoteString 741 -> 714 (-3.64%) cmd/internal/obj/s390x 
cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%) cmd/internal/obj/x86 cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%) fmt [cmd/compile] fmt.Fprintf 4726 -> 4662 (-1.35%) go/scanner [cmd/compile] go/scanner.(Scanner).next 431 -> 422 (-2.09%) go/scanner.isLetter 142 -> 124 (-12.68%) go/scanner.isDigit 100 -> 98 (-2.00%) cmd/compile/internal/syntax cmd/compile/internal/syntax.(source).nextch 879 -> 847 (-3.64%) cmd/vendor/golang.org/x/mod/module cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235 (-1.44%) cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517 (-0.39%) go/doc [cmd/compile] go/doc.assumedPackageName 714 -> 701 (-1.82%) cmd/compile/internal/syntax [cmd/compile] cmd/compile/internal/syntax.(scanner).escape 1965 -> 1933 (-1.63%) cmd/compile/internal/syntax.(scanner).next 8975 -> 8847 (-1.43%) cmd/internal/obj/s390x [cmd/compile] cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%) cmd/internal/obj/x86 [cmd/compile] cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%) cmd/internal/gcprog cmd/internal/gcprog.(Writer).Repeat 688 -> 677 (-1.60%) cmd/internal/gcprog.(Writer).varint 442 -> 439 (-0.68%) cmd/compile/internal/ir cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%) cmd/compile/internal/ir [cmd/compile] cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%) net/http net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399 (-2.92%) net/http.isNotToken 33 -> 30 (-9.09%) net/http.containsDotDot 606 -> 588 (-2.97%) net/http.isCookieNameValid 197 -> 191 (-3.05%) net/http.parsePattern 4330 -> 4317 (-0.30%) net/http.ParseCookie 1099 -> 1096 (-0.27%) net/http.validMethod 197 -> 187 (-5.08%) cmd/vendor/golang.org/x/text/unicode/norm cmd/vendor/golang.org/x/text/unicode/norm.(Iter).returnSlice 281 -> 275 (-2.14%) cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%) net/http/cookiejar net/http/cookiejar.encode 1936 -> 1918 (-0.93%) expvar expvar.appendJSONQuote 972 -> 965 (-0.72%) cmd/cgo/internal/test 
cmd/cgo/internal/test.stack128 116 -> 114 (-1.72%) cmd/vendor/rsc.io/markdown cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628 (-0.55%) cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179 (-9.14%) Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c Reviewed-on: https://go-review.googlesource.com/c/go/+/639935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-02-16 07:23:13 -08:00
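The rewrite relies on two facts: for integers, x < 128 and x <= 127 are the same predicate, and 127 fits in a signed 8-bit immediate while 128 does not. A minimal sketch with hypothetical helper names; it checks the arithmetic only and does not inspect the generated encoding.

```go
package main

import "fmt"

// lessThan128 and atMost127 are the two equivalent forms of the check.
func lessThan128(x int32) bool { return x < 128 }
func atMost127(x int32) bool   { return x <= 127 }

// fitsInt8 reports whether v is representable as a signed 8-bit
// immediate.
func fitsInt8(v int64) bool { return v >= -128 && v <= 127 }

// formsAgree compares the two predicates over a range around the
// boundary.
func formsAgree() bool {
	for x := int32(-1000); x <= 1000; x++ {
		if lessThan128(x) != atMost127(x) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(formsAgree())                 // prints true
	fmt.Println(fitsInt8(127), fitsInt8(128)) // prints true false
}
```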
Keith Randall	beac2f7d3b	cmd/compile: fix sign extension of paired 32-bit loads on arm64 Fixes #71759 Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e Reviewed-on: https://go-review.googlesource.com/c/go/+/649895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2025-02-15 07:53:28 -08:00
Keith Randall	187fd2698d	cmd/compile: make write barrier code amenable to paired loads/stores It currently isn't because it does load/store/load/store/... Rework to do overwrite processing in pairs so it is instead load/load/store/store/... Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b Reviewed-on: https://go-review.googlesource.com/c/go/+/631495 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-02-13 14:08:14 -08:00
Keith Randall	a0029e95e5	cmd/compile: regalloc: handle desired registers of 2-output insns Particularly with 2-word load instructions, this becomes important. Classic example is: func f(p string) string { return p } We want the two loads to put the return values directly into the two ABI return registers. At this point in the stack, cmd/go is 1.1% smaller. Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/631137 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-02-13 14:08:07 -08:00
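The example function above returns a string, which occupies two machine words (a data pointer and a length); that is why a two-word load and two return registers are involved. The sketch below confirms the size with `unsafe.Sizeof`. The `stringWords` helper is hypothetical, and the sketch says nothing about which registers the compiler picks.

```go
package main

import (
	"fmt"
	"unsafe"
)

// f is the function from the commit message.
func f(p string) string { return p }

// stringWords reports how many pointer-sized words a string value
// occupies.
func stringWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := f("hello")
	fmt.Println(stringWords(), len(s)) // prints 2 5
}
```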


620 commits