mirror of https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
85 commits
0d0d5c9a82
test/codegen: test negation with add/sub on riscv64
Change-Id: Ic0eca86d3c93707ebd7c716e774ebda55af4f196
Reviewed-on: https://go-review.googlesource.com/c/go/+/703755
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
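For readers unfamiliar with test/codegen, such a test pairs a small Go function with a comment naming the instruction expected in the compiled output. A minimal sketch in that spirit (the function and expected mnemonic are our own illustration, not the commit's actual test):

```go
package codegen

// Folding a negation into a subtraction: a - (-b) is just a + b, so a
// single ADD suffices instead of a NEG followed by a SUB.
func subOfNeg(a, b int64) int64 {
	// riscv64:"ADD"
	return a - (-b)
}
```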
feae743bdb
cmd/compile: use 32x32->64 multiplies on loong64
Gets rid of some sign extensions, like arm64.
Change-Id: I9fc37e15a82718bfcf53db8cab0c4e7baaa0a747
Reviewed-on: https://go-review.googlesource.com/c/go/+/721522
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
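For illustration, the code shape this targets is a full 64-bit product of two 32-bit values; a sketch (our own example, not taken from the CL):

```go
package example

// mul32x32 computes the full 64-bit product of two int32 values.
// With a 32x32->64 widening multiply, the compiler can feed the 32-bit
// inputs directly into the multiply instead of sign-extending each
// operand to 64 bits first.
func mul32x32(a, b int32) int64 {
	return int64(a) * int64(b)
}
```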
e7d47ac33d
cmd/compile: simplify negative on multiplication
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ simplify_base │ simplify_new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 623.0n ± 0% 319.3n ± 1% -48.75% (p=0.000 n=10)
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ simplify.base │ simplify.new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 10.928µ ± 0% 6.432µ ± 0% -41.14% (p=0.000 n=10)
Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
e1a12c781f
cmd/compile: use 32x32->64 multiplies on arm64
Gets rid of some sign extensions.
Change-Id: Ie67ef36b4ca1cd1a2cd9fa5d84578db553578a22
Reviewed-on: https://go-review.googlesource.com/c/go/+/721241
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2cdcc4150b
cmd/compile: fold negation into multiplication
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ /root/mul.base.log │ /root/mul.new.log │
│ sec/op │ sec/op vs base │
MulNeg 6.426µ ± 0% 4.501µ ± 0% -29.96% (p=0.000 n=10)
Mul2Neg 9.000µ ± 0% 6.431µ ± 0% -28.54% (p=0.000 n=10)
Mul2 1.263µ ± 0% 1.263µ ± 0% ~ (p=1.000 n=10)
MulNeg2 1.577µ ± 0% 1.577µ ± 0% ~ (p=0.211 n=10)
geomean 3.276µ 2.756µ -15.89%
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ /root/base │ /root/new │
│ sec/op │ sec/op vs base │
MulNeg 691.9n ± 1% 319.4n ± 0% -53.83% (p=0.000 n=10)
Mul2Neg 630.0n ± 0% 629.6n ± 0% -0.07% (p=0.000 n=10)
Mul2 438.1n ± 0% 438.1n ± 0% ~ (p=0.728 n=10)
MulNeg2 439.3n ± 0% 439.4n ± 0% ~ (p=0.656 n=10)
geomean 538.2n 443.6n -17.58%
Change-Id: Ice8e6c8d1e8e3009ba8a0b1b689205174e199019
Reviewed-on: https://go-review.googlesource.com/c/go/+/720180
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
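To make the MulNeg/Mul2Neg benchmark names above concrete, the transformation moves (or cancels) a negation around a multiply. A hedged sketch of the kind of code that benefits (the function names and shapes are ours, chosen to mirror the benchmark names):

```go
package example

// A negation on a multiply operand can be pulled out of the multiply,
// and two negated operands cancel entirely.
func mulNeg(x, y int64) int64  { return (-x) * y }    // rewrites to -(x * y)
func mul2Neg(x, y int64) int64 { return (-x) * (-y) } // negations cancel: x * y
```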
9bbda7c99d
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing. The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls.
The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites.
Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values.
https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove.
One short example, in code that does x[i%len(x)]:
< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
A longer example:
< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds
These diffs are due to the smaller opt being better and taking work away from prove:
< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds
In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do: (Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) -> (Add 8 (Sub (Mul 8 i) (Mul 8 i))) -> (Add 8 (Mul 8 (Sub i i))) -> (Add 8 (Mul 8 0)) -> (Add 8 0) -> 8 The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)), Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove. Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way. The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems and DivisbleconstU64 on 32-bit systems. Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that. There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that. benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm vs base vs base vs base vs base vs base vs base vs base vs base LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~ ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50% Modify ~ ~ ~ ~ ~ ~ ~ -1.53% MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05% ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~ BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06% BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06% BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~ BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49% BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40% BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77% MulNeg ~ ~ ~ ~ ~ ~ ~ ~ Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~ DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50% ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32% DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~ DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63% DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60% DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~ DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~ DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~ DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12% ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56% DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72% DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17% DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67% ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30% DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22% DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03% DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01% DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~ ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~ DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~ DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07% DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47% ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06% DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17% DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64% DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06% 
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~ ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23% DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06% DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06% DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~ ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~ DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~ DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~ DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~ DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88% ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~ DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~ DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01% Mul2 ~ ~ ~ ~ ~ ~ ~ ~ MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02% EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53% TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04% Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~ Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39% Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35% Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57% Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55% Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11% Div64 ~ ~ ~ ~ ~ ~ ~ +0.11% Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~ Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~ Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28% Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93% Mod64Small ~ ~ ~ ~ ~ ~ ~ ~ Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44% Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37% Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22% Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11% Mod64 ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02% Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02% MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37% MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21% MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02% MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03% MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03% MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04% MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13% MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~ MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30% MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~ MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~ MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~ MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~ MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31% MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47% MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03% MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03% MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03% MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22% MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64% MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14% MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04% MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~ MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~ ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31% Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~ Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79% Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~ Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44% SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31% SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~ SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~ SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62% SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75% SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08% Fixes #63110. Fixes #75954. 
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
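As a concrete illustration of the len(x)/2 and x[i%len(x)] cases mentioned in this CL, here is a small sketch of code that now compiles without the signed-division fixup or the bounds check (our own example, not taken from the CL):

```go
package example

// halfLen divides a length by a constant. len(b) is provably
// non-negative, so prove can mark the division as unsigned and it
// lowers to a simple shift instead of the signed-division fixup.
func halfLen(b []byte) int {
	return len(b) / 2
}

// pick indexes with an unsigned modulus. i % uint(len(b)) is always in
// [0, len(b)), so the bounds check can be proved away.
func pick(b []byte, i uint) byte {
	return b[i%uint(len(b))]
}
```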
915c1839fe
test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas. Many patterns end in comma (like "MOV [$]123,") so separating patterns by comma is not great; they're already quoted, so spaces are fine. Also replace all tabs in the assembly lines with spaces before matching. Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
Update all tests as well.
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
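A hypothetical asmcheck test written in the new space-separated, [$]-quoted style (the function and pattern are our own illustration, not a test from the CL):

```go
package codegen

// Setting a single bit with a constant OR; the comment asserts the
// expected amd64 instruction using the new pattern syntax.
func setBit8(x uint64) uint64 {
	// amd64:"ORQ [$]256"
	return x | 256
}
```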
3492e4262b
cmd/compile: simplify specific addition operations using the ADDV16 instruction
On loong64, the addi.d instruction can only handle 12-bit immediates directly. A larger immediate must first be loaded into a register, after which add.d is used to perform the addition. If a larger immediate c satisfies is32Bit(c) && c&0xffff == 0, the ADDV16 instruction can perform the addition directly.
Removes 164 instructions from the go binary on loong64.
Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700536
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
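A sketch of the kind of addition this covers, using a constant whose low 16 bits are zero (the constant is our own illustrative choice):

```go
package example

// 0x30000 fits in 32 bits and has zero low 16 bits
// (0x30000 & 0xffff == 0), so per the description above the add can be
// done without first materializing the constant in a register.
func addBigConst(x int64) int64 {
	return x + 0x30000
}
```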
bd71b94659
cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64
Reduce the number of go toolchain instructions on loong64 as follows:
file before after Δ %
go 1573148 1571708 -1,440 -0.0915%
gofmt 320578 320090 -488 -0.1522%
asm 555066 554406 -660 -0.1189%
cgo 481566 480926 -640 -0.1329%
compile 2475962 2473880 -2,082 -0.0841%
cover 516536 515920 -616 -0.1193%
link 702172 701404 -768 -0.1094%
preprofile 238626 238274 -352 -0.1475%
vet 792928 792100 -828 -0.1044%
Change-Id: I61e462726835959c60e1b4e5256d4020202418ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/693877
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
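The add+sll pattern being fused here is the usual index-scaling shape; a sketch of source code that produces it (our own example, not from the CL):

```go
package example

// a + b<<3 is an "add plus shift-left" pair; a fused shift-and-add
// rule lets the compiler emit it as one instruction instead of a
// separate shift followed by an add.
func addShifted(a, b int64) int64 {
	return a + b<<3
}
```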
90b7d7aaa2
cmd/compile/internal: optimize multiplication use new operation 'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10)
MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10)
MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10)
MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10)
MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10)
MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10)
MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10)
MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10)
MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10)
MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10)
MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10)
MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10)
MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10)
MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10)
MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10)
MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10)
geomean 0.8539n 0.6597n -22.74%
Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
a552737418
cmd/compile: fold negation into multiplication on loong64
This change also adds corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10)
Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10)
geomean 938.0n 736.4n -21.49%
Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
e071617222
cmd/compile: optimize multiplication rules on loong64
Improve multiplication strength reduction: following CL 626998,
add 3 additional linear combination instructions for loong64.
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI32/120 1.6010n ± 0% 0.8130n ± 0% -49.22% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8109n ± 0% -49.35% (p=0.000 n=10)
MulconstI32/65537 1.6275n ± 0% 0.8005n ± 0% -50.81% (p=0.000 n=10)
MulconstI32/65538 1.6290n ± 0% 0.8004n ± 0% -50.87% (p=0.000 n=10)
MulconstI64/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/-120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/65537 1.6270n ± 0% 0.8005n ± 0% -50.80% (p=0.000 n=10)
MulconstI64/65538 1.6290n ± 0% 0.8071n ± 1% -50.45% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8066n ± 0% -49.62% (p=0.000 n=10)
MulconstU32/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU32/65538 1.6280n ± 0% 0.8005n ± 0% -50.83% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU64/65538 1.6300n ± 0% 0.8067n ± 0% -50.51% (p=0.000 n=10)
geomean 1.609n 0.8537n -46.95%
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstI32/120 1.6020n ± 0% 0.8012n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/65537 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI32/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/3 1.6015n ± 0% 0.8007n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/5 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/12 1.602n ± 0% 1.202n ± 0% -25.00% (p=0.000 n=10)
MulconstI64/120 1.6030n ± 0% 0.8011n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/-120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65538 1.6020n ± 0% 0.8009n ± 0% -50.01% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8007n ± 0% -49.98% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstU64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
geomean 1.601n 0.8523n -46.77%
Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
dfebef1c04
cmd/compile: fold negation into addition/subtraction on arm64
Fold negation into addition/subtraction and avoid double negation.
platform: linux/arm64
file before after Δ %
addr2line 3628108 3628116 +8 +0.000%
asm 6208353 6207857 -496 -0.008%
buildid 3460682 3460418 -264 -0.008%
cgo 5572988 5572492 -496 -0.009%
compile 26042159 26041039 -1120 -0.004%
cover 6304328 6303472 -856 -0.014%
dist 4139330 4139098 -232 -0.006%
doc 9429305 9428065 -1240 -0.013%
fix 3997189 3996733 -456 -0.011%
link 8212128 8210280 -1848 -0.023%
nm 3620056 3619696 -360 -0.010%
objdump 5920289 5919233 -1056 -0.018%
pack 2892250 2891778 -472 -0.016%
pprof 17094569 17092745 -1824 -0.011%
test2json 3335825 3335529 -296 -0.009%
trace 15842080 15841456 -624 -0.004%
vet 9472194 9471106 -1088 -0.011%
go 19081541 19081509 -32 -0.000%
total 154253374 154240622 -12752 -0.008%
platform: darwin/arm64
file before after Δ %
compile 27152002 27135490 -16512 -0.061%
link 8372914 8356402 -16512 -0.197%
go 19154802 19154778 -24 -0.000%
total 157734180 157701132 -33048 -0.021%
Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/673715
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
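A sketch of the source-level patterns this family of rules targets (our own examples; the arithmetic identities are what the rewrites rely on):

```go
package example

// A negation feeding an addition or subtraction can be folded into the
// other operation, and a double negation disappears entirely.
func negAdd(x, y int64) int64 { return -x + y }    // rewrites to y - x
func subNeg(x, y int64) int64 { return x - (-y) }  // rewrites to x + y
func negNeg(x int64) int64    { return -(-x) }     // rewrites to x
```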
d52679006c
cmd/compile: fold negation into addition/subtraction on mipsx
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 3742022 3741986 -36 -0.001%
asm 6668616 6668628 +12 +0.000%
buildid 3583786 3583630 -156 -0.004%
cgo 6020370 6019634 -736 -0.012%
compile 29416016 29417336 +1320 +0.004%
cover 6801903 6801675 -228 -0.003%
dist 4485916 4485816 -100 -0.002%
doc 10652787 10652251 -536 -0.005%
fix 4115988 4115560 -428 -0.010%
link 9002328 9001616 -712 -0.008%
nm 3733148 3732780 -368 -0.010%
objdump 6163292 6163068 -224 -0.004%
pack 2944768 2944604 -164 -0.006%
pprof 18909973 18908773 -1200 -0.006%
test2json 3394662 3394778 +116 +0.003%
trace 17350911 17349751 -1160 -0.007%
vet 10077727 10077527 -200 -0.002%
go 19118769 19118609 -160 -0.001%
total 166182982 166178022 -4960 -0.003%
Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
8097cf14d2
cmd/compile: fold negation into addition/subtraction on mips64x
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 4007310 4007470 +160 +0.004%
asm 7007636 7007436 -200 -0.003%
buildid 3839268 3838972 -296 -0.008%
cgo 6353466 6352738 -728 -0.011%
compile 30426920 30426896 -24 -0.000%
cover 7005408 7004744 -664 -0.009%
dist 4651192 4650872 -320 -0.007%
doc 10606050 10606034 -16 -0.000%
fix 4446414 4446390 -24 -0.001%
link 9237736 9237024 -712 -0.008%
nm 3999107 3999323 +216 +0.005%
objdump 6762424 6762144 -280 -0.004%
pack 3270757 3270493 -264 -0.008%
pprof 19428299 19361939 -66360 -0.342%
test2json 3717345 3717217 -128 -0.003%
trace 17382273 17381657 -616 -0.004%
vet 10689481 10688985 -496 -0.005%
go 19118769 19118609 -160 -0.001%
total 171949855 171878943 -70912 -0.041%
Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/673555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
c31a5c571f
cmd/compile: fold negation into addition/subtraction on loong64
This change also avoids double negation and adds loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279972 279896 -76 -0.0271%
asm 556390 556310 -80 -0.0144%
buildid 272376 272300 -76 -0.0279%
cgo 481534 481550 +16 +0.0033%
compile 2457992 2457396 -596 -0.0242%
covdata 323488 323404 -84 -0.0260%
cover 518630 518490 -140 -0.0270%
dist 340894 340814 -80 -0.0235%
distpack 282568 282484 -84 -0.0297%
doc 790224 789984 -240 -0.0304%
fix 324408 324348 -60 -0.0185%
link 704910 704666 -244 -0.0346%
nm 277220 277144 -76 -0.0274%
objdump 508026 507878 -148 -0.0291%
pack 221810 221786 -24 -0.0108%
pprof 1470284 1469880 -404 -0.0275%
test2json 254896 254852 -44 -0.0173%
trace 1100390 1100074 -316 -0.0287%
vet 781398 781142 -256 -0.0328%
go 1529668 1529128 -540 -0.0353%
gofmt 318668 318568 -100 -0.0314%
total 13795746 13792094 -3652 -0.0265%
Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
12110c3f7e
cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code. You give it all the linear combination (a*x+b*y) instructions in your architecture, and it figures out the rest. Just amd64 and arm64 for now.
Fixes #67575
Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
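For intuition, strength reduction replaces a multiply by a constant with a short sequence built from those linear combination (a*x+b*y) instructions. A sketch with a constant chosen by us for illustration:

```go
package example

// x*9 can be computed as (x<<3)+x, a single linear combination with
// a=8 and b=1. On amd64 this kind of small-constant multiply is
// typically emitted as a LEAQ rather than an IMUL.
func times9(x int64) int64 {
	return x * 9
}
```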
21417518a9
cmd/compile: combine negation and word sign extension on riscv64
Use NEGW to produce a negated and sign extended word, rather than doing the same via two instructions:
neg t0, t0
sext.w a0, t0
Becomes:
negw t0, t0
Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
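A sketch of Go code that produces the negate-then-sign-extend pattern (our own example, not from the CL):

```go
package example

// Negating a 32-bit value and widening it to 64 bits previously cost a
// negation plus a separate sign extension on riscv64; per the change
// above, the pair can be emitted as a single NEGW.
func negWiden(x int32) int64 {
	return int64(-x)
}
```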
10d070668c
cmd/compile/internal/ssa: remove double negation with addition on riscv64
On riscv64, subtraction from a constant is typically implemented as an ADDI with the negative constant, followed by a negation. However this can lead to multiple NEG/ADDI/NEG sequences that can be optimised out.
For example, runtime.(*_panic).nextDefer currently contains:
lbu t0, 0(t0)
addi t0, t0, -8
neg t0, t0
addi t0, t0, -7
neg t0, t0
Which is now optimised to:
lbu t0, 0(t0)
addi t0, t0, -1
Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
a8f2e63f2f
test/codegen: add a test for negation and conversion to int32
Codify the current code generation used on riscv64 in this case.
Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/652317
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
e1f9013a58
test/codegen: add riscv64 codegen for arithmetic tests
Codify the current riscv64 code generation for various subtract from constant and addition/subtraction tests.
Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729
Reviewed-on: https://go-review.googlesource.com/c/go/+/652316
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
c01fa0cc21
test/codegen: add riscv64/rva23u64 specifiers to existing tests
Tests that exist for riscv64/rva22u64 should also be applied to riscv64/rva23u64.
Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
cd595be6d6
cmd/compile: prefer an add when shifting left by 1
ADD(Q|L) has generally twice the throughput.
Came up in CL 626998.
Throughput by arch:
Zen 4:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Intel Alder Lake:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.2
Intel Haswell:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Also include a minor opt for:
(x + x) << c -> x << (c + 1)
Before this, the code:
func addShift(x int64) int64 {
return (x + x) << 1
}
emitted two instructions:
ADDQ AX, AX
SHLQ $1, AX
but we can do it in a single shift:
SHLQ $2, AX
Add a codegen test for clearing the last bit.
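As a companion to the addShift example above, a sketch of the simplest case this change affects (our own example, not the CL's test):

```go
package example

// A left shift by one is just a doubling, so it can be emitted as an
// add of the value to itself (ADDQ AX, AX on amd64), which generally
// has better throughput than SHLQ $1.
func double(x int64) int64 {
	return x << 1
}
```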
compilecmp linux/amd64:
math
math.sqrt 243 -> 242 (-0.41%)
math [cmd/compile]
math.sqrt 243 -> 242 (-0.41%)
runtime
runtime.selectgo 5455 -> 5445 (-0.18%)
runtime.sysargs 665 -> 662 (-0.45%)
runtime.isPinned 145 -> 141 (-2.76%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime [cmd/compile]
runtime.sysargs 665 -> 662 (-0.45%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.isPinned 145 -> 141 (-2.76%)
strconv
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv [cmd/compile]
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
image
image.Rectangle.Inset 101 -> 97 (-3.96%)
regexp/syntax
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
time
time.Time.Before 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.Time.Sub 444 -> 425 (-4.28%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.div 592 -> 587 (-0.84%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.(*Time).IsZero 113 -> 108 (-4.42%)
regexp
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
regexp/syntax [cmd/compile]
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
encoding/base64
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
time [cmd/compile]
time.(*Time).IsZero 113 -> 108 (-4.42%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.Before 179 -> 161 (-10.06%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.div 592 -> 587 (-0.84%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.Sub 444 -> 425 (-4.28%)
index/suffixarray
index/suffixarray.sais_8_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_64 1677 -> 1654 (-1.37%)
index/suffixarray.sais_8_64 1677 -> 1654 (-1.37%)
index/suffixarray.writeInt 249 -> 247 (-0.80%)
os
os.Expand 1070 -> 1051 (-1.78%)
os.Chtimes 787 -> 774 (-1.65%)
regexp [cmd/compile]
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
encoding/base64 [cmd/compile]
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
encoding/hex
encoding/hex.Encode 138 -> 136 (-1.45%)
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
crypto/des
crypto/des.initFeistelBox 235 -> 229 (-2.55%)
crypto/des.cryptBlock 549 -> 538 (-2.00%)
os [cmd/compile]
os.Chtimes 787 -> 774 (-1.65%)
os.Expand 1070 -> 1051 (-1.78%)
math/big
math/big.newFloat 238 -> 223 (-6.30%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66 (-8.33%)
encoding/json
encoding/json.Indent 404 -> 403 (-0.25%)
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
testing
testing.(*T).Deadline 84 -> 82 (-2.38%)
testing.(*M).Run 3545 -> 3525 (-0.56%)
archive/zip
archive/zip.headerFileInfo.ModTime 229 -> 223 (-2.62%)
encoding/gob
encoding/gob.(*encoderState).encodeInt 474 -> 469 (-1.05%)
crypto/elliptic
crypto/elliptic.Marshal 728 -> 714 (-1.92%)
debug/buildinfo
debug/buildinfo.readString 325 -> 315 (-3.08%)
image/png
image/png.(*decoder).readImagePass 10866 -> 10834 (-0.29%)
archive/tar
archive/tar.Header.allowedFormats.func3 1768 -> 1736 (-1.81%)
archive/tar.formatPAXTime 389 -> 358 (-7.97%)
archive/tar.(*Writer).writeGNUHeader 741 -> 727 (-1.89%)
archive/tar.readGNUSparseMap0x1 709 -> 695 (-1.97%)
archive/tar.(*Writer).templateV7Plus 915 -> 909 (-0.66%)
crypto/internal/cryptotest
crypto/internal/cryptotest.TestHash.func4 890 -> 879 (-1.24%)
crypto/internal/cryptotest.TestStream.func6.1 646 -> 645 (-0.15%)
crypto/internal/cryptotest.testCipher.func3 1300 -> 1289 (-0.85%)
internal/pkgbits
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
testing/quick
testing/quick.(*Config).getRand 316 -> 315 (-0.32%)
log/slog
log/slog.TimeValue 489 -> 479 (-2.04%)
runtime/pprof
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
internal/coverage/cfile
internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822 (-0.24%)
internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892 (-1.33%)
cmd/internal/objabi
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
crypto/ecdsa
crypto/ecdsa.pointFromAffine 1162 -> 1144 (-1.55%)
net
net.minNonzeroTime 313 -> 308 (-1.60%)
net.cgoLookupAddrPTR 812 -> 797 (-1.85%)
net.(*IPNet).String 851 -> 827 (-2.82%)
net.IP.AppendText 488 -> 471 (-3.48%)
net.IPMask.String 281 -> 270 (-3.91%)
net.partialDeadline 374 -> 366 (-2.14%)
net.hexString 249 -> 240 (-3.61%)
net.IP.String 454 -> 453 (-0.22%)
internal/fuzz
internal/fuzz.newPcgRand 240 -> 234 (-2.50%)
crypto/x509
crypto/x509.(*Certificate).isValid 2642 -> 2611 (-1.17%)
cmd/internal/obj/s390x
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
encoding/hex [cmd/compile]
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
encoding/hex.Encode 138 -> 136 (-1.45%)
cmd/internal/objabi [cmd/compile]
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
math/big [cmd/compile]
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
math/big.newFloat 238 -> 223 (-6.30%)
encoding/json [cmd/compile]
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
encoding/json.Indent 404 -> 403 (-0.25%)
cmd/covdata
main.(*metaMerge).emitCounters 985 -> 973 (-1.22%)
runtime/pprof [cmd/compile]
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/dist
main.runInstall 19081 -> 19049 (-0.17%)
crypto/tls
crypto/tls.extractPadding 176 -> 175 (-0.57%)
slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247 (-2.37%)
slices.Clone[[]uint16,uint16] 253 -> 247 (-2.37%)
slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247 (-2.37%)
crypto/tls.(*Config).cipherSuites 335 -> 326 (-2.69%)
slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434 (-0.69%)
crypto/tls.dial 1349 -> 1339 (-0.74%)
slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434 (-0.69%)
internal/pkgbits [cmd/compile]
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
cmd/go/internal/trace
cmd/go/internal/trace.Flow 910 -> 886 (-2.64%)
cmd/go/internal/trace.(*Span).Done 311 -> 304 (-2.25%)
cmd/go/internal/trace.StartSpan 620 -> 615 (-0.81%)
cmd/internal/script
cmd/internal/script.(*Engine).Execute.func2 534 -> 528 (-1.12%)
cmd/link/internal/loader
cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338 (-1.74%)
net/http
net/http.(*Transport).queueForIdleConn 1797 -> 1766 (-1.73%)
net/http.(*Transport).getConn 2149 -> 2131 (-0.84%)
net/http.(*http2ClientConn).tooIdleLocked 207 -> 197 (-4.83%)
net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508 (-2.31%)
net/http.(*Cookie).Valid 837 -> 818 (-2.27%)
net/http.(*http2responseWriter).SetReadDeadline 373 -> 357 (-4.29%)
net/http.checkIfRange 701 -> 690 (-1.57%)
net/http.(*http2SettingsFrame).Value 325 -> 298 (-8.31%)
net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767 (-1.29%)
net/http.(*Server).Serve 1746 -> 1739 (-0.40%)
net/http.http2traceGotConn 569 -> 556 (-2.28%)
net/http/pprof
net/http/pprof.collectProfile 242 -> 239 (-1.24%)
cmd/compile/internal/coverage
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/telemetry/internal/upload
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540 (-0.66%)
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572 (-0.89%)
cmd/compile/internal/coverage [cmd/compile]
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/text/language
cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284 (-1.05%)
cmd/go/internal/vcweb
cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041 (-0.38%)
cmd/go/internal/vcs
cmd/go/internal/vcs.expand 761 -> 741 (-2.63%)
cmd/compile/internal/inline/inlheur
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/compile/internal/inline/inlheur [cmd/compile]
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/go/internal/modfetch/codehost
cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213 (-0.18%)
cmd/link/internal/ld
cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152 (-3.18%)
cmd/link/internal/ld.(*Link).address 12559 -> 12495 (-0.51%)
cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205 (-0.76%)
cmd/link/internal/ld.elfshreloc 618 -> 616 (-0.32%)
cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779 (-1.89%)
cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663 (-0.75%)
cmd/link/internal/ld.relocSectFn 285 -> 284 (-0.35%)
cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144 (-1.37%)
cmd/link/internal/ld.decodetypeArrayLen 157 -> 152 (-3.18%)
cmd/link/internal/arm64
cmd/link/internal/arm64.gensymlate.func1 895 -> 888 (-0.78%)
cmd/go/internal/modload
cmd/go/internal/modload.queryProxy.func3 1029 -> 1012 (-1.65%)
cmd/go/internal/load
cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447 (-0.07%)
cmd/go/internal/clean
cmd/go/internal/clean.runClean 2120 -> 2104 (-0.75%)
cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294 (-0.96%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724 (-0.33%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
file before after Δ %
math/bits.s 2352 2354 +2 +0.085%
math/bits [cmd/compile].s 2352 2354 +2 +0.085%
math.s 35675 35674 -1 -0.003%
math [cmd/compile].s 35675 35674 -1 -0.003%
runtime.s 577251 577245 -6 -0.001%
runtime [cmd/compile].s 642419 642438 +19 +0.003%
sort.s 37434 37435 +1 +0.003%
strconv.s 48391 48343 -48 -0.099%
sort [cmd/compile].s 37434 37435 +1 +0.003%
bufio.s 21386 21418 +32 +0.150%
strconv [cmd/compile].s 48391 48343 -48 -0.099%
image.s 34978 35022 +44 +0.126%
regexp/syntax.s 81719 81781 +62 +0.076%
time.s 94341 94184 -157 -0.166%
regexp.s 60411 60399 -12 -0.020%
bufio [cmd/compile].s 21512 21544 +32 +0.149%
encoding/binary.s 34062 34087 +25 +0.073%
regexp/syntax [cmd/compile].s 81719 81781 +62 +0.076%
encoding/base64.s 11907 11903 -4 -0.034%
time [cmd/compile].s 94341 94184 -157 -0.166%
index/suffixarray.s 41633 41527 -106 -0.255%
os.s 101770 101738 -32 -0.031%
regexp [cmd/compile].s 60411 60399 -12 -0.020%
encoding/binary [cmd/compile].s 37173 37198 +25 +0.067%
encoding/base64 [cmd/compile].s 11907 11903 -4 -0.034%
os/exec.s 23900 23907 +7 +0.029%
encoding/hex.s 6038 6030 -8 -0.132%
crypto/des.s 5073 5056 -17 -0.335%
os [cmd/compile].s 102030 101998 -32 -0.031%
vendor/golang.org/x/net/http2/hpack.s 22027 22033 +6 +0.027%
math/big.s 164808 164753 -55 -0.033%
cmd/vendor/golang.org/x/sys/unix.s 121450 121444 -6 -0.005%
encoding/json.s 110294 110287 -7 -0.006%
testing.s 115303 115281 -22 -0.019%
archive/zip.s 65329 65325 -4 -0.006%
os/user.s 10078 10080 +2 +0.020%
encoding/gob.s 143788 143783 -5 -0.003%
crypto/elliptic.s 30686 30704 +18 +0.059%
go/doc/comment.s 49401 49433 +32 +0.065%
debug/buildinfo.s 9095 9085 -10 -0.110%
image/png.s 36113 36081 -32 -0.089%
archive/tar.s 71994 71897 -97 -0.135%
crypto/internal/cryptotest.s 60872 60849 -23 -0.038%
internal/pkgbits.s 20441 20429 -12 -0.059%
testing/quick.s 8236 8235 -1 -0.012%
log/slog.s 77568 77558 -10 -0.013%
internal/trace/internal/oldtrace.s 52885 52896 +11 +0.021%
runtime/pprof.s 123978 123969 -9 -0.007%
internal/coverage/cfile.s 25198 25184 -14 -0.056%
cmd/internal/objabi.s 19954 19946 -8 -0.040%
crypto/ecdsa.s 29159 29141 -18 -0.062%
log/slog/internal/benchmarks.s 6694 6695 +1 +0.015%
net.s 299569 299503 -66 -0.022%
os/exec [cmd/compile].s 23888 23895 +7 +0.029%
internal/trace.s 179226 179240 +14 +0.008%
internal/fuzz.s 86190 86191 +1 +0.001%
crypto/x509.s 177195 177164 -31 -0.017%
cmd/internal/obj/s390x.s 121642 121610 -32 -0.026%
cmd/internal/obj/ppc64.s 140118 140122 +4 +0.003%
encoding/hex [cmd/compile].s 6149 6141 -8 -0.130%
cmd/internal/objabi [cmd/compile].s 19954 19946 -8 -0.040%
cmd/internal/obj/arm64.s 158523 158555 +32 +0.020%
go/doc/comment [cmd/compile].s 49512 49544 +32 +0.065%
math/big [cmd/compile].s 166394 166339 -55 -0.033%
encoding/json [cmd/compile].s 110712 110705 -7 -0.006%
cmd/covdata.s 39699 39687 -12 -0.030%
runtime/pprof [cmd/compile].s 125209 125200 -9 -0.007%
cmd/compile/internal/syntax.s 181755 181736 -19 -0.010%
cmd/dist.s 177893 177861 -32 -0.018%
crypto/tls.s 389157 389113 -44 -0.011%
internal/pkgbits [cmd/compile].s 41644 41632 -12 -0.029%
cmd/compile/internal/syntax [cmd/compile].s 196105 196086 -19 -0.010%
cmd/compile/internal/types.s 71315 71345 +30 +0.042%
cmd/internal/obj/s390x [cmd/compile].s 121733 121701 -32 -0.026%
cmd/go/internal/trace.s 4796 4760 -36 -0.751%
cmd/internal/obj/arm64 [cmd/compile].s 168120 168147 +27 +0.016%
cmd/internal/obj/ppc64 [cmd/compile].s 140219 140223 +4 +0.003%
cmd/internal/script.s 83442 83436 -6 -0.007%
cmd/link/internal/loader.s 93299 93294 -5 -0.005%
net/http.s 620639 620472 -167 -0.027%
net/http/pprof.s 35016 35013 -3 -0.009%
cmd/compile/internal/coverage.s 6668 6667 -1 -0.015%
cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210 34148 -62 -0.181%
cmd/compile/internal/coverage [cmd/compile].s 6664 6663 -1 -0.015%
cmd/vendor/golang.org/x/text/language.s 48077 48074 -3 -0.006%
cmd/go/internal/vcweb.s 45193 45189 -4 -0.009%
cmd/go/internal/vcs.s 44749 44729 -20 -0.045%
cmd/compile/internal/inline/inlheur.s 83758 83742 -16 -0.019%
cmd/compile/internal/inline/inlheur [cmd/compile].s 84773 84757 -16 -0.019%
cmd/go/internal/modfetch/codehost.s 89098 89094 -4 -0.004%
cmd/trace.s 257550 257564 +14 +0.005%
cmd/link/internal/ld.s 641945 641706 -239 -0.037%
cmd/link/internal/arm64.s 34805 34798 -7 -0.020%
cmd/go/internal/modload.s 328971 328954 -17 -0.005%
cmd/go/internal/load.s 178877 178871 -6 -0.003%
cmd/go/internal/clean.s 11006 10990 -16 -0.145%
cmd/compile/internal/ssa.s 3552843 3553347 +504 +0.014%
cmd/compile/internal/ssa [cmd/compile].s 3752511 3753123 +612 +0.016%
total 36179015 36178687 -328 -0.001%
Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641757
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
02a9f51011
test/codegen: add initial codegen tests for integer min/max
Change-Id: I006370053748edbec930c7279ee88a805009aa0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/606976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
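For context, these tests exercise the built-in min and max on integer arguments. A hedged sketch of the kind of function such a test covers (the function is ours; the exact expected instructions vary by architecture):

```go
package codegen

// The built-in max on two int64 values; codegen tests check that this
// compiles to a branch-free instruction sequence where the target
// architecture provides one.
func maxInt64(a, b int64) int64 {
	return max(a, b)
}
```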
d11e417285
cmd/compile/internal/ssa: cleanup ANDCCconst rewrite rules on PPC64
Avoid creating duplicate usages of ANDCCconst. This is preparation for a patch to reintroduce ANDconst to simplify the lower pass while treating ANDCCconst like other *CC* ssa opcodes. Also, move many of the similar rules which retarget ANDCCconst users to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
6e5398bad1
cmd/asm,cmd/compile: generate less instructions for most 32 bit constant adds on ppc64x
For GOPPC64 < 10 targets, most large 32 bit constants (those exceeding int16 capacity) can be added using two instructions instead of 3. This cannot be done for values greater than 0x7FFF7FFF, so this must be done during asm preprocessing as the optab matching rules cannot differentiate this special case. Likewise, constants 0x8000 <= x < 0x10000 are not converted. The assembler currently generates 2 instruction sequences for these constants.
Change-Id: I1ccc839c6c28fc32f15d286b2e52e2d22a2a06d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/568116
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
1c5862cc0a
test/codegen: fix PPC64 AddLargeConst test
Commit
061d77cb70
cmd/compile/internal/ssa: on PPC64, generate large constant paddi
This is only supported on power10/linux/PPC64. This generates smaller, faster code by merging a pli + add into paddi.
Change-Id: I1f4d522fce53aea4c072713cc119a9e0d7065acc
Reviewed-on: https://go-review.googlesource.com/c/go/+/531717
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
80834af206
cmd/compile: avoid ANDCCconst on PPC64 if condition not needed
In the PPC64 ISA, the instruction to do an 'and' operation using an immediate constant is only available in the form that also sets CR0 (i.e. clobbers the condition register.) This means CR0 is being clobbered unnecessarily in many cases. That affects some decisions made during some compiler passes that check for it.
In those cases when the constant used by the ANDCC is a right justified consecutive set of bits, a shift instruction can be used which has the same effect if CR0 does not need to be set. The rule to do that has been added to the late rules file after other rules using ANDCCconst have been processed in the main rules file.
Some codegen tests had to be updated since ANDCC is no longer generated for some cases. A new test case was added to verify the ANDCC is present if the results for both the AND and CR0 are used.
Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
da4687923b
cmd/compile: add rewrite rules for arithmetic operations
Add the following common local transformations
(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y
The compiler itself matches such patterns many times. This also aligns with other popular compilers.
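A sketch of source code where one of the listed patterns arises (our own example):

```go
package example

// The two sums share the term t, so the difference reduces to x - y
// and t need not be computed for the subtraction at all.
func diffOfSums(t, x, y int) int {
	return (t + x) - (t + y) // rewrites to x - y
}
```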
Fixes #59111
Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev:
1540531746
test/codegen: merge identical ppc64 and ppc64le tests
Manually consolidate the remaining ppc64/ppc64le tests which are not so trivial to automatically merge. The remaining ppc64le tests are limited to cases where load/stores are merged (this only happens on ppc64le) and the race detector (only supported on ppc64le).
Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
||
|
|
0301c6c351 |
test/codegen: combine trivial PPC64 tests into ppc64x
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.
This makes the small assumption that any time two tests
for different arch/variant combos exist, those tests
can be combined into a single ppc64x test.
E.g.:
// ppc64le: foo
// ppc64le/power9: foo
into
// ppc64x: foo
or
// ppc64: foo
// ppc64le: foo
into
// ppc64x: foo
import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))
Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
||
|
|
f959fb3872 |
cmd/compile: add anchored version of SP
The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect only applies to pointerful types that are large enough to need a duffzero and that have their address passed somewhere. Even then, the CSEd LEAQs will usually be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
ddc7d2a80c |
cmd/compile: add late lower pass for last rules to run
Usually optimization rules have corresponding priorities: some need to
be run first, some next, and some last, which produces the best
code. But currently our optimization rules have no priority, so this CL
adds a late lower pass that runs the rules that need to run last,
such as splitting unreasonable constant folding. This pass can be seen
as a second round of the lower pass.
For example:
func foo(a, b uint64) uint64 {
	d := a + 0x1234568
	d1 := b + 0x1234568
	return d & d1
}
The code generated by the master branch:
0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add
0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add
0x001c 00028 AND R1, R2, R0
This is because the current constant folding optimization rules do not
take into account the range of constants, causing the constant to be
loaded repeatedly. This CL splits these unreasonable constant foldings
in the late lower pass. With this CL, the generated code is:
0x0004 00004 MOVD $19088744, R2 // movz+movk
0x000c 00012 ADD R0, R2, R3
0x0010 00016 ADD R1, R2, R1
0x0014 00020 AND R1, R3, R0
This CL also adds a constant folding optimization for the ADDS instruction.
In addition, in order not to introduce codegen regressions, an
optimization rule is added to change the addition of a negative number
into a subtraction of a positive number.
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 1.22s ± 1% 1.24s ± 0% +1.56% (p=0.008 n=5+5)
Fannkuch11-8 1.54s ± 0% 1.53s ± 0% -0.69% (p=0.016 n=4+5)
FmtFprintfEmpty-8 14.1ns ± 0% 14.1ns ± 0% ~ (p=0.079 n=4+5)
FmtFprintfString-8 26.0ns ± 0% 26.1ns ± 0% +0.23% (p=0.008 n=5+5)
FmtFprintfInt-8 32.3ns ± 0% 32.9ns ± 1% +1.72% (p=0.008 n=5+5)
FmtFprintfIntInt-8 54.5ns ± 0% 55.5ns ± 0% +1.83% (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8 61.5ns ± 0% 62.0ns ± 0% +0.93% (p=0.008 n=5+5)
FmtFprintfFloat-8 72.0ns ± 0% 73.6ns ± 0% +2.24% (p=0.008 n=5+5)
FmtManyArgs-8 221ns ± 0% 224ns ± 0% +1.22% (p=0.008 n=5+5)
GobDecode-8 1.91ms ± 0% 1.93ms ± 0% +0.98% (p=0.008 n=5+5)
GobEncode-8 1.40ms ± 1% 1.39ms ± 0% -0.79% (p=0.032 n=5+5)
Gzip-8 115ms ± 0% 117ms ± 1% +1.17% (p=0.008 n=5+5)
Gunzip-8 19.4ms ± 1% 19.3ms ± 0% -0.71% (p=0.016 n=5+4)
HTTPClientServer-8 27.0µs ± 0% 27.3µs ± 0% +0.80% (p=0.008 n=5+5)
JSONEncode-8 3.36ms ± 1% 3.33ms ± 0% ~ (p=0.056 n=5+5)
JSONDecode-8 17.5ms ± 2% 17.8ms ± 0% +1.71% (p=0.016 n=5+4)
Mandelbrot200-8 2.29ms ± 0% 2.29ms ± 0% ~ (p=0.151 n=5+5)
GoParse-8 1.35ms ± 1% 1.36ms ± 1% ~ (p=0.056 n=5+5)
RegexpMatchEasy0_32-8 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8 131ns ±11% 118ns ± 6% ~ (p=0.056 n=5+5)
RegexpMatchEasy1_32-8 22.9ns ± 0% 22.9ns ± 0% ~ (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8 126ns ± 0% 127ns ± 0% ~ (p=0.063 n=4+5)
RegexpMatchMedium_32-8 486ns ± 5% 483ns ± 0% ~ (p=0.381 n=5+4)
RegexpMatchMedium_1K-8 15.4µs ± 1% 15.5µs ± 0% ~ (p=0.151 n=5+5)
RegexpMatchHard_32-8 687ns ± 0% 686ns ± 0% ~ (p=0.103 n=5+5)
RegexpMatchHard_1K-8 20.7µs ± 0% 20.7µs ± 1% ~ (p=0.151 n=5+5)
Revcomp-8 175ms ± 2% 176ms ± 3% ~ (p=1.000 n=5+5)
Template-8 20.4ms ± 6% 20.1ms ± 2% ~ (p=0.151 n=5+5)
TimeParse-8 112ns ± 0% 113ns ± 0% +0.97% (p=0.016 n=5+4)
TimeFormat-8 156ns ± 0% 145ns ± 0% -7.14% (p=0.029 n=4+4)
Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
||
|
|
efe5929dbd |
cmd/compile/internal/ssa: optimize ARM64 code with TST
For signed comparisons, the following four optimization rules hold: (CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y) (CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y) (CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y) (CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y) But currently they only apply to jump instructions, not to conditional instructions within a block, such as cset, csel, etc. This CL extends the above rules into blocks so that conditional instructions can also be optimized. name old time/op new time/op delta DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5) DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5) DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5) DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5) Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de Reviewed-on: https://go-review.googlesource.com/c/go/+/420434 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> |
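A sketch of the kind of code that benefits (the function is ours, not from the CL): the AND feeds only a signed comparison whose result is materialized with a conditional set, so the extended rules allow a TST in place of an AND plus a separate compare against zero:
func andIsNegative(x, y int64) bool {
	return x&y < 0 // AND + CMP $0 + CSET can become TST + CSET
}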
||
|
|
4056934e48 |
test/codegen: updated arithmetic tests to verify on ppc64,ppc64le
Updated multiple tests in test/codegen/arithmetic.go to verify on ppc64/ppc64le as well Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/358546 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
|
732f6fa9d5 |
cmd/compile: use ANDL for small immediates
We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode. Looking at Go binary itself, before the change there was: ANDL: 2337 ANDQ: 4476 After the change: ANDL: 3790 ANDQ: 3024 So we got rid of 1452 ANDQs This makes the Linux x86_64 binary 0.03% smaller. There seems to be an impact on performance. Intel Cascade Lake benchmarks (with perflock): name old time/op new time/op delta BinaryTree17-8 1.91s ± 1% 1.89s ± 1% -1.22% (p=0.000 n=21+18) Fannkuch11-8 2.34s ± 0% 2.34s ± 0% ~ (p=0.052 n=20+20) FmtFprintfEmpty-8 27.7ns ± 1% 27.4ns ± 3% ~ (p=0.497 n=21+21) FmtFprintfString-8 53.2ns ± 0% 51.5ns ± 0% -3.21% (p=0.000 n=20+19) FmtFprintfInt-8 57.3ns ± 0% 55.7ns ± 0% -2.89% (p=0.000 n=19+19) FmtFprintfIntInt-8 92.3ns ± 0% 88.4ns ± 1% -4.23% (p=0.000 n=20+21) FmtFprintfPrefixedInt-8 103ns ± 0% 103ns ± 0% +0.23% (p=0.000 n=20+21) FmtFprintfFloat-8 147ns ± 0% 148ns ± 0% +0.75% (p=0.000 n=20+21) FmtManyArgs-8 384ns ± 0% 381ns ± 0% -0.63% (p=0.000 n=21+21) GobDecode-8 3.86ms ± 1% 3.88ms ± 1% +0.52% (p=0.000 n=20+21) GobEncode-8 2.77ms ± 1% 2.77ms ± 0% ~ (p=0.078 n=21+21) Gzip-8 168ms ± 1% 168ms ± 0% +0.24% (p=0.000 n=20+20) Gunzip-8 25.1ms ± 0% 24.3ms ± 0% -3.03% (p=0.000 n=21+21) HTTPClientServer-8 61.4µs ± 8% 59.1µs ±10% ~ (p=0.088 n=20+21) JSONEncode-8 6.86ms ± 0% 6.70ms ± 0% -2.29% (p=0.000 n=20+19) JSONDecode-8 30.8ms ± 1% 30.6ms ± 1% -0.82% (p=0.000 n=20+20) Mandelbrot200-8 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.191 n=16+17) GoParse-8 2.61ms ± 2% 2.60ms ± 1% ~ (p=0.561 n=21+20) RegexpMatchEasy0_32-8 48.5ns ± 2% 45.9ns ± 3% -5.26% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 139ns ± 0% 139ns ± 0% +0.27% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 41.3ns ± 0% 42.1ns ± 4% +1.95% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 216ns ± 2% 216ns ± 0% +0.17% (p=0.020 n=21+19) RegexpMatchMedium_32-8 790ns ± 7% 803ns ± 8% ~ (p=0.178 n=21+21) RegexpMatchMedium_1K-8 23.5µs ± 5% 23.7µs ± 5% ~ (p=0.421 n=21+21) RegexpMatchHard_32-8 1.09µs ± 1% 1.09µs ± 1% -0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.610 n=21+20) Revcomp-8 348ms ± 0% 353ms ± 0% +1.38% (p=0.000 n=17+18) Template-8 42.0ms ± 1% 41.9ms ± 1% -0.30% (p=0.049 n=20+20) TimeParse-8 185ns ± 0% 185ns ± 0% ~ (p=0.387 n=20+18) TimeFormat-8 237ns ± 1% 241ns ± 1% +1.57% (p=0.000 n=21+21) [Geo mean] 35.4µs 35.2µs -0.66% name old speed new speed delta GobDecode-8 199MB/s ± 1% 198MB/s ± 1% -0.52% (p=0.000 n=20+21) GobEncode-8 277MB/s ± 1% 277MB/s ± 0% ~ (p=0.075 n=21+21) Gzip-8 116MB/s ± 1% 115MB/s ± 0% -0.25% (p=0.000 n=20+20) Gunzip-8 773MB/s ± 0% 797MB/s ± 0% +3.12% (p=0.000 n=21+21) JSONEncode-8 283MB/s ± 0% 290MB/s ± 0% +2.35% (p=0.000 n=20+19) JSONDecode-8 63.0MB/s ± 1% 63.5MB/s ± 1% +0.82% (p=0.000 n=20+20) GoParse-8 22.2MB/s ± 2% 22.3MB/s ± 1% ~ (p=0.539 n=21+20) RegexpMatchEasy0_32-8 660MB/s ± 2% 697MB/s ± 3% +5.57% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 7.36GB/s ± 0% 7.34GB/s ± 0% -0.26% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 775MB/s ± 0% 761MB/s ± 4% -1.88% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 4.74GB/s ± 2% 4.74GB/s ± 0% -0.18% (p=0.020 n=21+19) RegexpMatchMedium_32-8 40.6MB/s ± 7% 39.9MB/s ± 9% ~ (p=0.191 n=21+21) RegexpMatchMedium_1K-8 43.7MB/s ± 5% 43.2MB/s ± 5% ~ (p=0.435 n=21+21) RegexpMatchHard_32-8 29.3MB/s ± 1% 29.4MB/s ± 1% +0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 31.0MB/s ± 0% 31.0MB/s ± 0% ~ (p=0.572 n=21+20) Revcomp-8 730MB/s ± 0% 720MB/s ± 0% -1.36% (p=0.000 n=17+18) Template-8 46.2MB/s ± 1% 46.3MB/s ± 1% +0.30% (p=0.041 n=20+20) [Geo mean] 204MB/s 205MB/s +0.30% Change-Id: 
Iac75d0ec184a515ce0e65e19559d5fe2e9840514 Reviewed-on: https://go-review.googlesource.com/c/go/+/354970 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> |
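A tiny illustration (the function name is ours): the mask's upper 32 bits are zero, and 32-bit operations on amd64 clear the upper half of the destination register, so the 64-bit AND can be encoded with the shorter ANDL without changing the result:
func maskLow16(x uint64) uint64 {
	return x & 0xFFFF
}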
||
|
|
fdc2072420 |
test/codegen: remove broken riscv64 test
This test is not executed by default (see #48247) and does not actually pass. It was added in CL 346689. The code generation changes made in that CL only change how instructions are assembled; they do not actually affect the output of the compiler. This test is therefore unfortunately invalid and will never pass. Updates #48247. Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f Reviewed-on: https://go-review.googlesource.com/c/go/+/348390 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> |
||
|
|
37d4532867 |
cmd/internal/obj/riscv: simplify addition with constant
This CL simplifies riscv addition (add r, imm) to
(ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges
(-4096 <= imm <= -2049 or 2048 <= imm <= 4094).
There is little impact on the go1 benchmark, while the total
size of pkg/linux_riscv64 decreased by about 11KB.
Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e
GitHub-Last-Rev:
|
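A hypothetical helper showing the imm/2 split quoted above; for the stated ranges each half fits the 12-bit signed immediate field of a RISC-V ADDI:
func splitImm(imm int64) (first, second int64) {
	first = imm / 2      // e.g. 4094 -> 2047, -4096 -> -2048
	second = imm - first // the remainder also lies within [-2048, 2047]
	return first, second
}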
||
|
|
6cf1d5d0fa |
cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer arithmetic expressions
This applies the following generic integer addition/subtraction transformations: x - (x + y) = -y (x - y) - x = -y y + (x - y) = x y + (z + (x - y) = x + z There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself: runtime.stackfree() runtime.runqdrain() Go binary size reduced by 0.05% on Linux x86_64. StackCopy bench (perflocked Cascade Lake x86): name old time/op new time/op delta StackCopyPtr-8 87.3ms ± 1% 86.9ms ± 0% -0.45% (p=0.000 n=20+20) StackCopy-8 77.6ms ± 1% 77.0ms ± 0% -0.76% (p=0.000 n=20+20) StackCopyNoCache-8 2.28ms ± 2% 2.26ms ± 2% -0.93% (p=0.008 n=19+20) test/bench/go1 benchmarks (perflocked Cascade Lake x86): name old time/op new time/op delta BinaryTree17-8 1.88s ± 1% 1.88s ± 0% ~ (p=0.373 n=15+12) Fannkuch11-8 2.31s ± 0% 2.35s ± 0% +1.52% (p=0.000 n=15+14) FmtFprintfEmpty-8 26.6ns ± 0% 26.6ns ± 0% ~ (p=0.081 n=14+13) FmtFprintfString-8 48.6ns ± 0% 50.0ns ± 0% +2.86% (p=0.000 n=15+14) FmtFprintfInt-8 56.9ns ± 0% 54.8ns ± 0% -3.70% (p=0.000 n=15+15) FmtFprintfIntInt-8 90.4ns ± 0% 88.8ns ± 0% -1.78% (p=0.000 n=15+15) FmtFprintfPrefixedInt-8 104ns ± 0% 104ns ± 0% ~ (p=0.905 n=14+13) FmtFprintfFloat-8 148ns ± 0% 144ns ± 0% -2.19% (p=0.000 n=14+15) FmtManyArgs-8 389ns ± 0% 390ns ± 0% +0.35% (p=0.000 n=12+15) GobDecode-8 3.90ms ± 1% 3.88ms ± 0% -0.49% (p=0.000 n=15+14) GobEncode-8 2.73ms ± 0% 2.73ms ± 0% ~ (p=0.425 n=15+14) Gzip-8 169ms ± 0% 168ms ± 0% -0.52% (p=0.000 n=13+13) Gunzip-8 24.7ms ± 0% 24.8ms ± 0% +0.61% (p=0.000 n=15+15) HTTPClientServer-8 60.5µs ± 6% 60.4µs ± 7% ~ (p=0.595 n=15+15) JSONEncode-8 6.97ms ± 1% 6.93ms ± 0% -0.69% (p=0.000 n=14+14) JSONDecode-8 31.2ms ± 1% 30.8ms ± 1% -1.27% (p=0.000 n=14+14) Mandelbrot200-8 3.87ms ± 0% 3.87ms ± 0% ~ (p=0.652 n=15+14) GoParse-8 2.65ms ± 2% 2.64ms ± 1% ~ (p=0.202 n=15+15) RegexpMatchEasy0_32-8 45.1ns ± 0% 45.9ns ± 0% +1.68% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 140ns ± 0% 139ns ± 0% -0.44% (p=0.000 n=15+14) RegexpMatchEasy1_32-8 40.9ns ± 3% 40.5ns ± 0% -0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 215ns ± 1% 220ns ± 1% +2.27% (p=0.000 n=15+15) RegexpMatchMedium_32-8 783ns ± 7% 738ns ± 0% ~ (p=0.361 n=15+15) RegexpMatchMedium_1K-8 24.1µs ± 6% 23.4µs ± 6% -2.94% (p=0.004 n=15+15) RegexpMatchHard_32-8 1.10µs ± 1% 1.09µs ± 1% -0.40% (p=0.006 n=15+14) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.535 n=12+14) Revcomp-8 354ms ± 0% 353ms ± 0% -0.23% (p=0.002 n=15+13) Template-8 42.0ms ± 1% 41.8ms ± 2% -0.37% (p=0.023 n=14+15) TimeParse-8 181ns ± 0% 180ns ± 1% -0.18% (p=0.014 n=12+13) TimeFormat-8 240ns ± 0% 242ns ± 1% +0.69% (p=0.000 n=12+15) [Geo mean] 35.2µs 35.1µs -0.43% name old speed new speed delta GobDecode-8 197MB/s ± 1% 198MB/s ± 0% +0.49% (p=0.000 n=15+14) GobEncode-8 281MB/s ± 0% 281MB/s ± 0% ~ (p=0.419 n=15+14) Gzip-8 115MB/s ± 0% 115MB/s ± 0% +0.52% (p=0.000 n=13+13) Gunzip-8 786MB/s ± 0% 781MB/s ± 0% -0.60% (p=0.000 n=15+15) JSONEncode-8 278MB/s ± 1% 280MB/s ± 0% +0.69% (p=0.000 n=14+14) JSONDecode-8 62.3MB/s ± 1% 63.1MB/s ± 1% +1.29% (p=0.000 n=14+14) GoParse-8 21.9MB/s ± 2% 22.0MB/s ± 1% ~ (p=0.205 n=15+15) RegexpMatchEasy0_32-8 709MB/s ± 0% 697MB/s ± 0% -1.65% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 7.34GB/s ± 0% 7.37GB/s ± 0% +0.43% (p=0.000 n=15+15) RegexpMatchEasy1_32-8 783MB/s ± 2% 790MB/s ± 0% +0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 4.77GB/s ± 1% 4.66GB/s ± 1% -2.23% (p=0.000 n=15+15) RegexpMatchMedium_32-8 41.0MB/s ± 7% 43.3MB/s ± 0% ~ (p=0.360 n=15+15) RegexpMatchMedium_1K-8 42.5MB/s ± 6% 43.8MB/s ± 6% +3.07% (p=0.004 n=15+15) RegexpMatchHard_32-8 29.2MB/s ± 1% 
29.3MB/s ± 1% +0.41% (p=0.006 n=15+14) RegexpMatchHard_1K-8 31.1MB/s ± 0% 31.1MB/s ± 0% ~ (p=0.495 n=12+14) Revcomp-8 718MB/s ± 0% 720MB/s ± 0% +0.23% (p=0.002 n=15+13) Template-8 46.3MB/s ± 1% 46.4MB/s ± 2% +0.38% (p=0.021 n=14+15) [Geo mean] 205MB/s 206MB/s +0.57% Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/344930 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> |
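Small examples of the identities listed above (function names are ours); the comment on each line shows what the generic rules reduce the expression to:
func negY(x, y int) int      { return x - (x + y) }       // -y
func justX(x, y int) int     { return y + (x - y) }       // x
func xPlusZ(x, y, z int) int { return y + (z + (x - y)) } // x + z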
||
|
|
165d39a1d4 |
[dev.typeparams] test: adjust codegen test for register ABI on ARM64
In codegen/arithmetic.go, previously there are MOVD's that match for loads of arguments. With register ABI there are no more such loads. Remove the MOVD matches. Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64 Reviewed-on: https://go-review.googlesource.com/c/go/+/324251 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> |
||
|
|
263e13d1f7 |
test: make codegen tests work with both ABIs
Some codegen tests were written with the assumption that arguments and results are in memory, and with a specific stack layout. With the register ABI, the assumption is no longer true. Adjust the tests to work with both cases. - For tests expecting in memory arguments/results, change to use global variables or memory-assigned argument/results. - Allow more registers. E.g. some tests expecting register names contain only letters (e.g. AX), but it can also contain numbers (e.g. R10). - Some instruction selection changes when operate on register vs. memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both. TODO: mathbits.go and memops.go still need fix. Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/309335 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
600259b099 |
cmd/compile: use depth first topological sort algorithm for layout
The current layout algorithm tries to put consecutive blocks together, so the priority of the successor block is higher than the priority of the zero indegree block. This algorithm is beneficial for subsequent register allocation, but will result in more branch instructions. The depth-first topological sorting algorithm is a well-known layout algorithm, which has applications in many languages, and it helps to reduce branch instructions. This CL applies it to the layout pass. The test results show that it helps to reduce the code size. This CL also includes the following changes: 1, Removed the primary predecessor mechanism. The new layout algorithm is not very friendly to register allocator in some cases, in order to adapt to the new layout algorithm, a new primary predecessor selection strategy is introduced. 2, Since the new layout implementation may place non-loop blocks between loop blocks, some adaptive modifications have also been made to looprotate pass. 3, The layout also affects the results of codegen, so this CL also adjusted several codegen tests accordingly. It is inevitable that this CL will cause the code size or performance of a few functions to decrease, but the number of cases it improves is much larger than the number of cases it drops. Statistical data from compilecmp on linux/amd64 is as follow: name old time/op new time/op delta Template 382ms ± 4% 382ms ± 4% ~ (p=0.497 n=49+50) Unicode 170ms ± 9% 169ms ± 8% ~ (p=0.344 n=48+50) GoTypes 2.01s ± 4% 2.01s ± 4% ~ (p=0.628 n=50+48) Compiler 190ms ±10% 189ms ± 9% ~ (p=0.734 n=50+50) SSA 11.8s ± 2% 11.8s ± 3% ~ (p=0.877 n=50+50) Flate 241ms ± 9% 241ms ± 8% ~ (p=0.897 n=50+49) GoParser 366ms ± 3% 361ms ± 4% -1.21% (p=0.004 n=47+50) Reflect 835ms ± 3% 838ms ± 3% ~ (p=0.275 n=50+49) Tar 336ms ± 4% 335ms ± 3% ~ (p=0.454 n=48+48) XML 433ms ± 4% 431ms ± 3% ~ (p=0.071 n=49+48) LinkCompiler 706ms ± 4% 705ms ± 4% ~ (p=0.608 n=50+49) ExternalLinkCompiler 1.85s ± 3% 1.83s ± 2% -1.47% (p=0.000 n=49+48) LinkWithoutDebugCompiler 437ms ± 5% 437ms ± 6% ~ (p=0.953 n=49+50) [Geo mean] 615ms 613ms -0.37% name old alloc/op new alloc/op delta Template 38.7MB ± 1% 38.7MB ± 1% ~ (p=0.834 n=50+50) Unicode 28.1MB ± 0% 28.1MB ± 0% -0.22% (p=0.000 n=49+50) GoTypes 168MB ± 1% 168MB ± 1% ~ (p=0.054 n=47+47) Compiler 23.0MB ± 1% 23.0MB ± 1% ~ (p=0.432 n=50+50) SSA 1.54GB ± 0% 1.54GB ± 0% +0.21% (p=0.000 n=50+50) Flate 23.6MB ± 1% 23.6MB ± 1% ~ (p=0.153 n=43+46) GoParser 35.1MB ± 1% 35.1MB ± 2% ~ (p=0.202 n=50+50) Reflect 84.7MB ± 1% 84.7MB ± 1% ~ (p=0.333 n=48+49) Tar 34.5MB ± 1% 34.5MB ± 1% ~ (p=0.406 n=46+49) XML 44.3MB ± 2% 44.2MB ± 3% ~ (p=0.981 n=50+50) LinkCompiler 131MB ± 0% 128MB ± 0% -2.74% (p=0.000 n=50+50) ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.007 n=50+50) LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% -0.02% (p=0.000 n=50+50) [Geo mean] 69.3MB 69.1MB -0.22% file before after Δ % addr2line 4104220 4043684 -60536 -1.475% api 5342502 5249678 -92824 -1.737% asm 4973785 4858257 -115528 -2.323% buildid 2667844 2625660 -42184 -1.581% cgo 4686849 4616313 -70536 -1.505% compile 23667431 23268406 -399025 -1.686% cover 4959676 4874108 -85568 -1.725% dist 3515934 3450422 -65512 -1.863% doc 3995581 3925469 -70112 -1.755% fix 3379202 3318522 -60680 -1.796% link 6743249 6629913 -113336 -1.681% nm 4047529 3991777 -55752 -1.377% objdump 4456151 4388151 -68000 -1.526% pack 2435040 2398072 -36968 -1.518% pprof 13804080 13565808 -238272 -1.726% test2json 2690043 2645987 -44056 -1.638% trace 10418492 10232716 -185776 -1.783% vet 7258259 7121259 
-137000 -1.888% total 113145867 111204202 -1941665 -1.716% The situation on linux/arm64 is as follow: name old time/op new time/op delta Template 280ms ± 1% 282ms ± 1% +0.75% (p=0.000 n=46+48) Unicode 124ms ± 2% 124ms ± 2% +0.37% (p=0.045 n=50+50) GoTypes 1.69s ± 1% 1.70s ± 1% +0.56% (p=0.000 n=49+50) Compiler 122ms ± 1% 123ms ± 1% +0.93% (p=0.000 n=50+50) SSA 12.6s ± 1% 12.7s ± 0% +0.72% (p=0.000 n=50+50) Flate 170ms ± 1% 172ms ± 1% +0.97% (p=0.000 n=49+49) GoParser 262ms ± 1% 263ms ± 1% +0.39% (p=0.000 n=49+48) Reflect 639ms ± 1% 650ms ± 1% +1.63% (p=0.000 n=49+49) Tar 243ms ± 1% 245ms ± 1% +0.82% (p=0.000 n=50+50) XML 324ms ± 1% 327ms ± 1% +0.72% (p=0.000 n=50+49) LinkCompiler 597ms ± 1% 596ms ± 1% -0.27% (p=0.001 n=48+47) ExternalLinkCompiler 1.90s ± 1% 1.88s ± 1% -1.00% (p=0.000 n=50+50) LinkWithoutDebugCompiler 364ms ± 1% 363ms ± 1% ~ (p=0.220 n=49+50) [Geo mean] 485ms 488ms +0.49% name old alloc/op new alloc/op delta Template 38.7MB ± 0% 38.8MB ± 1% ~ (p=0.093 n=43+49) Unicode 28.4MB ± 0% 28.4MB ± 0% +0.03% (p=0.000 n=49+45) GoTypes 169MB ± 1% 169MB ± 1% +0.23% (p=0.010 n=50+50) Compiler 23.2MB ± 1% 23.2MB ± 1% +0.11% (p=0.000 n=40+44) SSA 1.54GB ± 0% 1.55GB ± 0% +0.45% (p=0.000 n=47+49) Flate 23.8MB ± 2% 23.8MB ± 1% ~ (p=0.543 n=50+50) GoParser 35.3MB ± 1% 35.4MB ± 1% ~ (p=0.792 n=50+50) Reflect 85.2MB ± 1% 85.2MB ± 0% ~ (p=0.055 n=50+47) Tar 34.5MB ± 1% 34.5MB ± 1% +0.06% (p=0.015 n=50+50) XML 43.8MB ± 2% 43.9MB ± 2% +0.19% (p=0.000 n=48+48) LinkCompiler 137MB ± 0% 136MB ± 0% -0.92% (p=0.000 n=50+50) ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.516 n=50+50) LinkWithoutDebugCompiler 84.0MB ± 0% 84.0MB ± 0% ~ (p=0.057 n=50+50) [Geo mean] 70.4MB 70.4MB +0.01% file before after Δ % addr2line 4021557 4002933 -18624 -0.463% api 5127847 5028503 -99344 -1.937% asm 5034716 4936836 -97880 -1.944% buildid 2608118 2594094 -14024 -0.538% cgo 4488592 4398320 -90272 -2.011% compile 22501129 22213592 -287537 -1.278% cover 4742301 4713573 -28728 -0.606% dist 3388071 3365311 -22760 -0.672% doc 3802250 3776082 -26168 -0.688% fix 3306147 3216939 -89208 -2.698% link 6404483 6363699 -40784 -0.637% nm 3941026 3921930 -19096 -0.485% objdump 4383330 4295122 -88208 -2.012% pack 2404547 2389515 -15032 -0.625% pprof 12996234 12856818 -139416 -1.073% test2json 2668500 2586788 -81712 -3.062% trace 9816276 9609580 -206696 -2.106% vet 6900682 6787338 -113344 -1.643% total 108535806 107056973 -1478833 -1.363% Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d Reviewed-on: https://go-review.googlesource.com/c/go/+/255239 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: eric fang <eric.fang@arm.com> |
||
|
|
04b8a9fea5 |
all: implement GO386=softfloat
Backstop support for non-sse2 chips now that 387 is gone. RELNOTE=yes Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013 Reviewed-on: https://go-review.googlesource.com/c/go/+/260017 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
||
|
|
fe2cfb74ba |
all: drop 387 support
My last 387 CL. So sad ... ... ... ... not! Fixes #40255 Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/258957 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
|
7615b20d06 |
cmd/compile: generate subfic on ppc64
This merges an lis + subf into subfic, and for 32b constants lwa + subf into oris + ori + subf. The carry bit is no longer used in code generation, therefore I think we can clobber it as needed. Note, lowered borrow/carry arithmetic is self-contained and thus is not affected. A few extra rules are added to ensure early transformations to SUBFCconst don't trip up earlier rules, fold constant operations, or otherwise simplify lowering. Likewise, tests are added to ensure all rules are hit. Generic constant folding catches trivial cases, however some lowering rules insert arithmetic which can introduce new opportunities (e.g BitLen or Slicemask). I couldn't find a specific benchmark to demonstrate noteworthy improvements, but this is generating subfic in many of the default bent test binaries, so we are at least saving a little code space. Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012 Reviewed-on: https://go-review.googlesource.com/c/go/+/249461 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> |
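An illustrative constant-minus-variable function (ours, not the CL's test case): subtracting a register from a small immediate is the shape that can now be emitted as a single subfic rather than a load-immediate followed by subf:
func fromConst(x int64) int64 {
	return 100 - x
}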
||
|
|
d556c251a1 |
cmd/compile: add more generic rewrite rules to reassociate (op (op y C) x|C)
With this patch, the opt pass can expose more obvious constant-folding
opportunities.
Example:
func test(i int) int { return (i + 8) - (i + 4) }
The previous version:
MOVD "".i(FP), R0
ADD $8, R0, R1
ADD $4, R0, R0
SUB R0, R1, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
The optimized version:
MOVD $4, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
This patch removes some existing reassociation rules, such as "x+(z-C)",
because the current generic rewrite rules will canonicalize "x-const"
to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))".
This patch also adds test cases.
Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797
Reviewed-on: https://go-review.googlesource.com/c/go/+/237137
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
||
|
|
e7c7ce646f |
cmd/compile: combine multiply/add into maddld on ppc64le/power9
Add a new lowering rule to match and replace such instances with the MADDLD instruction available on power9 where possible. Likewise, this plumbs in a new ppc64 ssa opcode to house the newly generated MADDLD instructions. When testing ed25519, this reduced binary size by 936B. Similarly, MADDLD combination occurs in a few other less obvious cases such as division by constant. Testing of golang.org/x/crypto/ed25519 shows non-trivial speedup during key generation: name old time/op new time/op delta KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19% Signing 64.3µs ± 0% 64.4µs ± 0% +0.16% Verification 147µs ± 0% 147µs ± 0% +0.11% Similarly, this test binary has shrunk by 66488B. Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0 Reviewed-on: https://go-review.googlesource.com/c/go/+/248723 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
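A minimal multiply-add shape (the function name is ours) that the new lowering rule can turn into a single MADDLD on power9:
func mulAdd(a, b, c int64) int64 {
	return a*b + c
}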
||
|
|
6e876f1985 |
cmd/compile: clean up and optimize s390x multiplication rules
Some of the existing optimizations aren't triggered because they are handled by the generic rules so this CL removes them. Also some constraints were copied without much thought from the amd64 rules and they don't make sense on s390x, so we remove those constraints. Finally, add a 'multiply by the sum of two powers of two' optimization. This makes sense on s390x as shifts are low latency and can also sometimes be optimized further (especially if we add support for RISBG instructions). name old time/op new time/op delta IntMulByConst/3-8 1.70ns ±11% 1.10ns ± 5% -35.26% (p=0.000 n=10+10) IntMulByConst/5-8 1.64ns ± 7% 1.10ns ± 4% -32.94% (p=0.000 n=10+9) IntMulByConst/12-8 1.65ns ± 6% 1.20ns ± 4% -27.16% (p=0.000 n=10+9) IntMulByConst/120-8 1.66ns ± 4% 1.22ns ±13% -26.43% (p=0.000 n=10+10) IntMulByConst/-120-8 1.65ns ± 7% 1.19ns ± 4% -28.06% (p=0.000 n=9+10) IntMulByConst/65537-8 0.86ns ± 9% 1.12ns ±12% +30.41% (p=0.000 n=10+10) IntMulByConst/65538-8 1.65ns ± 5% 1.23ns ± 5% -25.11% (p=0.000 n=10+10) Change-Id: Ib196e6bff1e97febfd266134d0a2b2a62897989f Reviewed-on: https://go-review.googlesource.com/c/go/+/248937 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
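An illustrative 'sum of two powers of two' multiplier (the function is ours): 12 = 8 + 4, so on s390x the multiply can be lowered to two shifts and an add:
func times12(x int64) int64 {
	return x * 12 // (x << 3) + (x << 2)
}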
||
|
|
2cb10d42b7 |
cmd/compile: in prove, zero right shifts of positive int by #bits - 1
Taking over Zach's CL 212277. Just cleaned up and added a test.
For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.
These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c = n >> log(c) if n >= 0
//        = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
// (first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
  (Rsh64x64
    (Add64 <t> n (Rsh64Ux64 <t>
      (Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
      (Const64 <typ.UInt64> [64-log2(c)])))
    (Const64 <typ.UInt64> [log2(c)]))
If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unnecessary
fixups from the generic rewrite.
There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.
Fixes #36159
By Printf count, this zeroes 137 right shifts when building std and cmd.
Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
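A sketch of the situation the CL improves (the function is ours): after the guard, prove knows n is non-negative, so the n>>63 term introduced by the generic signed-division rewrite is provably zero and the remaining fixups can be dropped, leaving a plain right shift:
func div8(n int64) int64 {
	if n < 0 {
		return 0
	}
	return n / 8 // effectively n >> 3 once the fixups are gone
}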