Mirror of https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
620 commits

e7d47ac33d | cmd/compile: simplify negative on multiplication
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ simplify_base │ simplify_new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 623.0n ± 0% 319.3n ± 1% -48.75% (p=0.000 n=10)
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ simplify.base │ simplify.new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 10.928µ ± 0% 6.432µ ± 0% -41.14% (p=0.000 n=10)
Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

32f5aadd2f | cmd/compile: stack allocate backing stores during append
We can already stack allocate the backing store during append if the
resulting backing store doesn't escape. See CL 664299.
This CL enables us to often stack allocate the backing store during
append *even if* the result escapes. Typically, for code like:
func f(n int) []int {
	var r []int
	for i := range n {
		r = append(r, i)
	}
	return r
}
the backing store for r escapes, but only by returning it.
Could we operate with r on the stack for most of its lifetime,
and only move it to the heap at the return point?
The current implementation of append will need to do an allocation
each time it calls growslice. This will happen on the 1st, 2nd, 4th,
8th, etc. append calls. The allocations done by all but the
last growslice call will then immediately be garbage.
We'd like to avoid doing some of those intermediate allocations
if possible. We rewrite the above code by introducing a move2heap
operation:
func f(n int) []int {
	var r []int
	for i := range n {
		r = append(r, i)
	}
	r = move2heap(r)
	return r
}
Using the move2heap runtime function, which does:
move2heap(r):
If r is already backed by heap storage, return r.
Otherwise, copy r to the heap and return the copy.
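A rough Go-level sketch of those semantics, purely for illustration: the real helper is a runtime function, and whether a backing store is already on the heap is not something ordinary Go code can test, so it is passed in here as a flag.
func moveToHeapSketch(r []int, alreadyOnHeap bool) []int {
	// Hypothetical stand-in for move2heap: return r unchanged if its backing
	// store is already on the heap, otherwise copy it into a fresh
	// heap-allocated backing store and return the copy.
	if alreadyOnHeap {
		return r
	}
	h := make([]int, len(r), cap(r))
	copy(h, r)
	return h
}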
Now we can treat the backing store of r allocated at the
append site as not escaping. Previous stack allocation
optimizations now apply, which can use a fixed-size
stack-allocated backing store for r when appending.
See the description in cmd/compile/internal/slice/slice.go
for how we ensure that this optimization is safe.
Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/707755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

ba634ca5c7 | cmd/compile: fold boolean NOT into branches
Gets rid of an EOR $1 instruction.
Change-Id: Ib032b0cee9ac484329c978af9b1305446f8d5dac
Reviewed-on: https://go-review.googlesource.com/c/go/+/721501
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
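As a hedged illustration (not taken from the CL), the kind of Go code this affects: the NOT no longer needs to be materialized with EOR $1, because the branch condition can simply be inverted.
// Hypothetical example: the compiler can branch on b directly with the
// branch sense inverted instead of computing !b first.
func sign(x int) int {
	b := x >= 0
	if !b {
		return -1
	}
	return 1
}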

e1a12c781f | cmd/compile: use 32x32->64 multiplies on arm64
Gets rid of some sign extensions.
Change-Id: Ie67ef36b4ca1cd1a2cd9fa5d84578db553578a22
Reviewed-on: https://go-review.googlesource.com/c/go/+/721241
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
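A hedged example of the pattern this likely targets (my illustration, not from the CL): a widening multiply of two int32 values, which arm64 can do in a single 32x32->64 instruction instead of sign-extending both operands first.
// Hypothetical example of a widening 32-bit multiply.
func widenMul(a, b int32) int64 {
	return int64(a) * int64(b)
}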

2cdcc4150b | cmd/compile: fold negation into multiplication
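The commit message carries only benchmarks; as a hedged illustration, these are the kinds of expressions such a rule targets. The identities hold for two's-complement integers: (-x)*y equals -(x*y), and the negations in (-x)*(-y) cancel.
// Hypothetical examples of negation folded into (or cancelled out of) a multiply.
func mulNeg(x, y int64) int64 {
	return (-x) * y // same value as -(x * y)
}

func mulNegNeg(x, y int64) int64 {
	return (-x) * (-y) // same value as x * y
}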
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ /root/mul.base.log │ /root/mul.new.log │
│ sec/op │ sec/op vs base │
MulNeg 6.426µ ± 0% 4.501µ ± 0% -29.96% (p=0.000 n=10)
Mul2Neg 9.000µ ± 0% 6.431µ ± 0% -28.54% (p=0.000 n=10)
Mul2 1.263µ ± 0% 1.263µ ± 0% ~ (p=1.000 n=10)
MulNeg2 1.577µ ± 0% 1.577µ ± 0% ~ (p=0.211 n=10)
geomean 3.276µ 2.756µ -15.89%
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ /root/base │ /root/new │
│ sec/op │ sec/op vs base │
MulNeg 691.9n ± 1% 319.4n ± 0% -53.83% (p=0.000 n=10)
Mul2Neg 630.0n ± 0% 629.6n ± 0% -0.07% (p=0.000 n=10)
Mul2 438.1n ± 0% 438.1n ± 0% ~ (p=0.728 n=10)
MulNeg2 439.3n ± 0% 439.4n ± 0% ~ (p=0.656 n=10)
geomean 538.2n 443.6n -17.58%
Change-Id: Ice8e6c8d1e8e3009ba8a0b1b689205174e199019
Reviewed-on: https://go-review.googlesource.com/c/go/+/720180
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>

0a569528ea | cmd/compile: optimize comparisons with single bit difference
Optimize comparisons with constants that only differ by 1 bit (i.e.
a power of 2). For example:
x == 4 || x == 6 -> x|2 == 6
x != 1 && x != 5 -> x|4 != 5
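A short worked illustration of why the first rewrite is sound (my example, mirroring the rule above): 4 (binary 100) and 6 (binary 110) differ only in bit 1, so forcing that bit on maps exactly the set {4, 6} onto 6 and nothing else.
// Hypothetical illustration of the rewrite x == 4 || x == 6  =>  x|2 == 6.
func isFourOrSix(x uint32) bool {
	return x|2 == 6 // same result as x == 4 || x == 6, with a single compare
}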
Change-Id: Ic61719e5118446d21cf15652d9da22f7d95b2a15
Reviewed-on: https://go-review.googlesource.com/c/go/+/719420
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>

d50a571ddf | test: fix tests to work with sizespecializedmalloc turned off
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-nosizespecializedmalloc,gotip-linux-amd64-nosizespecializedmalloc,gotip-linux-arm64-nosizespecializedmalloc
Change-Id: I6a6a696465004b939c989afc058c4c3e1fb7134f
Reviewed-on: https://go-review.googlesource.com/c/go/+/720401
Auto-Submit: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>

34aef89366 | cmd/compile: use FCLASSD for subnormal checks on riscv64
Only implemented for 64-bit floating point operations for now.
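The kind of check this helps is, roughly, a subnormal test; a hedged sketch (my example, not an actual call site from the math package): a nonzero float64 whose magnitude is below the smallest normal value (2^-1022) is subnormal, which FCLASSD can classify directly.
package fpclass

import "math"

// isSubnormal is a hypothetical example of a subnormal check.
func isSubnormal(x float64) bool {
	return x != 0 && math.Abs(x) < 0x1p-1022
}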
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 154.1n ± 0% 154.1n ± 0% ~ (p=0.303 n=10)
Acosh 215.8n ± 6% 226.7n ± 0% ~ (p=0.439 n=10)
Asin 149.2n ± 1% 149.2n ± 0% ~ (p=0.700 n=10)
Asinh 262.1n ± 0% 258.5n ± 0% -1.37% (p=0.000 n=10)
Atan 99.48n ± 0% 99.49n ± 0% ~ (p=0.836 n=10)
Atanh 244.9n ± 0% 243.8n ± 0% -0.43% (p=0.002 n=10)
Atan2 158.2n ± 1% 153.3n ± 0% -3.10% (p=0.000 n=10)
Cbrt 186.8n ± 0% 181.1n ± 0% -3.03% (p=0.000 n=10)
Ceil 36.71n ± 1% 36.71n ± 0% ~ (p=0.434 n=10)
Copysign 6.531n ± 1% 6.526n ± 0% ~ (p=0.268 n=10)
Cos 98.19n ± 0% 95.40n ± 0% -2.84% (p=0.000 n=10)
Cosh 233.1n ± 0% 222.6n ± 0% -4.50% (p=0.000 n=10)
Erf 122.5n ± 0% 114.2n ± 0% -6.78% (p=0.000 n=10)
Erfc 126.0n ± 1% 116.6n ± 0% -7.46% (p=0.000 n=10)
Erfinv 138.8n ± 0% 138.6n ± 0% ~ (p=0.082 n=10)
Erfcinv 140.0n ± 0% 139.7n ± 0% ~ (p=0.359 n=10)
Exp 193.3n ± 0% 184.2n ± 0% -4.68% (p=0.000 n=10)
ExpGo 204.8n ± 0% 194.5n ± 0% -5.03% (p=0.000 n=10)
Expm1 152.5n ± 1% 145.0n ± 0% -4.92% (p=0.000 n=10)
Exp2 174.5n ± 0% 164.2n ± 0% -5.85% (p=0.000 n=10)
Exp2Go 184.4n ± 1% 175.4n ± 0% -4.88% (p=0.000 n=10)
Abs 4.912n ± 0% 4.914n ± 0% ~ (p=0.283 n=10)
Dim 15.50n ± 1% 15.52n ± 1% ~ (p=0.331 n=10)
Floor 36.89n ± 1% 36.76n ± 1% ~ (p=0.325 n=10)
Max 31.05n ± 1% 31.17n ± 1% ~ (p=0.628 n=10)
Min 31.01n ± 0% 31.06n ± 0% ~ (p=0.767 n=10)
Mod 294.1n ± 0% 245.6n ± 0% -16.52% (p=0.000 n=10)
Frexp 44.86n ± 1% 35.20n ± 0% -21.53% (p=0.000 n=10)
Gamma 195.8n ± 0% 185.4n ± 1% -5.29% (p=0.000 n=10)
Hypot 84.91n ± 0% 84.54n ± 1% -0.43% (p=0.006 n=10)
HypotGo 96.70n ± 0% 95.42n ± 1% -1.32% (p=0.000 n=10)
Ilogb 45.03n ± 0% 35.07n ± 1% -22.10% (p=0.000 n=10)
J0 634.5n ± 0% 627.2n ± 0% -1.16% (p=0.000 n=10)
J1 644.5n ± 0% 636.9n ± 0% -1.18% (p=0.000 n=10)
Jn 1.357µ ± 0% 1.344µ ± 0% -0.92% (p=0.000 n=10)
Ldexp 49.89n ± 0% 39.96n ± 0% -19.90% (p=0.000 n=10)
Lgamma 186.6n ± 0% 184.3n ± 0% -1.21% (p=0.000 n=10)
Log 150.4n ± 0% 141.1n ± 0% -6.15% (p=0.000 n=10)
Logb 46.70n ± 0% 35.89n ± 0% -23.15% (p=0.000 n=10)
Log1p 164.1n ± 0% 163.9n ± 0% ~ (p=0.122 n=10)
Log10 153.1n ± 0% 143.5n ± 0% -6.24% (p=0.000 n=10)
Log2 58.83n ± 0% 49.75n ± 0% -15.43% (p=0.000 n=10)
Modf 40.82n ± 1% 40.78n ± 0% ~ (p=0.239 n=10)
Nextafter32 49.15n ± 0% 48.93n ± 0% -0.44% (p=0.011 n=10)
Nextafter64 43.33n ± 0% 43.23n ± 0% ~ (p=0.228 n=10)
PowInt 269.4n ± 0% 243.8n ± 0% -9.49% (p=0.000 n=10)
PowFrac 618.0n ± 0% 571.7n ± 0% -7.48% (p=0.000 n=10)
Pow10Pos 13.09n ± 0% 13.05n ± 0% -0.31% (p=0.003 n=10)
Pow10Neg 30.99n ± 1% 30.99n ± 0% ~ (p=0.173 n=10)
Round 23.73n ± 0% 23.65n ± 0% -0.36% (p=0.011 n=10)
RoundToEven 27.87n ± 0% 27.73n ± 0% -0.48% (p=0.003 n=10)
Remainder 282.1n ± 0% 249.6n ± 0% -11.52% (p=0.000 n=10)
Signbit 11.46n ± 0% 11.42n ± 0% -0.39% (p=0.003 n=10)
Sin 115.2n ± 0% 113.2n ± 0% -1.74% (p=0.000 n=10)
Sincos 140.6n ± 0% 138.6n ± 0% -1.39% (p=0.000 n=10)
Sinh 252.0n ± 0% 241.4n ± 0% -4.21% (p=0.000 n=10)
SqrtIndirect 4.909n ± 0% 4.893n ± 0% -0.34% (p=0.021 n=10)
SqrtLatency 19.57n ± 1% 19.57n ± 0% ~ (p=0.087 n=10)
SqrtIndirectLatency 19.64n ± 0% 19.57n ± 0% -0.36% (p=0.025 n=10)
SqrtGoLatency 198.1n ± 0% 197.4n ± 0% -0.35% (p=0.014 n=10)
SqrtPrime 5.733µ ± 0% 5.725µ ± 0% ~ (p=0.116 n=10)
Tan 149.1n ± 0% 146.8n ± 0% -1.54% (p=0.000 n=10)
Tanh 248.2n ± 1% 238.1n ± 0% -4.05% (p=0.000 n=10)
Trunc 36.86n ± 0% 36.70n ± 0% -0.43% (p=0.029 n=10)
Y0 638.2n ± 0% 633.6n ± 0% -0.71% (p=0.000 n=10)
Y1 641.8n ± 0% 636.1n ± 0% -0.87% (p=0.000 n=10)
Yn 1.358µ ± 0% 1.345µ ± 0% -0.92% (p=0.000 n=10)
Float64bits 5.721n ± 0% 5.709n ± 0% -0.22% (p=0.044 n=10)
Float64frombits 4.905n ± 0% 4.893n ± 0% ~ (p=0.266 n=10)
Float32bits 12.27n ± 0% 12.23n ± 0% ~ (p=0.122 n=10)
Float32frombits 4.909n ± 0% 4.893n ± 0% -0.32% (p=0.024 n=10)
FMA 6.556n ± 0% 6.526n ± 0% ~ (p=0.283 n=10)
geomean 86.82n 83.75n -3.54%
Change-Id: I522297a79646d76543d516accce291f5a3cea337
Reviewed-on: https://go-review.googlesource.com/c/go/+/717560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>

c7ccbddf22 | cmd/compile/internal/ssa: more aggressive on dead auto elim
Propagate "unread" across OpMoves: if the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto, then if the 2nd auto can be eliminated, this one can
also be eliminated.
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
func contains(m map[string][16]int, k string) bool {
	_, ok := m[k]
	return ok
}
These are the benchmark results followed by the benchmark code:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Map1Access2Ok-8 9.582n ± 2% 9.226n ± 0% -3.72% (p=0.000 n=20)
Map2Access2Ok-8 13.79n ± 1% 10.24n ± 1% -25.77% (p=0.000 n=20)
Map3Access2Ok-8 68.68n ± 1% 12.65n ± 1% -81.58% (p=0.000 n=20)
package main_test

import "testing"

var (
	m1 = map[int]int{}
	m2 = map[int][16]int{}
	m3 = map[int][256]int{}
)

func init() {
	for i := range 1000 {
		m1[i] = i
		m2[i] = [16]int{15: i}
		m3[i] = [256]int{255: i}
	}
}

func BenchmarkMap1Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m1[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}

func BenchmarkMap2Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m2[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}

func BenchmarkMap3Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m3[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}
Fixes #75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

6e165b4d17 | cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.
goos: wasip1
goarch: wasm
pkg: internal/strconv
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal 192.8n ± 1% 194.6n ± 0% +0.91% (p=0.000 n=10)
AppendFloat/Float 328.6n ± 0% 279.6n ± 0% -14.93% (p=0.000 n=10)
AppendFloat/Exp 335.6n ± 1% 289.2n ± 1% -13.80% (p=0.000 n=10)
AppendFloat/NegExp 336.0n ± 0% 289.1n ± 1% -13.97% (p=0.000 n=10)
AppendFloat/LongExp 332.4n ± 0% 285.2n ± 1% -14.20% (p=0.000 n=10)
AppendFloat/Big 348.2n ± 0% 300.1n ± 0% -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp 137.4n ± 0% 138.2n ± 0% +0.55% (p=0.001 n=10)
AppendFloat/32Integer 193.3n ± 1% 196.5n ± 0% +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction 283.3n ± 0% 268.9n ± 1% -5.08% (p=0.000 n=10)
AppendFloat/32Point 279.9n ± 0% 266.5n ± 0% -4.80% (p=0.000 n=10)
AppendFloat/32Exp 300.1n ± 0% 288.3n ± 1% -3.90% (p=0.000 n=10)
AppendFloat/32NegExp 288.2n ± 1% 277.9n ± 1% -3.59% (p=0.000 n=10)
AppendFloat/32Shortest 261.7n ± 0% 250.2n ± 0% -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard 173.3n ± 1% 158.9n ± 1% -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard 180.0n ± 0% 167.9n ± 2% -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1 167.1n ± 0% 149.6n ± 1% -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2 162.4n ± 1% 146.5n ± 0% -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5 165.5n ± 0% 149.4n ± 1% -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3 166.4n ± 1% 150.2n ± 0% -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4 163.7n ± 0% 149.6n ± 1% -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard 182.8n ± 1% 167.1n ± 1% -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12 222.2n ± 0% 208.8n ± 0% -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16 197.6n ± 1% 181.7n ± 0% -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard 194.5n ± 0% 181.0n ± 0% -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard 205.1n ± 1% 191.9n ± 0% -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard 6.269µ ± 0% 6.643µ ± 0% +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1 211.7n ± 1% 197.0n ± 0% -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2 189.4n ± 0% 174.2n ± 0% -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3 169.0n ± 0% 154.9n ± 0% -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64 321.2n ± 0% 274.2n ± 1% -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64 307.4n ± 1% 261.2n ± 0% -15.03% (p=0.000 n=10)
AppendInt 3.367µ ± 1% 3.376µ ± 0% ~ (p=0.517 n=10)
AppendUint 675.5n ± 0% 676.9n ± 0% ~ (p=0.196 n=10)
AppendIntSmall 28.13n ± 1% 28.17n ± 0% +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1 20.70n ± 0% 20.51n ± 1% -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2 20.43n ± 0% 20.27n ± 0% -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3 38.48n ± 0% 37.93n ± 0% -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4 41.10n ± 0% 38.78n ± 1% -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5 42.25n ± 1% 42.11n ± 0% -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6 45.40n ± 1% 43.14n ± 0% -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7 46.81n ± 1% 46.03n ± 0% -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8 48.88n ± 1% 46.59n ± 1% -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9 49.94n ± 2% 49.41n ± 1% -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10 57.28n ± 1% 56.92n ± 1% -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11 60.09n ± 1% 58.11n ± 2% -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12 62.22n ± 0% 61.85n ± 0% -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13 64.94n ± 0% 62.92n ± 0% -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14 65.42n ± 1% 65.19n ± 1% -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15 68.17n ± 0% 66.13n ± 0% -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16 70.21n ± 1% 70.09n ± 1% ~ (p=0.517 n=10)
AppendUintVarlen/digits=17 72.93n ± 0% 70.49n ± 0% -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18 73.01n ± 0% 72.75n ± 0% -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19 79.27n ± 1% 79.49n ± 1% ~ (p=0.671 n=10)
AppendUintVarlen/digits=20 82.18n ± 0% 80.43n ± 1% -2.14% (p=0.000 n=10)
geomean 143.4n 136.0n -5.20%
Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>

1e5bb416d8 | cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant divisions of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
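A hedged sketch of the two kinds of code referred to above (illustrative functions, not taken from the CL): a division by the constant 999 used as the example, and a direct use of bits.Mul64.
package widemul

import "math/bits"

// div999 divides by the constant from the example above; on 32-bit systems
// it can now be lowered to a full 64x64->128 multiply by a precomputed
// reciprocal rather than the old small-divisor fixup.
func div999(i uint64) uint64 {
	return i / 999
}

// wideMul uses bits.Mul64, which this CL allows to be intrinsified on
// 32-bit systems.
func wideMul(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}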
pkg: cmd/compile/internal/test
benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base
DivconstI64 ~ ~ ~ -49.66% -21.02%
ModconstI64 ~ ~ ~ -13.45% +14.52%
DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32%
DivisibleconstI64 ~ ~ ~ -20.01% -48.28%
DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74%
DivconstU64/3 ~ ~ ~ -13.82% -4.09%
DivconstU64/5 ~ ~ ~ -14.10% -3.54%
DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55%
DivconstU64/1234567 ~ ~ ~ -61.55% -56.93%
ModconstU64 ~ ~ ~ -6.25% ~
DivisibleconstU64 ~ ~ ~ -2.78% -7.82%
DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56%
pkg: math/bits
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Add ~ ~ ~ ~
Add32 +1.59% ~ ~ ~
Add64 ~ ~ ~ ~
Add64multiple ~ ~ ~ ~
Sub ~ ~ ~ ~
Sub32 ~ ~ ~ ~
Sub64 ~ ~ -9.20% ~
Sub64multiple ~ ~ ~ ~
Mul ~ ~ ~ ~
Mul32 ~ ~ ~ ~
Mul64 ~ ~ -41.58% -53.21%
Div ~ ~ ~ ~
Div32 ~ ~ ~ ~
Div64 ~ ~ ~ ~
pkg: strconv
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
ParseInt/Pos/7bit ~ ~ -11.08% -6.75%
ParseInt/Pos/26bit ~ ~ -13.65% -11.02%
ParseInt/Pos/31bit ~ ~ -14.65% -9.71%
ParseInt/Pos/56bit -1.80% ~ -17.97% -10.78%
ParseInt/Pos/63bit ~ ~ -13.85% -9.63%
ParseInt/Neg/7bit ~ ~ -12.14% -7.26%
ParseInt/Neg/26bit ~ ~ -14.18% -9.81%
ParseInt/Neg/31bit ~ ~ -14.51% -9.02%
ParseInt/Neg/56bit ~ ~ -15.79% -9.79%
ParseInt/Neg/63bit ~ ~ -15.68% -11.07%
AppendFloat/Decimal ~ ~ -7.25% -12.26%
AppendFloat/Float ~ ~ -15.96% -19.45%
AppendFloat/Exp ~ ~ -13.96% -17.76%
AppendFloat/NegExp ~ ~ -14.89% -20.27%
AppendFloat/LongExp ~ ~ -12.68% -17.97%
AppendFloat/Big ~ ~ -11.10% -16.64%
AppendFloat/BinaryExp ~ ~ ~ ~
AppendFloat/32Integer ~ ~ -10.05% -10.91%
AppendFloat/32ExactFraction ~ ~ -8.93% -13.00%
AppendFloat/32Point ~ ~ -10.36% -14.89%
AppendFloat/32Exp ~ ~ -9.88% -13.54%
AppendFloat/32NegExp ~ ~ -10.16% -14.26%
AppendFloat/32Shortest ~ ~ -11.39% -14.96%
AppendFloat/32Fixed8Hard ~ ~ ~ -2.31%
AppendFloat/32Fixed9Hard ~ ~ ~ -7.01%
AppendFloat/64Fixed1 ~ ~ -2.83% -8.23%
AppendFloat/64Fixed2 ~ ~ ~ -7.94%
AppendFloat/64Fixed3 ~ ~ -4.07% -7.22%
AppendFloat/64Fixed4 ~ ~ -7.24% -7.62%
AppendFloat/64Fixed12 ~ ~ -6.57% -4.82%
AppendFloat/64Fixed16 ~ ~ -4.00% -5.81%
AppendFloat/64Fixed12Hard -2.22% ~ -4.07% -6.35%
AppendFloat/64Fixed17Hard -2.12% ~ ~ -3.79%
AppendFloat/64Fixed18Hard -1.89% ~ +2.48% ~
AppendFloat/Slowpath64 -1.85% ~ -14.49% -18.21%
AppendFloat/SlowpathDenormal64 ~ ~ -13.08% -19.41%
pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Mul/P224 ~ ~ -29.95% -39.60%
Mul/P384 ~ ~ -37.11% -63.33%
Mul/P521 ~ ~ -26.62% -12.42%
Square/P224 +1.46% ~ -40.62% -49.18%
Square/P384 ~ ~ -45.51% -69.68%
Square/P521 +90.37% ~ -25.26% -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
EncodingDecoding ~ ~ -34.67% -35.75%
ScalarBaseMult ~ ~ -31.25% -30.29%
ScalarMult ~ ~ -33.45% -32.54%
VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>

9bbda7c99d | cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing. The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do: (Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) -> (Add 8 (Sub (Mul 8 i) (Mul 8 i))) -> (Add 8 (Mul 8 (Sub i i))) -> (Add 8 (Mul 8 0)) -> (Add 8 0) -> 8 The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)), Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove. Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way. The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems and DivisbleconstU64 on 32-bit systems. Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that. There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that. benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm vs base vs base vs base vs base vs base vs base vs base vs base LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~ ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50% Modify ~ ~ ~ ~ ~ ~ ~ -1.53% MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05% ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~ BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06% BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06% BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~ BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49% BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40% BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77% MulNeg ~ ~ ~ ~ ~ ~ ~ ~ Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~ DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50% ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32% DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~ DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63% DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60% DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~ DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~ DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~ DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12% ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56% DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72% DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17% DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67% ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30% DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22% DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03% DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01% DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~ ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~ DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~ DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07% DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47% ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06% DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17% DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64% DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06% 
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~ ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23% DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06% DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06% DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~ ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~ DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~ DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~ DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~ DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88% ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~ DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~ DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01% Mul2 ~ ~ ~ ~ ~ ~ ~ ~ MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02% EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53% TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04% Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~ Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39% Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35% Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57% Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55% Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11% Div64 ~ ~ ~ ~ ~ ~ ~ +0.11% Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~ Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~ Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28% Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93% Mod64Small ~ ~ ~ ~ ~ ~ ~ ~ Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44% Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37% Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22% Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11% Mod64 ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02% Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02% MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37% MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21% MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02% MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03% MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03% MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04% MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13% MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~ MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30% MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~ MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~ MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~ MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~ MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31% MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47% MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03% MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03% MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03% MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22% MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64% MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14% MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04% MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~ MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~ ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31% Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~ Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79% Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~ Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44% SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31% SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~ SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~ SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62% SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75% SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08% Fixes #63110. Fixes #75954. 
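A hedged example of one pattern called out above, len(x)/2 being recognized as unsigned (my illustration, not code from the CL):
// len(b) is never negative, so prove can mark this signed division as
// unsigned and it compiles to a plain shift, avoiding the fixup needed
// for possibly-negative dividends.
func half(b []byte) int {
	return len(b) / 2
}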
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

915c1839fe | test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas. Many patterns end in comma (like "MOV [$]123,") so separating patterns by comma is not great; they're already quoted, so spaces are fine. Also replace all tabs in the assembly lines with spaces before matching. Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
Update all tests as well.
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>

73d7635fae | cmd/compile: add generic rules to remove bool → int → bool roundtrips
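The commit message has no body; as a loosely hedged illustration, this is the shape of roundtrip such generic rules aim to remove (whether this exact form is matched depends on how the SSA comes out).
// Hypothetical bool -> int -> bool roundtrip: the comparison is materialized
// as 0/1 and then immediately tested again, which can collapse back to the
// original boolean.
func roundtrip(x int) bool {
	v := 0
	if x > 0 {
		v = 1
	}
	return v != 0 // same value as x > 0
}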
Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/715200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

d7a52f9369 | cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
The original Const64F used AUIPC + LD + FMVDX to load a float64 constant; we can use AUIPC + FLD instead, the same as Const32F.
Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/703215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

7056c71d32 | cmd/compile: disable use of new saturating float-to-int conversions
The new conversions can be activated (or bisected) with -gcflags=all=-d=converthash=PATTERN where PATTERN is either a hash string or n, qn, y, qy for no, quietly no, yes, quietly yes. This CL makes the default pattern be "qn" instead of the default-default which is an efficient encoding of "qy".
Updates #75834
Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/711845
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: David Chase <drchase@google.com>

9b8742f2e7 | cmd/compile: don't depend on arch-dependent conversions in the compiler
Leave those constant foldings for runtime, similar to how we do it for NaN generation. These are the only instances I could find in cmd/compile/..., using
objdump -d ../pkg/tool/darwin_arm64/compile | egrep "(fcvtz|>:)" | grep -B1 fcvt
(There are instances in other places, like runtime and reflect, but I don't think those places would affect compiler output.)
Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27
Reviewed-on: https://go-review.googlesource.com/c/go/+/711281
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>

19a30ea3f2 | cmd/compile: call generated size-specialized malloc functions directly
This change creates calls to size-specialized malloc functions instead of calls to newObject when we know the size of the allocation at compilation time. Most of it is a matter of calling the newObject function (which will create calls to the size-specialized functions) rather than the newObjectNonSpecialized function (which won't). In the newHeapaddr, small, non-pointer case, we'll create a non-specialized newObject and transform that into the appropriate size-specialized function when we produce the mallocgc in flushPendingHeapAllocations.

We have to update some of the rewrites in generic.rules to also apply to the size-specialized functions when they apply to newObject.

The messiest thing is that we have to adjust the offset we use to save the memory profiler stack, because the depth of the call to profilealloc is two frames fewer in the size-specialized malloc functions compared to when newObject calls mallocgc. A bunch of tests have been adjusted to account for that.

Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/707856
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>

97fd6bdecc | cmd/compile: fuse NaN checks with other comparisons
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.
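A small self-contained restatement of that identity (my example, not code from the CL): every comparison involving NaN is false, so the two tests can fuse into one inverted comparison.
package nanfuse

import "math"

// isNaNOrPositive is true exactly when x <= 0 is false, so the whole
// expression can compile to a single inverted comparison.
func isNaNOrPositive(x float64) bool {
	return math.IsNaN(x) || x > 0 // equivalent to !(x <= 0)
}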
goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
│ sec/op │ sec/op vs base │
Acos 4.315n ± 0% 4.314n ± 0% ~ (p=0.642 n=10)
Acosh 8.398n ± 0% 7.779n ± 0% -7.37% (p=0.000 n=10)
Asin 4.203n ± 0% 4.211n ± 0% +0.20% (p=0.001 n=10)
Asinh 10.150n ± 0% 9.562n ± 0% -5.79% (p=0.000 n=10)
Atan 2.363n ± 0% 2.363n ± 0% ~ (p=0.801 n=10)
Atanh 8.192n ± 2% 7.685n ± 0% -6.20% (p=0.000 n=10)
Atan2 4.013n ± 0% 4.010n ± 0% ~ (p=0.073 n=10)
Cbrt 4.858n ± 0% 4.755n ± 0% -2.12% (p=0.000 n=10)
Cos 4.596n ± 0% 4.357n ± 0% -5.20% (p=0.000 n=10)
Cosh 5.071n ± 0% 5.071n ± 0% ~ (p=0.585 n=10)
Erf 2.802n ± 1% 2.788n ± 0% -0.54% (p=0.002 n=10)
Erfc 3.087n ± 1% 3.071n ± 0% ~ (p=0.320 n=10)
Erfinv 3.981n ± 0% 3.965n ± 0% -0.41% (p=0.000 n=10)
Erfcinv 3.985n ± 0% 3.977n ± 0% -0.20% (p=0.000 n=10)
ExpGo 8.721n ± 2% 8.252n ± 0% -5.38% (p=0.000 n=10)
Expm1 4.378n ± 0% 4.228n ± 0% -3.43% (p=0.000 n=10)
Exp2 8.313n ± 0% 7.855n ± 0% -5.52% (p=0.000 n=10)
Exp2Go 8.498n ± 2% 7.921n ± 0% -6.79% (p=0.000 n=10)
Mod 15.16n ± 4% 12.20n ± 1% -19.58% (p=0.000 n=10)
Frexp 1.780n ± 2% 1.496n ± 0% -15.96% (p=0.000 n=10)
Gamma 4.378n ± 1% 4.013n ± 0% -8.35% (p=0.000 n=10)
HypotGo 2.655n ± 5% 2.427n ± 1% -8.57% (p=0.000 n=10)
Ilogb 1.912n ± 5% 1.749n ± 0% -8.53% (p=0.000 n=10)
J0 22.43n ± 9% 20.46n ± 0% -8.76% (p=0.000 n=10)
J1 21.03n ± 4% 19.96n ± 0% -5.09% (p=0.000 n=10)
Jn 45.40n ± 1% 42.59n ± 0% -6.20% (p=0.000 n=10)
Ldexp 2.312n ± 1% 1.944n ± 0% -15.94% (p=0.000 n=10)
Lgamma 4.617n ± 1% 4.584n ± 0% -0.73% (p=0.000 n=10)
Log 4.226n ± 0% 4.213n ± 0% -0.31% (p=0.001 n=10)
Logb 1.771n ± 0% 1.775n ± 0% ~ (p=0.097 n=10)
Log1p 5.102n ± 2% 5.001n ± 0% -1.97% (p=0.000 n=10)
Log10 4.407n ± 0% 4.408n ± 0% ~ (p=1.000 n=10)
Log2 2.416n ± 1% 2.138n ± 0% -11.51% (p=0.000 n=10)
Modf 1.669n ± 2% 1.611n ± 0% -3.50% (p=0.000 n=10)
Nextafter32 2.186n ± 0% 2.185n ± 0% ~ (p=0.051 n=10)
Nextafter64 2.182n ± 0% 2.184n ± 0% +0.09% (p=0.016 n=10)
PowInt 11.39n ± 6% 10.68n ± 2% -6.24% (p=0.000 n=10)
PowFrac 26.60n ± 2% 26.12n ± 0% -1.80% (p=0.000 n=10)
Pow10Pos 0.5067n ± 4% 0.5003n ± 1% -1.27% (p=0.001 n=10)
Pow10Neg 0.8552n ± 0% 0.8552n ± 0% ~ (p=0.928 n=10)
Round 1.181n ± 0% 1.182n ± 0% +0.08% (p=0.001 n=10)
RoundToEven 1.709n ± 0% 1.710n ± 0% ~ (p=0.053 n=10)
Remainder 12.54n ± 5% 11.99n ± 2% -4.46% (p=0.000 n=10)
Sin 3.933n ± 5% 3.926n ± 0% -0.17% (p=0.000 n=10)
Sincos 5.672n ± 0% 5.522n ± 0% -2.65% (p=0.000 n=10)
Sinh 5.447n ± 1% 5.444n ± 0% -0.06% (p=0.029 n=10)
Tan 4.061n ± 0% 4.058n ± 0% -0.07% (p=0.005 n=10)
Tanh 5.599n ± 0% 5.595n ± 0% -0.06% (p=0.042 n=10)
Y0 20.75n ± 5% 19.73n ± 1% -4.92% (p=0.000 n=10)
Y1 20.87n ± 2% 19.78n ± 1% -5.20% (p=0.000 n=10)
Yn 44.50n ± 2% 42.04n ± 2% -5.53% (p=0.000 n=10)
geomean 4.989n 4.791n -3.96%
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 159.9n ± 0% ~ (p=0.269 n=10)
Acosh 244.7n ± 0% 235.0n ± 0% -3.98% (p=0.000 n=10)
Asin 159.9n ± 0% 159.9n ± 0% ~ (p=0.154 n=10)
Asinh 270.8n ± 0% 261.1n ± 0% -3.60% (p=0.000 n=10)
Atan 119.1n ± 0% 119.1n ± 0% ~ (p=0.347 n=10)
Atanh 260.2n ± 0% 261.8n ± 4% ~ (p=0.459 n=10)
Atan2 186.8n ± 0% 186.8n ± 0% ~ (p=0.487 n=10)
Cbrt 203.5n ± 0% 198.2n ± 0% -2.60% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.714 n=10)
Copysign 4.894n ± 0% 4.893n ± 0% ~ (p=0.161 n=10)
Cos 107.6n ± 0% 103.6n ± 0% -3.76% (p=0.000 n=10)
Cosh 259.0n ± 0% 252.8n ± 0% -2.39% (p=0.000 n=10)
Erf 133.7n ± 0% 133.7n ± 0% ~ (p=0.720 n=10)
Erfc 137.9n ± 0% 137.8n ± 0% -0.04% (p=0.033 n=10)
Erfinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Erfcinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Exp 215.3n ± 0% 208.1n ± 0% -3.34% (p=0.000 n=10)
ExpGo 226.7n ± 0% 220.6n ± 0% -2.69% (p=0.000 n=10)
Expm1 164.8n ± 0% 159.0n ± 0% -3.52% (p=0.000 n=10)
Exp2 185.0n ± 0% 182.7n ± 0% -1.22% (p=0.000 n=10)
Exp2Go 198.9n ± 0% 196.5n ± 0% -1.21% (p=0.000 n=10)
Abs 4.894n ± 0% 4.893n ± 0% ~ (p=0.262 n=10)
Dim 16.31n ± 0% 16.31n ± 0% ~ (p=1.000 n=10)
Floor 31.81n ± 0% 31.81n ± 0% ~ (p=0.067 n=10)
Max 26.11n ± 0% 26.10n ± 0% ~ (p=0.080 n=10)
Min 26.10n ± 0% 26.10n ± 0% ~ (p=0.095 n=10)
Mod 337.7n ± 0% 291.9n ± 0% -13.56% (p=0.000 n=10)
Frexp 50.57n ± 0% 42.41n ± 0% -16.13% (p=0.000 n=10)
Gamma 206.3n ± 0% 198.1n ± 0% -4.00% (p=0.000 n=10)
Hypot 94.62n ± 0% 94.61n ± 0% ~ (p=0.437 n=10)
HypotGo 109.3n ± 0% 109.3n ± 0% ~ (p=1.000 n=10)
Ilogb 44.05n ± 0% 44.04n ± 0% -0.02% (p=0.025 n=10)
J0 663.1n ± 0% 663.9n ± 0% +0.13% (p=0.002 n=10)
J1 663.9n ± 0% 666.4n ± 0% +0.38% (p=0.000 n=10)
Jn 1.404µ ± 0% 1.407µ ± 0% +0.21% (p=0.000 n=10)
Ldexp 57.10n ± 0% 48.93n ± 0% -14.30% (p=0.000 n=10)
Lgamma 185.1n ± 0% 187.6n ± 0% +1.32% (p=0.000 n=10)
Log 182.7n ± 0% 170.1n ± 0% -6.87% (p=0.000 n=10)
Logb 46.49n ± 0% 46.49n ± 0% ~ (p=0.675 n=10)
Log1p 184.3n ± 0% 179.4n ± 0% -2.63% (p=0.000 n=10)
Log10 184.3n ± 0% 171.2n ± 0% -7.08% (p=0.000 n=10)
Log2 66.05n ± 0% 57.90n ± 0% -12.34% (p=0.000 n=10)
Modf 34.25n ± 0% 34.24n ± 0% ~ (p=0.163 n=10)
Nextafter32 49.33n ± 1% 48.93n ± 0% -0.81% (p=0.002 n=10)
Nextafter64 43.64n ± 0% 43.23n ± 0% -0.93% (p=0.000 n=10)
PowInt 267.6n ± 0% 251.2n ± 0% -6.11% (p=0.000 n=10)
PowFrac 672.9n ± 0% 637.9n ± 0% -5.19% (p=0.000 n=10)
Pow10Pos 13.87n ± 0% 13.87n ± 0% ~ (p=1.000 n=10)
Pow10Neg 19.58n ± 62% 19.59n ± 62% ~ (p=0.355 n=10)
Round 23.65n ± 0% 23.65n ± 0% ~ (p=1.000 n=10)
RoundToEven 27.73n ± 0% 27.73n ± 0% ~ (p=0.635 n=10)
Remainder 309.9n ± 0% 280.5n ± 0% -9.49% (p=0.000 n=10)
Signbit 13.05n ± 0% 13.05n ± 0% ~ (p=1.000 n=10) ¹
Sin 120.7n ± 0% 120.7n ± 0% ~ (p=1.000 n=10) ¹
Sincos 148.4n ± 0% 143.5n ± 0% -3.30% (p=0.000 n=10)
Sinh 275.6n ± 0% 267.5n ± 0% -2.94% (p=0.000 n=10)
SqrtIndirect 3.262n ± 0% 3.262n ± 0% ~ (p=0.263 n=10)
SqrtLatency 19.57n ± 0% 19.57n ± 0% ~ (p=0.582 n=10)
SqrtIndirectLatency 19.57n ± 0% 19.57n ± 0% ~ (p=1.000 n=10)
SqrtGoLatency 203.2n ± 0% 197.6n ± 0% -2.78% (p=0.000 n=10)
SqrtPrime 4.952µ ± 0% 4.952µ ± 0% -0.01% (p=0.025 n=10)
Tan 153.3n ± 0% 153.3n ± 0% ~ (p=1.000 n=10)
Tanh 280.5n ± 0% 272.4n ± 0% -2.91% (p=0.000 n=10)
Trunc 31.81n ± 0% 31.81n ± 0% ~ (p=1.000 n=10)
Y0 680.1n ± 0% 664.8n ± 0% -2.25% (p=0.000 n=10)
Y1 684.2n ± 0% 669.6n ± 0% -2.14% (p=0.000 n=10)
Yn 1.444µ ± 0% 1.410µ ± 0% -2.35% (p=0.000 n=10)
Float64bits 5.709n ± 0% 5.708n ± 0% ~ (p=0.573 n=10)
Float64frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.734 n=10)
Float32bits 12.23n ± 0% 12.23n ± 0% ~ (p=0.628 n=10)
Float32frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.971 n=10)
FMA 4.893n ± 0% 4.893n ± 0% ~ (p=0.736 n=10)
geomean 88.96n 87.05n -2.15%
¹ all samples are equal
Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>

1d62e92567 | test/codegen: make sure assignment results are used.
Some tests make assignments to an argument without reading it. With CL 708865, they are treated as dead stores and are removed. Make sure the results are used.
Fixes #75745.
Fixes #75746.
Change-Id: I05580beb1006505ec1550e5fa245b54dcefd10b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/708916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

38b26f29f1 | cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read. We don't do that for arguments. But arguments and locals are essentially the same. Arguments are passed by value, and are not expected to be read in the caller's frame. So we can remove the writes to them as well.

One exception is the cgo_unsafe_arg directive, which makes all the arguments effectively address-taken. cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions' arguments.

Cherry-picked from the dev.simd branch. This CL is not necessarily SIMD specific. Apply early to reduce risk.

Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8
Reviewed-on: https://go-review.googlesource.com/c/go/+/705815
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/708865
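A hedged illustration of the kind of store this removes (my example; assumes an ordinary ABIInternal function, not ABI0/cgo_unsafe_args):
// Hypothetical example: the final assignment writes the parameter's slot,
// but nothing reads it afterwards and the caller cannot observe it (arguments
// are passed by value), so the store can be treated as dead and removed.
func reset(p *int) {
	if p != nil {
		*p = 0
	}
	p = nil // dead store to an unread parameter
}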

4ff8a457db | test/codegen: codify handling of floating point constants on arm64
While here, reorder Float32ConstantStore/Float64ConstantStore for consistency.
Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703756
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>

97da068774 | cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary is never nil, but it gets a nil check becuase the dict arg is treated as a slice during construction. cmp.Compare[go.shape.int] was: 00006 (+41) TESTB AX, (AX) 00007 (+52) CMPQ CX, BX 00008 (52) JGT 14 00009 (+55) JGE 12 00010 (+56) MOVL $1, AX 00011 (56) RET 00012 (+58) XORL AX, AX 00013 (58) RET 00014 (+53) MOVQ $-1, AX 00015 (53) RET Note how the function begins with a TESTB that loads the dict to perform the nil check. This CL eliminates that nil check. For most generic functions, this doesn't matter too much, but not infrequently are generic functions written which never actually use the dictionary (like cmp.Compare), so I suspect this might help in hot code to avoid repeatedly touching the dictionary in memory, and in cases where the generic function is not inlined (and thus the dict dropped). compilecmp shows these changes (deduped): cmp.Compare[go.shape.float64] 73 -> 72 (-1.37%) cmp.Compare[go.shape.int] 26 -> 24 (-7.69%) cmp.Compare[go.shape.int32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.int64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.string] 142 -> 141 (-0.70%) cmp.Compare[go.shape.uint16] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uint64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint8] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uintptr] 26 -> 24 (-7.69%) cmp.Less[go.shape.float64] 35 -> 34 (-2.86%) cmp.Less[go.shape.int32] 8 -> 6 (-25.00%) cmp.Less[go.shape.int64] 9 -> 7 (-22.22%) cmp.Less[go.shape.int] 9 -> 7 (-22.22%) cmp.Less[go.shape.string] 112 -> 110 (-1.79%) cmp.Less[go.shape.uint16] 9 -> 7 (-22.22%) cmp.Less[go.shape.uint32] 8 -> 6 (-25.00%) cmp.Less[go.shape.uint64] 9 -> 7 (-22.22%) internal/synctest.Associate[go.shape.struct 114 -> 113 (-0.88%) internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791 (-1.74%) internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852 (-0.70%) main.(*gState[go.shape.int64]).stop 2111 -> 2085 (-1.23%) main.(*gState[go.shape.int64]).unblock 941 -> 923 (-1.91%) runtime.fmax[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmax[go.shape.float64] 89 -> 87 (-2.25%) runtime.fmin[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmin[go.shape.float64] 89 -> 87 (-2.25%) slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337 (-2.60%) slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453 (-1.95%) slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181 (-2.69%) slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219 (-1.79%) 
slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42 (-4.55%) slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170 (-1.60%) slices.medianCmpFunc[go.shape.struct 1118 -> 1113 (-0.45%) slices.medianCmpFunc[go.shape.struct 1214 -> 1209 (-0.41%) slices.medianCmpFunc[go.shape.struct 889 -> 887 (-0.22%) slices.medianCmpFunc[go.shape.struct 901 -> 874 (-3.00%) slices.order2Ordered[go.shape.float64] 89 -> 87 (-2.25%) slices.order2Ordered[go.shape.uint16] 75 -> 70 (-6.67%) slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110 (-0.45%) slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352 (-1.68%) slices.partitionEqualOrdered[go.shape.int] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.int32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.int64] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203 (-2.40%) slices.partitionOrdered[go.shape.float64] 538 -> 533 (-0.93%) slices.partitionOrdered[go.shape.int] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.int64] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.uint16] 447 -> 442 (-1.12%) slices.partitionOrdered[go.shape.uint64] 437 -> 427 (-2.29%) slices.rotateCmpFunc[go.shape.struct 1045 -> 1029 (-1.53%) slices.rotateCmpFunc[go.shape.struct 1205 -> 1163 (-3.49%) slices.rotateCmpFunc[go.shape.struct 1226 -> 1176 (-4.08%) slices.rotateCmpFunc[go.shape.struct 1322 -> 1272 (-3.78%) slices.rotateCmpFunc[go.shape.struct 1419 -> 1400 (-1.34%) slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538 (-2.00%) slices.rotateLeft[go.shape.string] 603 -> 588 (-2.49%) slices.rotateLeft[go.shape.uint8] 255 -> 250 (-1.96%) slices.siftDownOrdered[go.shape.int] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int64] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.string] 614 -> 592 (-3.58%) slices.siftDownOrdered[go.shape.uint32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.uint64] 181 -> 171 (-5.52%) time.parseRFC3339[go.shape.string] 1774 -> 1758 (-0.90%) unique.(*canonMap[go.shape.struct 280 -> 276 (-1.43%) unique.clone[go.shape.struct 311 -> 293 (-5.79%) weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134 (-1.47%) weak.Make[go.shape.struct 136 -> 134 (-1.47%) weak.Make[go.shape.uint8] 136 -> 134 (-1.47%) Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/706655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> |

af6999e60d | cmd/compile: implement jump table on loong64
Following CL 357330, use jump tables on Loong64.
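For context, a hedged example (mine, not from the CL) of a dense switch that is a candidate for a jump table rather than a chain of compares; the exact case-count threshold is up to the compiler.
// Hypothetical dense switch over small consecutive integers.
func dayName(day int) string {
	switch day {
	case 0:
		return "Sun"
	case 1:
		return "Mon"
	case 2:
		return "Tue"
	case 3:
		return "Wed"
	case 4:
		return "Thu"
	case 5:
		return "Fri"
	case 6:
		return "Sat"
	}
	return "?"
}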
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
Switch8Predictable 2.352n ± 0% 2.101n ± 0% -10.65% (p=0.000 n=10)
Switch8Unpredictable 11.99n ± 0% 10.25n ± 0% -14.51% (p=0.000 n=10)
Switch32Predictable 3.153n ± 0% 1.887n ± 1% -40.14% (p=0.000 n=10)
Switch32Unpredictable 12.47n ± 0% 10.22n ± 0% -18.00% (p=0.000 n=10)
SwitchStringPredictable 3.162n ± 0% 3.352n ± 0% +6.01% (p=0.000 n=10)
SwitchStringUnpredictable 14.70n ± 0% 13.31n ± 0% -9.46% (p=0.000 n=10)
SwitchTypePredictable 3.702n ± 0% 2.201n ± 0% -40.55% (p=0.000 n=10)
SwitchTypeUnpredictable 16.18n ± 0% 14.48n ± 0% -10.51% (p=0.000 n=10)
SwitchInterfaceTypePredictable 7.654n ± 0% 9.680n ± 0% +26.47% (p=0.000 n=10)
SwitchInterfaceTypeUnpredictable 22.04n ± 0% 22.44n ± 0% +1.81% (p=0.000 n=10)
geomean 7.441n 6.469n -13.07%
Change-Id: Id6f30fa73349c60fac17670084daee56973a955f
Reviewed-on: https://go-review.googlesource.com/c/go/+/705396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>

78ef487a6f | cmd/compile: fix the issue of shift amount exceeding the valid range
Fixes #75479
Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/704135
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>

2469e92d8c | cmd/compile: combine doubling with shift on riscv64
Change-Id: I4bee2770fedf97e35b5a5b9187a8ba3c41f9ec2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/702697
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>

e5ee1f2600 | test/codegen: check zerobase for newobject on 0-sized types
This CL also adds riscv64 checks.
Change-Id: I693e4e606f470615f6b49085592d6d5ca61473d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703716
Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>

dc960d0bfe | cmd/compile, reflect: further allow inlining of TypeFor
Previous CLs optimized direct use of abi.Type, but reflect.Type is
indirected, so was not benefiting.
For TypeFor, we can use toRType directly without a nil check because the
types are statically known.
Normally, I'd think SSA would remove the nil check, but some oddity
(specifically, late fuse being required to remove the nil check, while
opt doesn't run that late) means that the nil check persists and
gets in the way.
Manually writing the code in this instance seems to fix the problem.
It also exposed another problem; depending on the ordering, writeType
could get to a type symbol before SSA, thereby preventing Extra from
being created on the symbol for later lookups that don't go through
TypeLinksym directly. In writeType, for non-shape types, call
TypeLinksym to ensure that the type is set up for later callers. That
change itself passed toolstash -cmp.
All up, running this stack through compilecmp shows a lot of improvement
in various reflect-using packages, and in reflect itself. The full output
is too big to fit in the commit message, but here's some info:
compilecmp master -> HEAD
master (
|
||
|
|
80a2aae922 |
Revert "cmd/compile: improve stp merging for non-sequent cases"
This reverts commit
|
||
|
|
a5fa5ea51c |
cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.
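A sketch of the kind of call site that benefits (my example): a constant three-byte prefix lowers to a fixed-length memory-equality check, which can now be expanded inline instead of calling runtime.memequal.

package main

import (
        "fmt"
        "strings"
)

// hasFoo tests a constant 3-byte prefix; the comparison becomes a
// fixed-size equality check of length 3.
func hasFoo(s string) bool {
        return strings.HasPrefix(s, "foo")
}

func main() {
        fmt.Println(hasFoo("foobar"), hasFoo("bar")) // true false
}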
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old │ new │
│ sec/op │ sec/op vs base │
StringPrefix3-8 11.125n ± 2% 8.539n ± 1% -23.25% (p=0.000 n=20)
StringPrefix5-8 11.170n ± 2% 8.700n ± 1% -22.11% (p=0.000 n=20)
StringPrefix6-8 11.190n ± 2% 8.655n ± 1% -22.65% (p=0.000 n=20)
StringPrefix7-8 11.095n ± 1% 8.878n ± 1% -19.98% (p=0.000 n=20)
Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
||
|
|
4c63d798cb |
cmd/compile: improve stp merging for non-sequent cases
The original algorithm merges stores with the first mergeable store in the chain, but it misses some cases. Additionally reordering the stores in the chain in increasing order of memory access allows merging in these cases.
Fixes #71987
Here are the results of the sweet benchmarks and the difference between the sizes of the .text sections:
│ old.results │ new.results │
│ sec/op │ sec/op vs base │
BleveIndexBatch100-4 7.614 ± 2% 7.548 ± 1% ~ (p=0.190 n=10)
ESBuildThreeJS-4 821.3m ± 0% 819.0m ± 1% ~ (p=0.165 n=10)
ESBuildRomeTS-4 206.2m ± 1% 204.4m ± 1% -0.90% (p=0.023 n=10)
EtcdPut-4 64.89m ± 1% 64.94m ± 2% ~ (p=0.684 n=10)
EtcdSTM-4 318.4m ± 0% 319.2m ± 1% ~ (p=0.631 n=10)
GoBuildKubelet-4 157.4 ± 0% 157.6 ± 0% ~ (p=0.105 n=10)
GoBuildKubeletLink-4 12.42 ± 2% 12.41 ± 1% ~ (p=0.529 n=10)
GoBuildIstioctl-4 124.4 ± 0% 124.4 ± 0% ~ (p=0.579 n=10)
GoBuildIstioctlLink-4 8.700 ± 1% 8.693 ± 1% ~ (p=0.912 n=10)
GoBuildFrontend-4 46.52 ± 0% 46.50 ± 0% ~ (p=0.971 n=10)
GoBuildFrontendLink-4 2.282 ± 1% 2.272 ± 1% ~ (p=0.529 n=10)
GoBuildTsgo-4 75.02 ± 1% 75.31 ± 1% ~ (p=0.436 n=10)
GoBuildTsgoLink-4 1.229 ± 1% 1.219 ± 1% -0.82% (p=0.035 n=10)
GopherLuaKNucleotide-4 34.77 ± 5% 34.31 ± 1% -1.33% (p=0.015 n=10)
MarkdownRenderXHTML-4 286.6m ± 0% 285.7m ± 1% ~ (p=0.315 n=10)
Tile38QueryLoad-4 657.2µ ± 1% 660.3µ ± 0% ~ (p=0.436 n=10)
geomean 2.570 2.563 -0.24%
Executable Old .text New .text Change
-------------------------------------------------------
benchmark 6504820 6504020 -0.01%
bleve-index-bench 3903860 3903636 -0.01%
esbuild 4801012 4801172 +0.00%
esbuild-bench 1256404 1256340 -0.01%
etcd 9188148 9187076 -0.01%
etcd-bench 6462228 6461524 -0.01%
go 5924468 5923892 -0.01%
go-build-bench 1282004 1281940 -0.00%
gopher-lua-bench 1639540 1639348 -0.01%
markdown-bench 1478452 1478356 -0.01%
tile38-bench 2753524 2753300 -0.01%
tile38-server 10241380 10240068 -0.01%
Change-Id: Ieb4fdfd656aca458f65fc45938de70550632bd13
Reviewed-on: https://go-review.googlesource.com/c/go/+/698097
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
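As an illustration of the stp merging above (my example, not from the CL): stores to adjacent memory that are not emitted in increasing address order can now be reordered and merged into a single STP on arm64.

package main

import "fmt"

type pair struct {
        lo, hi uint64
}

// fill writes two adjacent 8-byte fields, but not in increasing address
// order; reordering the stores lets the arm64 backend merge them into one
// STP (store pair) instruction.
func fill(p *pair, a, b uint64) {
        p.hi = b
        p.lo = a
}

func main() {
        var p pair
        fill(&p, 1, 2)
        fmt.Println(p) // {1 2}
}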
||
|
|
f5b20689e9 |
cmd/compile: optimize loads from readonly globals into constants on loong64
Ref: CL 141118 Update #26498 Change-Id: I9c4ad2bedc4d50bd273bbe9119a898d4fca95e45 Reviewed-on: https://go-review.googlesource.com/c/go/+/700875 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
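A rough sketch of code that can benefit from the change above (my example; the amd64 version came from CL 141118): comparing against a short constant string loads the constant's bytes from a read-only data symbol, and that load can be folded into an immediate.

package main

import "fmt"

// isOK compares against a short constant string whose bytes live in
// read-only data; folding the load of those bytes into an immediate
// removes a memory access.
func isOK(s string) bool {
        return s == "ok"
}

func main() {
        fmt.Println(isOK("ok"), isOK("no")) // true false
}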
||
|
|
3492e4262b |
cmd/compile: simplify specific addition operations using the ADDV16 instruction
On loong64, the addi.d instruction can only directly handle 12-bit immediate numbers. If a larger immediate number needs to be processed, it must first be placed in a register, and then the add.d instruction is used to complete the processing of the larger immediate number.
If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0, then the ADDV16 instruction can be used to complete the addition operation.
Removes 164 instructions from the go binary on loong64.
Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700536
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
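A minimal sketch of an addition that qualifies under the rule above (assumed example): the constant fits in 32 bits and its low 16 bits are zero, so it no longer has to be materialized in a register first.

package main

import "fmt"

// addBig adds 0x20000 (131072), a constant with zero low 16 bits that
// fits in 32 bits, so the ADDV16 form applies instead of a load-then-add.
func addBig(x int64) int64 {
        return x + 0x20000
}

func main() {
        fmt.Println(addBig(1)) // 131073
}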
||
|
|
df29038486 |
cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
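For reference, a small usage sketch (mine, not the CL's code): for an interface type, reflect.TypeFor resolves through TypeOf((*T)(nil)).Elem(), and loading abi.PtrType.Elem as a constant is what makes that path cheap.

package main

import (
        "fmt"
        "io"
        "reflect"
)

func main() {
        // Concrete type: already cheap.
        fmt.Println(reflect.TypeFor[int]()) // int
        // Interface type: resolved via the *io.Reader descriptor's Elem
        // field, which can now be loaded as a constant.
        fmt.Println(reflect.TypeFor[io.Reader]()) // io.Reader
}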
||
|
|
bd71b94659 |
cmd/compile/internal: optimize add+sll rule using ALSLV instruction on loong64
Reduce the number of go toolchain instructions on loong64 as follows:
file before after Δ %
go 1573148 1571708 -1,440 -0.0915%
gofmt 320578 320090 -488 -0.1522%
asm 555066 554406 -660 -0.1189%
cgo 481566 480926 -640 -0.1329%
compile 2475962 2473880 -2,082 -0.0841%
cover 516536 515920 -616 -0.1193%
link 702172 701404 -768 -0.1094%
preprofile 238626 238274 -352 -0.1475%
vet 792928 792100 -828 -0.1044%
Change-Id: I61e462726835959c60e1b4e5256d4020202418ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/693877
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
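A sketch of the add+sll shape this rule targets (my example): an add whose second operand is shifted left by a small constant, as produced by scaled indexing.

package main

import "fmt"

// scaledAdd computes a + b<<3, which a single add-with-shifted-operand
// instruction (ALSLV in the rewrite rules) can cover on loong64.
func scaledAdd(a, b int64) int64 {
        return a + b<<3
}

func main() {
        fmt.Println(scaledAdd(10, 2)) // 26
}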
||
|
|
44c5956bf7 |
test/codegen: add Mul2 and DivPow2 test for loong64
Change-Id: I29ccd105c5418955146a3f4873162963da489a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/697935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> |
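For readers unfamiliar with test/codegen, a rough sketch of what such a test looks like; the loong64 check strings below are placeholders I assumed, not the CL's actual regexes.

// asmcheck

package codegen

// Mul2 should lower to an add or shift rather than a multiply.
func Mul2(n int64) int64 {
        // loong64:-"MULV"
        return n * 2
}

// DivPow2 should lower to a shift rather than a divide.
func DivPow2(n uint64) uint64 {
        // loong64:-"DIVVU"
        return n / 8
}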
||
|
|
0aa8019e94 |
test/codegen: add Mul* test for loong64
Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/697895 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
|
83420974b7 |
test/codegen: add sqrt*, abs and copysign tests for loong64
Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73 Reviewed-on: https://go-review.googlesource.com/c/go/+/697955 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org> |
||
|
|
1843f1e9c0 |
cmd/compile: use zero register instead of specialized *zero instructions on loong64
Refer to CL 633075: loong64 has a zero (R0) register that can be used to do this. Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4 Reviewed-on: https://go-review.googlesource.com/c/go/+/693876 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> |
||
|
|
9632ba8160 |
cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64
Pattern1: (the type of c is uint16)
c>>8 | c<<8
To:
revb2h c
Pattern2: (the type of c is uint32)
(c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
revb2h c
Pattern3: (the type of c is uint64)
(c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
revb4h c
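In Go source, the three patterns above look like this (a sketch written to match the listed shapes; revb2h/revb4h reverse the bytes within each 16-bit halfword):

package main

import "fmt"

// swap16 matches pattern 1: byte swap of a uint16 (revb2h).
func swap16(c uint16) uint16 {
        return c>>8 | c<<8
}

// swapWithin32 matches pattern 2: swap bytes within each halfword of a
// uint32 (revb2h).
func swapWithin32(c uint32) uint32 {
        return (c&0xff00ff00)>>8 | (c&0x00ff00ff)<<8
}

// swapWithin64 matches pattern 3: swap bytes within each halfword of a
// uint64 (revb4h).
func swapWithin64(c uint64) uint64 {
        return (c&0xff00ff00ff00ff00)>>8 | (c&0x00ff00ff00ff00ff)<<8
}

func main() {
        fmt.Printf("%#x\n", swap16(0x1234))                   // 0x3412
        fmt.Printf("%#x\n", swapWithin32(0x11223344))         // 0x22114433
        fmt.Printf("%#x\n", swapWithin64(0x1122334455667788)) // 0x2211443366558877
}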
Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda
Reviewed-on: https://go-review.googlesource.com/c/go/+/696335
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
||
|
|
fa706ea50f |
cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38 Reviewed-on: https://go-review.googlesource.com/c/go/+/697956 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> |
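A minimal sketch of the shape the rule above rewrites (my example): doubling followed by a shift becomes a single shift by c+1.

package main

import "fmt"

// doubleShift computes (x + x) << 2, which the rule rewrites to x << 3.
func doubleShift(x int64) int64 {
        return (x + x) << 2
}

func main() {
        fmt.Println(doubleShift(3)) // 24
}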
||
|
|
320df537cc |
cmd/compile: emit classify instructions for infinity tests on riscv64
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.
The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.
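For reference, a small sketch of the kind of call that now compiles to FCLASSD (assuming only the public math API):

package main

import (
        "fmt"
        "math"
)

// describe uses math.IsInf, which riscv64 can now answer with a single
// FCLASSD instead of loads, moves and comparisons.
func describe(x float64) string {
        switch {
        case math.IsInf(x, 1):
                return "+Inf"
        case math.IsInf(x, -1):
                return "-Inf"
        default:
                return "finite or NaN"
        }
}

func main() {
        fmt.Println(describe(math.Inf(1)), describe(-3.5))
}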
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10)
Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10)
Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10)
Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10)
Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10)
Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10)
Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10)
Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10)
Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10)
Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10)
Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10)
Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10)
Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10)
Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10)
ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10)
Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10)
Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10)
Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10)
Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10)
Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10)
Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10)
Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10)
Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10)
Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10)
Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10)
Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10)
Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10)
HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10)
Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10)
J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10)
J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10)
Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10)
Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10)
Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10)
Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10)
Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10)
Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10)
Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10)
Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10)
Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10)
Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10)
Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10)
PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10)
PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10)
Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10)
Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10)
Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10)
RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10)
Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10)
Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10)
Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10)
Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10)
Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10)
SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10)
SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10)
SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10)
SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10)
SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10)
Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10)
Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10)
Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10)
Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10)
Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10)
Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10)
Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10)
Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10)
Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10)
Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10)
FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10)
geomean 94.40n 89.43n -5.27%
Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
||
|
|
90b7d7aaa2 |
cmd/compile/internal: optimize multiplication using the new operation 'ADDshiftLLV' on loong64
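A sketch of the multiplications the benchmarks below exercise (my example): small constants that decompose into add-with-shifted-operand forms.

package main

import "fmt"

// Multiplications by small constants can be strength-reduced:
// x*3 = x + x<<1, x*5 = x + x<<2, x*12 = (x + x<<1) << 2.
func mul3(x int64) int64  { return x * 3 }
func mul5(x int64) int64  { return x * 5 }
func mul12(x int64) int64 { return x * 12 }

func main() {
        fmt.Println(mul3(7), mul5(7), mul12(7)) // 21 35 84
}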
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10)
MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10)
MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10)
MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10)
MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10)
MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10)
MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10)
MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10)
MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10)
MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10)
MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10)
MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10)
MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10)
MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10)
MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10)
MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10)
geomean 0.8539n 0.6597n -22.74%
Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
||
|
|
f04421ea9a |
cmd/compile: soften test for 74788
We now (as of CL 678620) use float registers other than X0 for copying. Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25 Reviewed-on: https://go-review.googlesource.com/c/go/+/695275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> |
||
|
|
084c0f8494 |
cmd/compile: allow InlMark operations to be speculatively executed
Although InlMark takes a memory argument it ultimately becomes a NOP and therefore is safe to speculatively execute. Fixes #74915 Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82 Reviewed-on: https://go-review.googlesource.com/c/go/+/693795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
||
|
|
a552737418 |
cmd/compile: fold negation into multiplication on loong64
This change also adds corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10)
Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10)
geomean 938.0n 736.4n -21.49%
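A sketch (my example, not the CL's benchmark code) of expressions where the negation can be folded into the multiplication:

package main

import "fmt"

// mulNeg negates a product; the negation folds into the multiply.
func mulNeg(a, b int64) int64 {
        return -(a * b)
}

// mul2Neg multiplies two negated operands; the negations cancel.
func mul2Neg(a, b int64) int64 {
        return (-a) * (-b)
}

func main() {
        fmt.Println(mulNeg(3, 4), mul2Neg(3, 4)) // -12 12
}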
Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
||
|
|
fcc036f03b |
cmd/compile: optimise float <-> int register moves on riscv64
Use the FMV* instructions to move values between the floating point and
integer register files.
Note: I'm unsure why there is a slowdown in the Float32bits benchmark;
I've checked, and an FMVXS instruction is being used as expected. There
are multiple loads and other instructions in the main loop.
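A small sketch of code dominated by these register-file moves (assuming only the public math API):

package main

import (
        "fmt"
        "math"
)

// nextUp returns the next float64 above a finite, positive x. The round
// trip through Float64bits/Float64frombits is exactly the float<->int
// move that FMV instructions keep out of memory.
func nextUp(x float64) float64 {
        return math.Float64frombits(math.Float64bits(x) + 1)
}

func main() {
        fmt.Println(nextUp(1.0) > 1.0) // true
}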
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ fmv-before.txt │ fmv-after.txt │
│ sec/op │ sec/op vs base │
Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10)
Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10)
Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10)
Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10)
Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10)
Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10)
Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10)
Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10)
Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10)
Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10)
Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10)
Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10)
Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10)
Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10)
Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10)
Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10)
ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10)
Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10)
Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10)
Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10)
Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10)
Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10)
Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10)
Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10)
Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10)
Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10)
Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10)
Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10)
HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10)
Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10)
J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10)
J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10)
Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10)
Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10)
Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10)
Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10)
Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10)
Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10)
Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10)
Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10)
Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10)
Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10)
Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10)
PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10)
PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10)
Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10)
Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10)
Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10)
RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10)
Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10)
Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10)
Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10)
Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10)
Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10)
SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10)
SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10)
SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10)
SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10)
SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10)
Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10)
Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10)
Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10)
Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10)
Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10)
Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10)
Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10)
Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10)
Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10)
FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10)
geomean 77.43n 72.42n -6.47%
Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667
Reviewed-on: https://go-review.googlesource.com/c/go/+/599235
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
||
|
|
e071617222 |
cmd/compile: optimize multiplication rules on loong64
Improve multiplication strength reduction: following CL 626998, add three
additional linear-combination instructions for loong64.
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI32/120 1.6010n ± 0% 0.8130n ± 0% -49.22% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8109n ± 0% -49.35% (p=0.000 n=10)
MulconstI32/65537 1.6275n ± 0% 0.8005n ± 0% -50.81% (p=0.000 n=10)
MulconstI32/65538 1.6290n ± 0% 0.8004n ± 0% -50.87% (p=0.000 n=10)
MulconstI64/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/-120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/65537 1.6270n ± 0% 0.8005n ± 0% -50.80% (p=0.000 n=10)
MulconstI64/65538 1.6290n ± 0% 0.8071n ± 1% -50.45% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8066n ± 0% -49.62% (p=0.000 n=10)
MulconstU32/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU32/65538 1.6280n ± 0% 0.8005n ± 0% -50.83% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU64/65538 1.6300n ± 0% 0.8067n ± 0% -50.51% (p=0.000 n=10)
geomean 1.609n 0.8537n -46.95%
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstI32/120 1.6020n ± 0% 0.8012n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/65537 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI32/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/3 1.6015n ± 0% 0.8007n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/5 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/12 1.602n ± 0% 1.202n ± 0% -25.00% (p=0.000 n=10)
MulconstI64/120 1.6030n ± 0% 0.8011n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/-120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65538 1.6020n ± 0% 0.8009n ± 0% -50.01% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8007n ± 0% -49.98% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstU64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
geomean 1.601n 0.8523n -46.77%
Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
||
|
|
eb7f515c4d |
cmd/compile: use generated loops instead of DUFFZERO on amd64
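A sketch of the kind of fixed-size clear these benchmarks measure (my example):

package main

import "fmt"

// clear256 zeroes a 256-byte array of statically known size; the compiler
// previously emitted a DUFFZERO call for this and now generates an inline
// loop.
func clear256(p *[256]byte) {
        *p = [256]byte{}
}

func main() {
        var buf [256]byte
        buf[0] = 1
        clear256(&buf)
        fmt.Println(buf[0]) // 0
}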
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
│ base │ exp │
│ sec/op │ sec/op vs base │
MemclrKnownSize112-20 1.270n ± 14% 1.006n ± 0% -20.72% (p=0.000 n=10)
MemclrKnownSize128-20 1.266n ± 0% 1.005n ± 0% -20.58% (p=0.000 n=10)
MemclrKnownSize192-20 1.771n ± 0% 1.579n ± 1% -10.84% (p=0.000 n=10)
MemclrKnownSize248-20 4.034n ± 0% 3.520n ± 0% -12.75% (p=0.000 n=10)
MemclrKnownSize256-20 2.269n ± 0% 2.014n ± 0% -11.26% (p=0.000 n=10)
MemclrKnownSize512-20 4.280n ± 0% 4.030n ± 0% -5.84% (p=0.000 n=10)
MemclrKnownSize1024-20 8.309n ± 1% 8.057n ± 0% -3.03% (p=0.000 n=10)
Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765
Reviewed-on: https://go-review.googlesource.com/c/go/+/678937
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
||
|
|
cedf63616a |
cmd/compile: add floating point min/max intrinsics on s390x
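For reference, a minimal sketch of Go code that can map onto these instructions (my example, using the built-in min and max for float64):

package main

import "fmt"

// clamp bounds x to [lo, hi] with the built-in min and max, which s390x
// can lower to the vector FP MINIMUM/MAXIMUM instructions for float64.
func clamp(x, lo, hi float64) float64 {
        return min(max(x, lo), hi)
}

func main() {
        fmt.Println(clamp(3.7, 0, 1), clamp(0.25, 0, 1)) // 1 0.25
}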
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and use them in the compiler to implement min and max. Note: I've allowed floating point registers to be used with the single element instructions (those with the W instead of V prefix) to allow easier integration into the compiler. Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6 Reviewed-on: https://go-review.googlesource.com/c/go/+/684715 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com> |