This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).
I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.
Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.
The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.
For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.
So math.FMA, not math.Fma.
This API has not yet been released, so this change does not break
the compatibility promise.
This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.
Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We use -1 for Phi argLength, not "a really large number".
Change-Id: I5d4fa364c22c3cf40213f2368a9a20de7e989aac
Reviewed-on: https://go-review.googlesource.com/c/go/+/203883
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as
ARM64: (Fma x y z) -> (FMADDD z x y)
PPC64: (Fma x y z) -> (FMADD x y z)
S390X: (Fma x y z) -> (FMADD z x y)
Updates #25819.
Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
grep -e '\.\(c\|cc\|go\)$' |\
xargs -n 1\
awk\
'/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
hunspell -d en_US -l |\
grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
sort -f |\
uniq -c |\
awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes#30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.
Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.
After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.
Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.
Fixes#29007Fixes#28640
Update #26320
Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This change creates the infrastructure for new lightweight atomics
primitives in runtime/internal/atomic:
- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release
and implements them for ppc64x. There is visible performance improvement
in producer-consumer scenarios, like BenchmarkChanProdCons*:
benchmark old ns/op new ns/op delta
BenchmarkChanProdCons0-48 2034 2034 +0.00%
BenchmarkChanProdCons10-48 1798 1608 -10.57%
BenchmarkChanProdCons100-48 1596 1585 -0.69%
BenchmarkChanProdConsWork0-48 2084 2046 -1.82%
BenchmarkChanProdConsWork10-48 1829 1668 -8.80%
BenchmarkChanProdConsWork100-48 1650 1650 +0.00%
Fixes#21348
Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
prove is able to find 94 occurrences in std cmd where a divisor
can't have the value -1. The change removes
the extraneous fix-up code for these cases.
Fixes#25239
Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d
Reviewed-on: https://go-review.googlesource.com/c/130215
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add generic, 386 and amd64 specific ops and SSA rules for multiplication
with overflow and branching based on overflow flags. Use these to intrinsify
runtime/internal/math.MulUinptr.
On amd64
mul, overflow := math.MulUintptr(a, b)
if overflow {
is lowered to two instructions:
MULQ SI
JO 0x10ee35c
No codegen tests as codegen can not currently test unexported internal runtime
functions.
amd64:
name old time/op new time/op delta
MulUintptr/small 1.16ns ± 5% 0.88ns ± 6% -24.36% (p=0.000 n=19+20)
MulUintptr/large 10.7ns ± 1% 1.1ns ± 1% -89.28% (p=0.000 n=17+19)
Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a
Reviewed-on: https://go-review.googlesource.com/c/141820
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The previous CL introduced stack objects. This CL removes the old
ambiguously live liveness analysis. After this CL we're relying
on stack objects exclusively.
Update a bunch of liveness tests to reflect the new world.
Fixes#22350
Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31
Reviewed-on: https://go-review.googlesource.com/c/134156
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Previously, pattern matching was good enough to achieve good performance
for the RotateLeft* functions, but the inlining cost for them was much
too high. Make RotateLeft* intrinsic on amd64 as a stop-gap for now to
reduce inlining costs.
This should be done (or at least looked at) for other architectures
as well.
Updates golang/go#17566
Change-Id: I6a106ff00b6c4e3f490650af3e083ed2be00c819
Reviewed-on: https://go-review.googlesource.com/132435
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.
Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit. Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).
Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.
Fixes#26105.
Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
CL improves existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present after ARMv8.1, we guard its
usage with a conditional on CPU feature.
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This comment needs updating after moving the "bounded" mark from
Aux to AuxInt.
Change-Id: I924fb22a81fffcd6944b93f0e3357c3aa2c4c49e
Reviewed-on: https://go-review.googlesource.com/111880
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The prove pass sometimes has bounds information
that later rewrite passes do not.
Use this information to mark shifts as bounded,
and then use that information to generate better code on amd64.
It may prove to be helpful on other architectures, too.
While here, coalesce the existing shift lowering rules.
This triggers 35 times building std+cmd. The full list is below.
Here's an example from runtime.heapBitsSetType:
if nb < 8 {
b |= uintptr(*p) << nb
p = add1(p)
} else {
nb -= 8
}
We now generate better code on amd64 for that left shift.
Updates #25087
vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded
runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded
runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded
runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded
runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded
runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded
runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded
runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded
syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded
syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded
syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded
compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded
vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded
compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded
math/big/prime.go:50:25: Proved Lsh64x64 bounded
vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded
net/ip.go:87:21: Proved Rsh8Ux64 bounded
cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded
cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded
cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded
cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded
cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded
cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded
cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded
go/types/expr.go:222:36: Proved Lsh64x64 bounded
crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded
cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded
net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded
net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded
net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded
cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded
cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded
cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded
Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142
Reviewed-on: https://go-review.googlesource.com/109776
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.
Introduce new CtzNonZero ops to capture and use this information.
Benchmark code:
func BenchmarkVisitBits(b *testing.B) {
b.Run("8", func(b *testing.B) {
for i := 0; i < b.N; i++ {
x := uint8(0xff)
for x != 0 {
sink = bits.TrailingZeros8(x)
x &= x - 1
}
}
})
// and similarly so for 16, 32, 64
}
name old time/op new time/op delta
VisitBits/8-8 7.27ns ± 4% 5.58ns ± 4% -23.35% (p=0.000 n=28+26)
VisitBits/16-8 14.7ns ± 7% 10.5ns ± 4% -28.43% (p=0.000 n=30+28)
VisitBits/32-8 27.6ns ± 8% 19.3ns ± 3% -30.14% (p=0.000 n=30+26)
VisitBits/64-8 44.0ns ±11% 38.0ns ± 5% -13.48% (p=0.000 n=30+30)
Fixes#25077
Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.
The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The way Value.AuxInt represents unsigned numbers is currently
documented in genericOps.go, which is not the most obvious place for
it. Move that documentation to Value.AuxInt. Furthermore, to make it
harder to use incorrectly, introduce a Value.AuxUnsigned accessor that
returns the zero-extended value of Value.AuxInt.
Change-Id: I85030c3c68761404058a430e0b1c7464591b2f42
Reviewed-on: https://go-review.googlesource.com/102597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add a bool to opInfo to indicate if an Op never results in any
instructions. This is a conservative approximation: some operations,
like Copy, may or may not generate code depending on their arguments.
I built the list by reading each arch's ssaGenValue function. Hopefully
I got them all.
Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
Reviewed-on: https://go-review.googlesource.com/97957
Reviewed-by: Keith Randall <khr@golang.org>
The first word of an interface is a pointer, but for the purposes
of GC we don't need to treat it as such.
1. If it is a non-empty interface, the pointer points to an itab
which is always in persistentalloc space.
2. If it is an empty interface, the pointer points to a _type.
a. If it is a compile-time-allocated type, it points into
the read-only data section.
b. If it is a reflect-allocated type, it points into the Go heap.
Reflect is responsible for keeping a reference to
the underlying type so it won't be GCd.
If we ever have a moving GC, we need to change this for 2b (as
well as scan itabs to update their itab._type fields).
Write barriers on the first word of interfaces have already been removed.
Change-Id: I643e91d7ac4de980ac2717436eff94097c65d959
Reviewed-on: https://go-review.googlesource.com/97518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.
name old time/op new time/op delta
RoundToEven 5.32ns ± 1% 0.86ns ± 1% -83.86% (p=0.000 n=10+10)
Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds support for math Abs, Copysign to be instrinsics on ppc64x.
New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.
Improvements:
benchmark old ns/op new ns/op delta
BenchmarkAbs-16 1.12 0.69 -38.39%
BenchmarkCopysign-16 1.30 0.93 -28.46%
BenchmarkNextafter32-16 9.34 8.05 -13.81%
BenchmarkFrexp-16 8.81 7.60 -13.73%
Others that used Copysign also saw smaller improvements.
I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
Updates #21390
Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.
Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.
Now-superfluous argument remains in getcallerpc signature
(for a future CL; non-386/amd64 asm funcs ignore it).
Added getcallerpc to the "not a real function" test
in dcl.go, that story is a little odd with respect to
unexported functions but that is not this CL.
Fixes#17327.
Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new
function and has been added as an intrinsic in this CL. All of the
functions can be implemented as a single 'LOAD FP INTEGER'
instruction, FIDBR, on s390x.
name old time/op new time/op delta
Ceil 2.34ns ± 0% 0.85ns ± 0% -63.74% (p=0.000 n=5+4)
Floor 2.33ns ± 0% 0.85ns ± 1% -63.35% (p=0.008 n=5+5)
Round 4.23ns ± 0% 0.85ns ± 0% -79.89% (p=0.000 n=5+4)
Trunc 2.35ns ± 0% 0.85ns ± 0% -63.83% (p=0.029 n=4+4)
Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510
Reviewed-on: https://go-review.googlesource.com/63890
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped
a *gc.Node or *obj.LSym before storing them in the Aux field
of an ssa.Value. This let the SSA part of the compiler distinguish
between autos and args, for example. We no longer need the wrappers
as we can query the underlying objects directly.
There was also some sloppy usage, where VarDef had a *gc.Node
directly in its Aux field, whereas the use of that variable had
that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't
match (using ==) when they probably should.
This sloppy usage cleanup is the only thing in the CL that changes the
generated code - we can get rid of some more unused auto variables if
the matching happens reliably.
Removing this wrapper also lets us get rid of the varsyms cache
(which was used to prevent wrapping the same *gc.Node twice).
Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478
Reviewed-on: https://go-review.googlesource.com/64452
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This implements trunc, floor, and ceil in the math package
as intrinsics on ppc64x. Significant improvement mainly due
to avoiding call overhead of args and return value.
BenchmarkCeil-16 5.95 0.69 -88.40%
BenchmarkFloor-16 5.95 0.69 -88.40%
BenchmarkTrunc-16 5.82 0.69 -88.14%
Updates #21390
Change-Id: I951e182694f6e0c431da79c577272b81fb0ebad0
Reviewed-on: https://go-review.googlesource.com/54654
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.
Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860
Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call. Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.
name old time/op new time/op delta
OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10)
OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8)
OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10)
OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10)
Update #18616
Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Note that this is a redo of an undo of the original buggy CL 38666.
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.
We already have information about which ops commute. (The register
allocator takes advantage of commutivity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.
Fixes#18292
Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
Reviewed-on: https://go-review.googlesource.com/38801
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This reverts commit 041ecb697f.
Reason for revert: Not working on S390x and some 386 archs.
I have a guess why the S390x is failing. No clue on the 386 yet.
Revert until I can figure it out.
Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
Reviewed-on: https://go-review.googlesource.com/38790
Reviewed-by: Keith Randall <khr@golang.org>
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.
We already have information about which ops commute. (The register
allocator takes advantage of commutivity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
[Note to reviewers: check these carefully. Most of the other rule
changes are trivial.]
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.
Fixes#18292
Change-Id: I999b1307272e91965b66754576019dedcbe7527a
Reviewed-on: https://go-review.googlesource.com/38666
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
I don't know that it exists for any other architectures.
Update #18616
Change-Id: Idfe5dee251764d32787915889ec0be4bebc5be24
Reviewed-on: https://go-review.googlesource.com/38323
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
name old time/op new time/op delta
LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10)
LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9)
LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10)
LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8)
8-bit args is a special case - the Go code is really fast because
it is just a single table lookup. So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10)
Update #18616
Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.
Passes "toolstash -cmp" on std.
Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>