In my experimentation, I found that most non-SSAable expressions were
converted to autotmp variables during AST evaluation. However, this is not true
in general, as witnessed by issue #35213, which has a non-SSAable field reference
of a struct that is not converted to an autotmp. So, I fixed openDeferSave() to
handle non-SSAable nodes more generally and to make sure that these non-SSAable
expressions are not evaluated more than once (which could incorrectly repeat side
effects).
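For illustration, a hypothetical sketch of the kind of defer argument involved
(not the exact reproducer from the issue):

package p

// t.a is a field reference of a non-SSAable struct, so openDeferSave() must
// copy it to its stack slot exactly once at the defer statement.
type big struct {
	a, b [4]int // multi-element arrays are not SSAable
}

func use(x [4]int) {}

func f(t big) {
	defer use(t.a)
}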
Fixes #35213
Change-Id: I8043d5576b455e94163599e930ca0275e550d594
Reviewed-on: https://go-review.googlesource.com/c/go/+/203888
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
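Roughly, for a function like the one below (an illustrative sketch with
hypothetical helpers, not compiler output), each executed defer statement sets a
bit in deferBits and every exit replays the recorded defers in reverse order:

package p

func lock()         {}
func unlock()       {}
func release(x int) {}

func f(xs []int) {
	lock()
	defer unlock() // reached: deferBits |= 1
	if len(xs) > 0 {
		defer release(xs[0]) // reached only sometimes: deferBits |= 2; xs[0] saved to an autotmp slot
	}
	// At every exit the compiler conceptually emits:
	//   if deferBits&2 != 0 { release(<saved copy of xs[0]>) }
	//   if deferBits&1 != 0 { unlock() }
}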
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
CL 137156 introduces an intrinsic on AMD64 that executes vfmadd231sd
when feature detection is successful. However, because floating-point
isn't allowed in note handler, the builder disables SSE instructions,
and fails when attempting to execute this instruction. This change
disables FMA on plan9 to immediately use the software fallback.
Fixes #35063.
Change-Id: I87d8f0995bd2f15013d203e618938f5079c9eed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/202617
Reviewed-by: Keith Randall <khr@golang.org>
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.
Updates #25819.
Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.
The benchmark compares the software implementation (old) with the intrinsic
(new).
name old time/op new time/op delta
Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5)
Updates #25819.
Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as
ARM64: (Fma x y z) -> (FMADDD z x y)
PPC64: (Fma x y z) -> (FMADD x y z)
S390X: (Fma x y z) -> (FMADD z x y)
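For reference, the Go-level call that the generic "Fma" op corresponds to (a
minimal usage sketch):

package main

import (
	"fmt"
	"math"
)

func main() {
	x, y, z := 1.5, 2.25, 0.5
	// math.FMA computes x*y + z with a single rounding; with this opcode it
	// lowers to one fused multiply-add instruction on the ISAs listed above.
	fmt.Println(math.FMA(x, y, z))
}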
Updates #25819.
Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
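For example, the classification can be inspected during a build roughly as
follows (assumed invocation; the exact flag spelling may differ):

go build -gcflags=-d=defer ./...   # compiler prints the kind chosen at each defer statement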
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
The Go spec requires:

    If a deferred function value evaluates to nil, execution
    panics when the function is invoked, not when the "defer"
    statement is executed.
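A minimal program illustrating the required behavior (a sketch):

package main

func main() {
	var fn func() // fn is nil
	// No panic here: the defer statement only evaluates the (nil) function
	// value and records the call.
	defer fn()
	println("still reached")
	// The panic happens when the deferred call is invoked, i.e. as main
	// returns.
}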
On Wasm and AIX, we currently emit a nil check at the point of the defer
statement, which makes it panic too early. This CL fixes that.
Also, on Wasm, now the nil function will be passed through
deferreturn to jmpdefer, which does an explicit nil check and
calls sigpanic if it is nil. This sigpanic, being called from
assembly, is ABI0. So change the assembler backend to also
handle sigpanic in ABI0.
Fixes #34926.
Updates #8047.
Change-Id: I28489a571cee36d2aef041f917b8cfdc31d557d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/201297
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The IsClosureVar, IsOutputParamHeapAddr, Assigned, Addrtaken,
InlFormal, and InlLocal flags are only interesting for ONAME nodes, so
it's better to set these flags on Name.flags instead of Node.flags.
Two caveats though:
1. Previously, we would set Assigned and Addrtaken on the entire
expression tree involved in an assignment or addressing operation.
However, the rest of the compiler only actually cares about knowing
whether the underlying ONAME (if any) was assigned/addressed.
2. This actually requires bumping Name.flags from bitset8 to bitset16,
whereas it doesn't allow shrinking Node.flags any. However, Name has
some trailing padding bytes, so expanding Name.flags doesn't cost any
memory.
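For illustration, the flag-set pattern in question looks roughly like this (a
sketch with hypothetical flag names, not the compiler's exact code):

package p

type bitset16 uint16

func (f *bitset16) set(mask bitset16, b bool) {
	if b {
		*f |= mask
	} else {
		*f &^= mask
	}
}

// Hypothetical flag names; the real set lives on Name.flags.
const (
	nameIsClosureVar bitset16 = 1 << iota
	nameAddrtaken
	nameAssigned
)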
Passes toolstash-check.
Change-Id: I7775d713566a38d5b9723360b1659b79391744c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/200898
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This flag is supposed to indicate whether the expression is
"addressable"; but in practice, we infer this from other
attributes about the expression (e.g., n.Op and n.Class()).
Passes toolstash-check.
Change-Id: I19352ca07ab5646e232d98e8a7c1c9aec822ddd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/200897
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Calls are code-generated in an alternate path that inherits
its positions from values, not from *SSAGenState. The
default position on *SSAGenState was marked as not-a-statement,
but this was not applied to the value itself, leading to
spurious "is statement" marks in the output (convention:
after code generation in the compiler, everything is either
definitely a statement or definitely not a statement, nothing
is in the undetermined state).
This CL causes a 35 statement regression in ssa/stmtlines_test.
This is down from the earlier 150 because of all the other
CLs preceding this one that deal with the root causes of the
missing lines (repeated lines on nested calls hid missing lines).
This also removes some line repeats from ssa/debug_test.
Change-Id: Ie9a507bd5447e906b35bbd098e3295211df2ae01
Reviewed-on: https://go-review.googlesource.com/c/go/+/188018
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
This CL changes cmd/compile to use Node.Right instead of
Node.Rlist for OAS2FUNC/OAS2RECV/OAS2MAPR/OAS2DOTTYPE nodes.
Fixes #32293
Change-Id: I4c9d9100be2d98d15e016797f934f64d385f5faa
Reviewed-on: https://go-review.googlesource.com/c/go/+/197817
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This change adds an intrinsic for Mul64 on s390x. To achieve that,
a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly
instruction directly uses an existing instruction on Z and supports multiplication
of two 64-bit unsigned integers, storing the result in two separate registers.
In this case, we require the multiplicand to be stored in register R3 and
the output result (the high and low 64 bits of the product) to be stored in
R2 and R3 respectively.
A test case is also added.
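The Go-level function being intrinsified is math/bits.Mul64; a minimal usage
sketch:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	// hi and lo together hold the 128-bit product; on s390x this now compiles
	// to a single MLGR with the operands in R2/R3 as described above.
	hi, lo := bits.Mul64(0xfedcba9876543210, 0x0123456789abcdef)
	fmt.Println(hi, lo)
}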
Benchmark:
name old time/op new time/op delta
Mul-18 11.1ns ± 0% 1.4ns ± 0% -87.39% (p=0.002 n=8+10)
Mul32-18 2.07ns ± 0% 2.07ns ± 0% ~ (all equal)
Mul64-18 11.1ns ± 1% 1.4ns ± 0% -87.42% (p=0.000 n=10+10)
Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/194572
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
grep -e '\.\(c\|cc\|go\)$' |\
xargs -n 1\
awk\
'/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
hunspell -d en_US -l |\
grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
sort -f |\
uniq -c |\
awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
This CL detangles the hairy mess that was convlit+defaultlit. In
particular, it makes the following changes:
1. convlit1 now follows the standard typecheck behavior of setting
"n.Type = nil" if there's an error. Notably, this means for a lot of
test cases, we now avoid reporting useless follow-on error messages.
For example, after reporting that "1 << s + 1.0" has an invalid shift,
we no longer also report that it can't be assigned to string.
2. Previously, assignconvfn had some extra logic for trying to
suppress errors from convlit/defaultlit so that it could provide its
own errors with better context information. Instead, this extra
context information is now passed down into convlit1 directly.
3. Relatedly, this CL also removes redundant calls to defaultlit prior
to assignconv. As a consequence, when an expression doesn't make sense
for a particular assignment (e.g., assigning an untyped string to an
integer), the error messages now say "untyped string" instead of just
"string". This is more consistent with go/types behavior.
4. defaultlit2 is now smarter about only trying to convert pairs of
untyped constants when it's likely to succeed. This allows us to
report better error messages for things like 3+"x"; instead of "cannot
convert 3 to string" we now report "mismatched types untyped number
and untyped string".
Passes toolstash-check.
Change-Id: I26822a02dc35855bd0ac774907b1cf5737e91882
Reviewed-on: https://go-review.googlesource.com/c/go/+/187657
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Assigning to a 1-element array or 1-field struct variable is considered clobbering
the whole variable. By emitting OpVarDef in this case, liveness analysis
can now know that the variable is redefined.
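A hypothetical sketch of the case in question:

package p

func g() *int { return new(int) }

func f(p *int) [1]*int {
	var a [1]*int
	a[0] = p         // partial write
	a = [1]*int{g()} // whole-variable write: now emits OpVarDef, so liveness
	//                  treats a as redefined rather than partially updated
	return a
}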
Also, isfat is no longer necessary and will be removed in a follow-up CL.
Fixes #33916
Change-Id: Iece0d90b05273f333d59d6ee5b12ee7dc71908c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/192979
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL reverts CL 192097 and fixes the issue in CL 189277.
Change-Id: Icd271262e1f5019a8e01c91f91c12c1261eeb02b
Reviewed-on: https://go-review.googlesource.com/c/go/+/192519
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL optimizes math/bits.TrailingZeros16 on 386 with
a pair of BSFL and ORL instructions.
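Expressed in Go terms, the lowering is roughly equivalent to this sketch:

package p

import "math/bits"

// ORL sets bit 16 so that a zero input yields 16 (BSF's result is undefined
// for a zero input), and a single BSFL then produces the answer with no branch.
func trailingZeros16(x uint16) int {
	return bits.TrailingZeros32(uint32(x) | 1<<16)
}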
The case TrailingZeros16-4 of the benchmark test in
math/bits shows a big improvement.
name old time/op new time/op delta
TrailingZeros16-4 1.55ns ± 1% 0.87ns ± 1% -43.87% (p=0.000 n=50+49)
Change-Id: Ia899975b0e46f45dcd20223b713ed632bc32740b
Reviewed-on: https://go-review.googlesource.com/c/go/+/189277
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
state and ssafn both have their own Fatalf, so use them instead of
global Fatalf.
Updates #19683
Change-Id: Ie02a961d4285ab0a3f3b8d889a5b498d926ed567
Reviewed-on: https://go-review.googlesource.com/c/go/+/188539
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This reverts CL 180761
Reason for revert: Reinstate the stack-allocated defer CL.
There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues.
Issue #32477: Fix has been submitted as CL 181258.
Issue #32498: Possible fix is CL 181377 (not submitted yet).
Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/181378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The z/Architecture does not guarantee that a load following a store
will not be reordered with that store, unless they access the same
address. Therefore if we want to ensure the sequential consistency
of atomic loads and stores we need to perform serialization
operations after atomic stores.
We do not need to serialize in the runtime when using StoreRel[ease]
and LoadAcq[uire]. The z/Architecture already provides sufficient
ordering guarantees for these operations.
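A sketch of the classic store-then-load pattern that sequential consistency
must rule out (hypothetical example):

package main

import "sync/atomic"

var x, y, r1, r2 int32

func main() {
	done := make(chan bool, 2)
	go func() { atomic.StoreInt32(&x, 1); r1 = atomic.LoadInt32(&y); done <- true }()
	go func() { atomic.StoreInt32(&y, 1); r2 = atomic.LoadInt32(&x); done <- true }()
	<-done
	<-done
	// With sequentially consistent atomics, r1 == 0 && r2 == 0 must be
	// impossible. Without a serialization op after each store, z/Architecture
	// could let each load pass the preceding store and produce exactly that.
	println(r1, r2)
}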
name old time/op new time/op delta
AtomicLoad64-16 0.51ns ± 0% 0.51ns ± 0% ~ (all equal)
AtomicStore64-16 0.51ns ± 0% 0.60ns ± 9% +16.47% (p=0.000 n=17+20)
AtomicLoad-16 0.51ns ± 0% 0.51ns ± 0% ~ (all equal)
AtomicStore-16 0.51ns ± 0% 0.60ns ± 9% +16.50% (p=0.000 n=18+20)
Fixes #32428.
Change-Id: I88d19a4010c46070e4fff4b41587efe4c628d4d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/180439
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This reverts commit fff4f599fe.
Reason for revert: Seems to still have issues around GC.
Fixes #32452
Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/180761
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.
This should make defers like this (which are very common) faster.
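For example (a hypothetical sketch of which defer sites qualify):

package p

import "sync"

func once(mu *sync.Mutex) {
	mu.Lock()
	defer mu.Unlock() // executed at most once per call: record can live on the stack
}

func inLoop(mus []*sync.Mutex) {
	for _, mu := range mus {
		mu.Lock()
		defer mu.Unlock() // may run many times: record stays heap-allocated
	}
}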
This optimization applies to 363 out of the 370 static defer sites
in the cmd/go binary.
name old time/op new time/op delta
Defer-4 52.2ns ± 5% 36.2ns ± 3% -30.70% (p=0.000 n=10+10)
Fixes #6980
Update #14939
Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004
Reviewed-on: https://go-review.googlesource.com/c/go/+/171758
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Some runtime functions, like getcallerpc/sp, don't have Go or
assembly implementations and have to be intrinsified. Make sure
they are, even if intrinsics are disabled.
This makes "go build -gcflags=all=-d=ssa/intrinsics/off hello.go"
work.
Change-Id: I77caaed7715d3ca7ffef68a3cdc9357f095c6b9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/179897
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL adds intrinsics for the 64-bit addition and subtraction
functions in math/bits. These intrinsics use the condition code
to propagate the carry or borrow bit.
To make the carry chains more efficient I've removed the
'clobberFlags' property from most of the load and store
operations. Originally these ops did clobber flags when using
offsets that didn't fit in a signed 20-bit integer, however
that is no longer true.
As with other platforms the intrinsics are faster when executed
in a chain rather than a loop because currently we need to spill
and restore the carry bit between each loop iteration. We may
be able to reduce the need to do this on s390x (e.g. by using
compare-and-branch instructions that do not clobber flags) in the
future.
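The kind of chain that benefits looks roughly like this (an illustrative sketch
using the math/bits API):

package p

import "math/bits"

// add192 adds two 192-bit numbers held as three 64-bit limbs. Keeping the
// carry in the condition code across the chain is what the intrinsic enables.
func add192(a0, a1, a2, b0, b1, b2 uint64) (s0, s1, s2 uint64) {
	var c uint64
	s0, c = bits.Add64(a0, b0, 0)
	s1, c = bits.Add64(a1, b1, c)
	s2, _ = bits.Add64(a2, b2, c)
	return
}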
name old time/op new time/op delta
Add64 1.21ns ± 2% 2.03ns ± 2% +67.18% (p=0.000 n=7+10)
Add64multiple 2.98ns ± 3% 1.03ns ± 0% -65.39% (p=0.000 n=10+9)
Sub64 1.23ns ± 4% 2.03ns ± 1% +64.85% (p=0.000 n=10+10)
Sub64multiple 3.73ns ± 4% 1.04ns ± 1% -72.28% (p=0.000 n=10+8)
Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083
Reviewed-on: https://go-review.googlesource.com/c/go/+/174303
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In the statement x = a[i], the index panic should appear to come from
the line number of the '['. Prior to this CL we sometimes used the
line number of the '=' instead.
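An illustrative (deliberately panicking) sketch:

package main

func main() {
	a := []int{1, 2, 3}
	i := 5
	x :=
		a[i] // the out-of-range panic is now attributed to this line (the '['),
	//          not to the line of the ':=' above
	_ = x
}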
Fixes #29504
Change-Id: Ie718fd303c1ac2aee33e88d52c9ba9bcf220dea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/174617
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change creates an intrinsic for Add64 for ppc64x and adds a
testcase for it.
name old time/op new time/op delta
Add64-160 1.90ns ±40% 2.29ns ± 0% ~ (p=0.119 n=5+5)
Add64multiple-160 6.69ns ± 2% 2.45ns ± 4% -63.47% (p=0.016 n=4+5)
Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
In issue #31618, we end up comparing the is-stmt-ness of positions
to repurpose real instructions as inline marks. If the is-stmt-ness
doesn't match, we end up not being able to remove the inline mark.
Always use statement-full positions to do the matching, so we
always find a match if there is one.
Also always use positions that are statements for inline marks.
Fixes #31618
Change-Id: Idaf39bdb32fa45238d5cd52973cadf4504f947d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/173324
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Though there is variation in the spelling of canceled,
cancellation is always spelled with a double l.
Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/
Change-Id: I240f1a297776c8e27e74f3eca566d2bc4c856f2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170060
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change is mostly cosmetic.
OINDREGSP was used only for reading the results of a function call.
In recognition of that fact, rename it to ORESULT.
Along the way, trim down our handling of it to the bare minimum,
and rely on the increased clarity of ORESULT to inline nodarg.
Passes toolstash-check.
Change-Id: I25b177df4ea54a8e94b1698d044c297b7e453c64
Reviewed-on: https://go-review.googlesource.com/c/go/+/170705
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.
These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.
Makes the amd64 go binary 0.2% smaller.
Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Tidy the code up a little bit to move variable definitions closer
to uses, prefer early return to else branches and some other minor
tweaks.
I'd like to make some more changes to this code in the near future
and this CL should make those changes cleaner.
Change-Id: Ie7d7f2e4bb1e670347941e255c9cdc1703282db5
Reviewed-on: https://go-review.googlesource.com/c/go/+/170120
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
ssa/debug_test.go already had a step limit; this exposes
it to individual tests, and it is then set low for the
infinite loop tests.
That however is not enough; in an infinite loop debuggers
see an unchanging line number, and therefore keep trying
until they see a different one. To do this, the concept
of a "bogus" line number is introduced, and on output
single-instruction infinite loops are detected and a
hardware nop with correct line number is inserted into
the loop; the branch itself receives a bogus line number.
This breaks up the endless stream of same line number and
causes both gdb and delve to not hang; Delve complains
about the incorrect line number while gdb does
a sort of odd step-to-nowhere that then steps back
to the loop. Since repeats are suppressed in the reference
file, a single line is shown there.
(The wrong line number mentioned in previous message
was an artifact of debug_test.go, not Delve, and is now
fixed.)
The bogus line number exposed in Delve is less than
wonderful, but compared to hanging, it is better.
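The problematic shape is a single-instruction infinite loop such as this
sketch:

package main

func main() {
	// The empty loop compiles to one self-branch. A hardware nop carrying the
	// real line number is now placed in the loop, and the branch itself gets a
	// bogus line number, so debuggers stepping here no longer hang.
	for {
	}
}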
Fixes #30664.
Change-Id: I30c927cf8869a84c6c9b84033ee44d7044aab552
Reviewed-on: https://go-review.googlesource.com/c/go/+/168477
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Instead of always inserting a nop to use as the target of an inline
mark, see if we can instead find an instruction we're issuing anyway
with the correct line number, and use that instruction. That way, we
don't need to issue a nop.
Makes cmd/go 0.3% smaller.
Update #29571
Change-Id: If6cfc93ab3352ec2c6e0878f8074a3bf0786b2f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/158021
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This is part of a general effort to shrink walk.
In an ideal world, we'd have an SSA op for allocation,
but we don't yet have a good mechanism for introducing
function calling during SSA compilation.
In the meantime, SSA conversion is a better place for it.
This also makes it easier to introduce new optimizations;
instead of doing the typecheck walk dance,
we can simply write what we want the backend to do.
I introduced a new opcode in this change because:
(a) It avoids a class of bugs involving correctly detecting
whether this ONEW is a "before walk" ONEW or an "after walk" ONEW.
It also means that using ONEW or ONEWOBJ in the wrong context
will generally result in a faster failure.
(b) Opcodes are cheap.
(c) It provides a better place to put documentation.
This change is also marginally more performant:
name old alloc/op new alloc/op delta
Template 39.1MB ± 0% 39.0MB ± 0% -0.14% (p=0.008 n=5+5)
Unicode 28.4MB ± 0% 28.4MB ± 0% ~ (p=0.421 n=5+5)
GoTypes 132MB ± 0% 132MB ± 0% -0.23% (p=0.008 n=5+5)
Compiler 608MB ± 0% 607MB ± 0% -0.25% (p=0.008 n=5+5)
SSA 2.04GB ± 0% 2.04GB ± 0% -0.01% (p=0.008 n=5+5)
Flate 24.4MB ± 0% 24.3MB ± 0% -0.13% (p=0.008 n=5+5)
GoParser 29.3MB ± 0% 29.1MB ± 0% -0.54% (p=0.008 n=5+5)
Reflect 84.8MB ± 0% 84.7MB ± 0% -0.21% (p=0.008 n=5+5)
Tar 36.7MB ± 0% 36.6MB ± 0% -0.10% (p=0.008 n=5+5)
XML 48.7MB ± 0% 48.6MB ± 0% -0.24% (p=0.008 n=5+5)
[Geo mean] 85.0MB 84.8MB -0.19%
name old allocs/op new allocs/op delta
Template 383k ± 0% 382k ± 0% -0.26% (p=0.008 n=5+5)
Unicode 341k ± 0% 341k ± 0% ~ (p=0.579 n=5+5)
GoTypes 1.37M ± 0% 1.36M ± 0% -0.39% (p=0.008 n=5+5)
Compiler 5.59M ± 0% 5.56M ± 0% -0.49% (p=0.008 n=5+5)
SSA 16.9M ± 0% 16.9M ± 0% -0.03% (p=0.008 n=5+5)
Flate 238k ± 0% 238k ± 0% -0.23% (p=0.008 n=5+5)
GoParser 306k ± 0% 303k ± 0% -0.93% (p=0.008 n=5+5)
Reflect 990k ± 0% 987k ± 0% -0.33% (p=0.008 n=5+5)
Tar 356k ± 0% 355k ± 0% -0.20% (p=0.008 n=5+5)
XML 444k ± 0% 442k ± 0% -0.45% (p=0.008 n=5+5)
[Geo mean] 848k 845k -0.33%
Change-Id: I2c36003a7cbf71b53857b7de734852b698f49310
Reviewed-on: https://go-review.googlesource.com/c/go/+/167957
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL intrinsifies Add64 with the arm64 instruction sequence ADDS, ADCS
and ADC, and optimizes the case of carry chains. The CL also changes the
test code so that the intrinsic implementation can be tested.
Benchmarks:
name old time/op new time/op delta
Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10)
Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal)
Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9)
Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10)
Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>