Link.Plists never contained more than one Plist, and sometimes none.
Passing around the Plist being worked on is straightforward and makes
the data flow easier to follow.
Change-Id: I79cb30cb2bd3d319fdbb1dfa5d35b27fcb748e5c
Reviewed-on: https://go-review.googlesource.com/37169
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
runtime.memclr* functions have signatures
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
func memclrHasPointers(ptr unsafe.Pointer, n uintptr)
Update compiler's copy. Also teach gc/mkbuiltin.go to handle
unsafe.Pointer. The import statement and its support is not
really necessary, but just to make it look like real Go code.
Fixes#19185.
Change-Id: I251d02571fde2716d4727e31e04d56ec04b6f22a
Reviewed-on: https://go-review.googlesource.com/37257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:
- (a * b) + c // can emit fused multiply-add
- float64(a * b) + c // cannot emit fused multiply-add
float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).
Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.
Improves the performance of the floating point implementation of
poly1305:
name old speed new speed delta
64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8)
1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10)
64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10)
1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8)
Updates #17895.
Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This is the escape analysis analog of CL 37499.
Fixes#12397Fixes#16871
The only "moved to heap" decisions eliminated by this
CL in std+cmd are:
cmd/compile/internal/gc/const.go:1514: moved to heap: ac
cmd/compile/internal/gc/const.go:1515: moved to heap: bd
cmd/compile/internal/gc/const.go:1516: moved to heap: bc
cmd/compile/internal/gc/const.go:1517: moved to heap: ad
cmd/compile/internal/gc/const.go:1546: moved to heap: ac
cmd/compile/internal/gc/const.go:1547: moved to heap: bd
cmd/compile/internal/gc/const.go:1548: moved to heap: bc
cmd/compile/internal/gc/const.go:1549: moved to heap: ad
cmd/compile/internal/gc/const.go:1550: moved to heap: cc_plus
cmd/compile/internal/gc/export.go:162: moved to heap: copy
cmd/compile/internal/gc/mpfloat.go:66: moved to heap: b
cmd/compile/internal/gc/mpfloat.go:97: moved to heap: b
Change-Id: I0d420b69c84a41ba9968c394e8957910bab5edea
Reviewed-on: https://go-review.googlesource.com/37508
Reviewed-by: David Chase <drchase@google.com>
Keep liveness bit vectors as simple live-variable vectors during
liveness analysis. We can defer expanding them into runtime heap
bitmaps until we're actually writing out the symbol data, and then we
only need temporary memory to expand one bitmap at a time.
This is logically cleaner (e.g., we no longer depend on stack frame
layout during analysis) and saves a little bit on allocations.
name old alloc/op new alloc/op delta
Template 41.4MB ± 0% 41.3MB ± 0% -0.28% (p=0.000 n=60+60)
Unicode 32.6MB ± 0% 32.6MB ± 0% -0.11% (p=0.000 n=59+60)
GoTypes 119MB ± 0% 119MB ± 0% -0.35% (p=0.000 n=60+59)
Compiler 483MB ± 0% 481MB ± 0% -0.47% (p=0.000 n=59+60)
name old allocs/op new allocs/op delta
Template 381k ± 1% 380k ± 1% -0.32% (p=0.000 n=60+60)
Unicode 325k ± 1% 325k ± 1% ~ (p=0.867 n=60+60)
GoTypes 1.16M ± 0% 1.15M ± 0% -0.40% (p=0.000 n=60+59)
Compiler 4.22M ± 0% 4.19M ± 0% -0.61% (p=0.000 n=59+60)
Passes toolstash -cmp.
Change-Id: I8175efe55201ffb5017f79ae6cb90df03f1b7e99
Reviewed-on: https://go-review.googlesource.com/37458
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Constant evaluation provides some rudimentary
knowledge of dead code at inlining decision time.
Use it.
This CL addresses only dead code inside if statements.
For statements are never inlined anyway,
and dead code inside for statements is rare.
Analyzing switch statements is worth doing,
but it is more complicated, since we would have
to evaluate each case; leave it for later.
Fixes#9274
After this CL, the following functions in std+cmd
can be newly inlined:
cmd/internal/obj/x86/asm6.go:3122: can inline subreg
cmd/vendor/golang.org/x/arch/x86/x86asm/decode.go:172: can inline instPrefix
cmd/vendor/golang.org/x/arch/x86/x86asm/decode.go:202: can inline truncated
go/constant/value.go:234: can inline makeFloat
go/types/labels.go:52: can inline (*block).insert
math/big/float.go:231: can inline (*Float).Sign
math/bits/bits.go:57: can inline OnesCount
net/http/server.go:597: can inline (*Server).newConn
runtime/hashmap.go:1165: can inline reflect_maplen
runtime/proc.go:207: can inline os_beforeExit
runtime/signal_unix.go:55: can inline init.5
runtime/stack.go:1081: can inline gostartcallfn
Change-Id: I4c92fb96aa0c3d33df7b3f2da548612e79b56b5b
Reviewed-on: https://go-review.googlesource.com/37499
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Follow-up to CL 37270.
This considerably reduces the time to run the test.
Before:
real 0m7.638s
user 0m14.341s
sys 0m2.244s
After:
real 0m4.867s
user 0m7.107s
sys 0m1.842s
Change-Id: I8837a5da0979a1c365e1ce5874d81708249a4129
Reviewed-on: https://go-review.googlesource.com/37461
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
New special case for booleans and byte-sized integer types
converted to interfaces needs to ensure that the operand is
not too complex, if it were to appear in a parameter list
for example.
Added test, also increased the recursive node dump depth to
a level that was actually useful for an actual bug.
Fixes#19275.
Change-Id: If36ac3115edf439e886703f32d149ee0a46eb2a5
Reviewed-on: https://go-review.googlesource.com/37470
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Add Set3 function to complement existing Set1 and Set2 functions.
Consistently use Set1, Set2 and Set3 for []*Node instead of Set where applicable.
Add SetFirst and SetSecond for setting elements of []*Node to mirror
First and Second for accessing elements in []*Node.
Replace uses of Index by First and Second and
SetIndex with SetFirst and SetSecond where applicable.
Passes toolstash -cmp.
Change-Id: I8255aae768cf245c8f93eec2e9efa05b8112b4e5
Reviewed-on: https://go-review.googlesource.com/37430
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TestAssembly was very slow, leading to it being skipped by default.
This is not surprising, it separately invoked the compiler and
parsed the result many times.
Now the test assembles one source file for arch/os combination,
containing the relevant functions.
Tests for each arch/os run in parallel.
Now the test runs approximately 10x faster on my Intel(R) Core(TM)
i5-6600 CPU @ 3.30GHz.
Fixes#18966
Change-Id: I45ab97630b627a32e17900c109f790eb4c0e90d9
Reviewed-on: https://go-review.googlesource.com/37270
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
CL 35562 substituted zerobase for the pointer for
interfaces containing zero-sized values.
However, it failed to evaluate the zero-sized value
expression for side-effects. Fix that.
The other similar interface value optimizations
are not affected, because they all actually use the
value one way or another.
Fixes#19246
Change-Id: I1168a99561477c63c29751d5cd04cf81b5ea509d
Reviewed-on: https://go-review.googlesource.com/37395
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Avoid printing a second error message when a field of an undefined
variable is accessed.
Fixes#8440.
Change-Id: I3fe0b11fa3423cec3871cb01b5951efa8ea7451a
Reviewed-on: https://go-review.googlesource.com/36751
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When set to false, the -dolinkobj flag instructs the compiler
not to generate or emit linker information.
This is handy when you need the compiler's export data,
e.g. for use with go/importer,
but you want to avoid the cost of full compilation.
This must be used with care, since the resulting
files are unusable for linking.
This CL interacts with #18369,
where adding gcflags and ldflags to buildid has been mooted.
On the one hand, adding gcflags would make safe use of this
flag easier, since if the full object files were needed,
a simple 'go install' would fix it.
On the other hand, this would mean that
'go install -gcflags=-dolinkobj=false' would rebuild the object files,
although any existing object files would probably suffice.
Change-Id: I8dc75ab5a40095c785c1a4d2260aeb63c4d10f73
Reviewed-on: https://go-review.googlesource.com/37384
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Fixes#19012.
Fallback to return signatures without detailed types.
These error message will be of the form of issue:
* https://golang.org/issues/4215
* https://golang.org/issues/6750
So:
func f(x int, y uint) {
return x > y
}
f(10, "a" < 3)
will give errors:
too many errors to return
too many arguments in call to f
instead of:
too many errors to return
have (<T>)
want ()
too many arguments in call to f
have (number, <T>)
want (number, number)
Change-Id: I680abc7cdd8444400e234caddf3ff49c2d69f53d
Reviewed-on: https://go-review.googlesource.com/36806
Reviewed-by: Robert Griesemer <gri@golang.org>
Keith pointed out that these rules should zero extend during the review
of CL 36845. In practice the generic rules are responsible for eliminating
most load-hit-stores and they do not have this problem. When the s390x
rules are triggered any cast following the elided load-hit-store is
kept because of the sequence the rules are applied in (i.e. the load is
removed before the zero extension gets a chance to be merged into the load).
It is therefore not clear that this issue results in any functional bugs.
This CL includes a test, but it only tests the generic rules currently.
Change-Id: Idbc43c782097a3fb159be293ec3138c5b36858ad
Reviewed-on: https://go-review.googlesource.com/37154
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The definition of writeBarrier in the runtime was changed in CL 22855
to include padding. Update the definition built in to the compiler to match.
This doesn't affect the generated code, as the compiler sets the type
to use anyhow, but having them be different seems clearly wrong.
Change-Id: I8eac05bf70a424a0b2338ba5e9e41af231316de0
Reviewed-on: https://go-review.googlesource.com/37377
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The type of the OffPtr should be consistent with the type of the
following load. Before this CL it was typed as a pointer to the
struct.
Fixes#19164.
Change-Id: Ibcdec4411c6f719702f76f8dba3cce8691bfbe0c
Reviewed-on: https://go-review.googlesource.com/37254
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Change list https://golang.org/cl/37015/ moved the optimization
of division by constants to the generic ssa backend.
This removes the old now unused code that was used
for this optimization outside of the ssa backend.
Change-Id: I86223e56742e48dbb372ba8d779681e66448c513
Reviewed-on: https://go-review.googlesource.com/37198
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
We can immediately emit static assignment data rather than queueing
them up to be processed during SSA building.
Passes toolstash -cmp.
Change-Id: I8bcea4b72eafb0cc0b849cd93e9cde9d84f30d5e
Reviewed-on: https://go-review.googlesource.com/37024
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
These seem not to really matter, but good to be correct.
Change-Id: I02edb9797c3d6739725cfbe4723c75f151acd05e
Reviewed-on: https://go-review.googlesource.com/36837
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
SSA's writebarrier pass requires WB store ops are always at the
end of a block. If we move write barrier insertion into SSA and
emits normal Store ops when building SSA, this requirement becomes
impractical -- it will create too many blocks for all the Store
ops.
Redo SSA's writebarrier pass, explicitly order values in store
order, so it no longer needs this requirement.
Updates #17583.
Fixes#19067.
Change-Id: I66e817e526affb7e13517d4245905300a90b7170
Reviewed-on: https://go-review.googlesource.com/36834
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Currently the conversion from constant divides to multiplies is mostly
done during the walk pass. This is suboptimal because SSA can
determine that the value being divided by is constant more often
(e.g. after inlining).
Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
Reviewed-on: https://go-review.googlesource.com/37015
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Currently, whether we need a write barrier is simply a property of the
pointer slot being written to.
The only optimization we currently apply using the value being written
is that pointers to stack variables can omit write barriers because
they're only written to stack slots... but we already omit write
barriers for all writes to the stack anyway.
Passes toolstash -cmp.
Change-Id: I7f16b71ff473899ed96706232d371d5b2b7ae789
Reviewed-on: https://go-review.googlesource.com/37109
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
... and same for stores. This does for binary.BigEndian.Uint16() what
was already done for Uint32 and Uint64 with BSWAP in 10f75748 (CL 32222).
Here is how generated code changes e.g. for the following function
(omitting saying the same prologue/epilogue):
func get16(b [2]byte) uint16 {
return binary.BigEndian.Uint16(b[:])
}
"".get16 t=1 size=21 args=0x10 locals=0x0
// before
0x0000 00000 (x.go:15) MOVBLZX "".b+9(FP), AX
0x0005 00005 (x.go:15) MOVBLZX "".b+8(FP), CX
0x000a 00010 (x.go:15) SHLL $8, CX
0x000d 00013 (x.go:15) ORL CX, AX
// after
0x0000 00000 (x.go:15) MOVWLZX "".b+8(FP), AX
0x0005 00005 (x.go:15) ROLW $8, AX
encoding/binary is speedup overall a bit:
name old time/op new time/op delta
ReadSlice1000Int32s-4 4.83µs ± 0% 4.83µs ± 0% ~ (p=0.206 n=4+5)
ReadStruct-4 1.29µs ± 2% 1.28µs ± 1% -1.27% (p=0.032 n=4+5)
ReadInts-4 384ns ± 1% 385ns ± 1% ~ (p=0.968 n=4+5)
WriteInts-4 534ns ± 3% 526ns ± 0% -1.54% (p=0.048 n=4+5)
WriteSlice1000Int32s-4 5.02µs ± 0% 5.11µs ± 3% ~ (p=0.175 n=4+5)
PutUint16-4 0.59ns ± 0% 0.49ns ± 2% -16.95% (p=0.016 n=4+5)
PutUint32-4 0.52ns ± 0% 0.52ns ± 0% ~ (all equal)
PutUint64-4 0.53ns ± 0% 0.53ns ± 0% ~ (all equal)
PutUvarint32-4 19.9ns ± 0% 19.9ns ± 1% ~ (p=0.556 n=4+5)
PutUvarint64-4 54.5ns ± 1% 54.2ns ± 0% ~ (p=0.333 n=4+5)
name old speed new speed delta
ReadSlice1000Int32s-4 829MB/s ± 0% 828MB/s ± 0% ~ (p=0.190 n=4+5)
ReadStruct-4 58.0MB/s ± 2% 58.7MB/s ± 1% +1.30% (p=0.032 n=4+5)
ReadInts-4 78.0MB/s ± 1% 77.8MB/s ± 1% ~ (p=0.968 n=4+5)
WriteInts-4 56.1MB/s ± 3% 57.0MB/s ± 0% ~ (p=0.063 n=4+5)
WriteSlice1000Int32s-4 797MB/s ± 0% 783MB/s ± 3% ~ (p=0.190 n=4+5)
PutUint16-4 3.37GB/s ± 0% 4.07GB/s ± 2% +20.83% (p=0.016 n=4+5)
PutUint32-4 7.73GB/s ± 0% 7.72GB/s ± 0% ~ (p=0.556 n=4+5)
PutUint64-4 15.1GB/s ± 0% 15.1GB/s ± 0% ~ (p=0.905 n=4+5)
PutUvarint32-4 201MB/s ± 0% 201MB/s ± 0% ~ (p=0.905 n=4+5)
PutUvarint64-4 147MB/s ± 1% 147MB/s ± 0% ~ (p=0.286 n=4+5)
( "a bit" only because most of the time is spent in reflection-like things
there, not actual bytes decoding. Even for direct PutUint16 benchmark the
looping adds overhead and lowers visible benefit. For code-generated encoders /
decoders actual effect is more than 20% )
Adding Uint32 and Uint64 raw benchmarks too for completeness.
NOTE I had to adjust load-combining rule for bswap case to match first 2 bytes
loads as result of "2-bytes load+shift" -> "loadw + rorw 8" rewrite. Reason is:
for loads+shift, even e.g. into uint16 var
var b []byte
var v uin16
v = uint16(b[1]) | uint16(b[0])<<8
the compiler eventually generates L(ong) shift - SHLLconst [8], probably
because it is more straightforward / other reasons to work on the whole
register. This way 2 bytes rewriting rule is using SHLLconst (not SHLWconst) in
its pattern, and then it always gets matched first, even if 2-byte rule comes
syntactically after 4-byte rule in AMD64.rules because 4-bytes rule seemingly
needs more applyRewrite() cycles to trigger. If 2-bytes rule gets matched for
inner half of
var b []byte
var v uin32
v = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
and we keep 4-byte load rule unchanged, the result will be MOVW + RORW $8 and
then series of byte loads and shifts - not one MOVL + BSWAPL.
There is no such problem for stores: there compiler, since it probably knows
store destination is 2 bytes wide, uses SHRWconst 8 (not SHRLconst 8) and thus
2-byte store rule is not a subset of rule for 4-byte stores.
Fixes#17151 (int16 was last missing piece there)
Change-Id: Idc03ba965bfce2b94fef456b02ff6742194748f6
Reviewed-on: https://go-review.googlesource.com/34636
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL 35261 introduces special handling of zero-valued STRUCTLIT for
efficient struct zeroing. But it didn't cover all use cases, for
example, CONVNOP STRUCTLIT is not handled.
On the other hand, CL 34566 handles zeroing earlier, so we don't
need the change in CL 35261 for efficient zeroing. Other uses of
zero-valued struct literals are very rare. So undo the change in
walk.go in CL 35261.
Add a test for efficient zeroing.
Fixes#19084.
Change-Id: I0807f7423fb44d47bf325b3c1ce9611a14953853
Reviewed-on: https://go-review.googlesource.com/36955
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
It is not always obvious from the first glance when looking at
TestAssembly failure in which context the code was generated. For
example x86 and x86-64 are similar, and those of us who do not work with
assembly every day can even take s390x version as something similar to x86.
So when something fails lets print the whole test context - this
includes os and arch which were previously missing. An example failure:
before:
--- FAIL: TestAssembly (40.48s)
asm_test.go:46: expected: MOVWZ \(.*\),
go:
import "encoding/binary"
func f(b []byte) uint32 {
return binary.LittleEndian.Uint32(b)
}
asm:"".f t=1 size=160 args=0x20 locals=0x0
...
after:
--- FAIL: TestAssembly (40.43s)
asm_test.go:46: linux/s390x: expected: MOVWZ \(.*\),
go:
import "encoding/binary"
func f(b []byte) uint32 {
return binary.LittleEndian.Uint32(b)
}
asm:"".f t=1 size=160 args=0x20 locals=0x0
Motivated-by: #18946#issuecomment-279491071
Change-Id: I61089ceec05da7a165718a7d69dec4227dd0e993
Reviewed-on: https://go-review.googlesource.com/36881
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
MOVD{reg,nop} operations (added in CL 36256) inserted to preserve
type information were blocking the load-combining rules. Fix this
by merging type changes into loads wherever possible.
Fixes#19059.
Change-Id: I8a1df06eb0f231b40ae43107d4a3bd0b9c441b59
Reviewed-on: https://go-review.googlesource.com/36843
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL 33632 reorders args of commutative ops in order to make
CSE for commutative ops more robust. Unfortunately, that
broke the load-combining rules which depend on a certain ordering
of OR ops' arguments.
Introduce some additional rules that order OR ops' arguments
consistently so that the load-combining rules fire.
Note: there's also something else wrong with the s390x rules.
I've filed #19059 for that.
Fixes#18946
Change-Id: I0a5447196bd88a55ccee683c69a57b943a9972e1
Reviewed-on: https://go-review.googlesource.com/36911
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
When doing i.(T) for non-empty-interface i and concrete type T,
there's no need to read the type out of the itab. Just compare the
itab to the itab we expect for that interface/type pair.
Also optimize type switches by putting the type hash of the
concrete type in the itab. That way we don't need to load the
type pointer out of the itab.
Update #18492
Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce
Reviewed-on: https://go-review.googlesource.com/34810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Add temporaries to reorder the assignment for OAS2XXX nodes.
This makes orderstmt(), rewrite
a, b, c = ...
as
tmp1, tmp2, tmp3 = ...
a, b, c = tmp1, tmp2, tmp3
and
a, ok = ...
as
t1, t2 = ...
a = t1
ok = t2
Fixes#13433.
Change-Id: Id0f5956e3a254d0a6f4b89b5f7b0e055b1f0e21f
Reviewed-on: https://go-review.googlesource.com/34713
Run-TryBot: Dhananjay Nakrani <dhananjayn@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Instead we can just call needwritebarrier when constructing the SSA
representation.
Change-Id: I6fefaad49daada9cdb3050f112889e49dca0047b
Reviewed-on: https://go-review.googlesource.com/34566
Reviewed-by: Cherry Zhang <cherryyz@google.com>
CL 35554 taught order.go to use static variables
for constants that needed to be addressable for runtime routines.
However, there is one class of runtime routines that
do not actually need an addressable value: fast map access routines.
This CL teaches order.go to avoid using static variables
for addressability in those cases.
Instead, it avoids introducing a temp at all,
which the backend would just have to optimize away.
Fixes#19015.
Change-Id: I5ef780c604fac3fb48dabb23a344435e283cb832
Reviewed-on: https://go-review.googlesource.com/36693
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
go:systemstack works by tweaking the stack check prologue to check
against a different bound, while go:nosplit removes the stack check
prologue entirely. Hence, they can't be used together. Make the build
fail if they are.
Change-Id: I2d180c4b1d31ff49ec193291ecdd42921d253359
Reviewed-on: https://go-review.googlesource.com/36710
Run-TryBot: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In cmd/compile, we can directly construct obj.Auto to represent local
variables and attach them to the function's obj.LSym.
In preparation for being able to emit more precise DWARF info based on
other compiler available information (e.g., lexical scoping).
Change-Id: I9c4225ec59306bec42552838493022e0e9d70228
Reviewed-on: https://go-review.googlesource.com/36420
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>