Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Keith Randall	076eae436e	cmd/compile: move amd64 and 386 over to new bounds check strategy Change-Id: I13f54f04ccb8452e625dba4249e0d56bafd1fad8 Reviewed-on: https://go-review.googlesource.com/c/go/+/682397 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>	2025-07-24 16:06:16 -07:00
Keith Randall	dbaa2d3e65	cmd/compile: do nil check before calling duff functions, on arm64 and amd64 On these platforms, we set up a frame pointer record below the current stack pointer, so when we're in duffcopy or duffzero, we get a reasonable traceback. See #73753. But because this frame pointer record is below SP, it is vulnerable. Anything that adds a new stack frame to the stack might clobber it. Which actually happens in #73748 on amd64. I have not yet come across a repro on arm64, but might as well be safe here. The only real situation this could happen is when duffzero or duffcopy is passed a nil pointer. So we can just avoid the problem by doing the nil check outside duffzero/duffcopy. That way we never add a frame below duffzero/duffcopy. (Most other ways to get a new frame below the current one, like async preempt or debugger-generated calls, don't apply to duffzero/duffcopy because they are runtime functions; we're not allowed to preempt there.) Longer term, we should stop putting stuff below SP. #73753 will include that as part of its remit. But that's not for 1.25, so we'll do the simple thing for 1.25 for this issue. Fixes #73748 Change-Id: I913c49ee46dcaee8fb439415a4531f7b59d0f612 Reviewed-on: https://go-review.googlesource.com/c/go/+/676916 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-05-28 20:35:09 -07:00
khr@golang.org	6729fbe93e	cmd/compile: on amd64, use flag result of x instead of doing (TEST x x) So we can avoid using a TEST where it isn't needed. Currently only implemented for ADD{Q,L}const. Change-Id: Ia9c4c69bb6033051a45cfd3d191376c7cec9d423 Reviewed-on: https://go-review.googlesource.com/c/go/+/669875 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-05-05 13:07:39 -07:00
Jakub Ciolek	43b7e67040	cmd/compile: lower x*z + y to FMA if FMA enabled There is a generic opcode for FMA, but we don't use it in rewrite rules. This is maybe because some archs, like WASM and MIPS don't have a late lowering rule for it. Fixes #71204 Intel Alder Lake 12600k (GOAMD64=v3): math: name old time/op new time/op delta Acos-16 4.58ns ± 0% 3.36ns ± 0% -26.68% (p=0.008 n=5+5) Acosh-16 8.04ns ± 1% 6.44ns ± 0% -19.95% (p=0.008 n=5+5) Asin-16 4.28ns ± 0% 3.32ns ± 0% -22.24% (p=0.008 n=5+5) Asinh-16 9.92ns ± 0% 8.62ns ± 0% -13.13% (p=0.008 n=5+5) Atan-16 2.31ns ± 0% 1.84ns ± 0% -20.02% (p=0.008 n=5+5) Atanh-16 7.79ns ± 0% 7.03ns ± 0% -9.67% (p=0.008 n=5+5) Atan2-16 3.93ns ± 0% 3.52ns ± 0% -10.35% (p=0.000 n=5+4) Cbrt-16 4.62ns ± 0% 4.41ns ± 0% -4.57% (p=0.016 n=4+5) Ceil-16 0.14ns ± 1% 0.14ns ± 2% ~ (p=0.103 n=5+5) Copysign-16 0.33ns ± 0% 0.33ns ± 0% +0.03% (p=0.029 n=4+4) Cos-16 4.87ns ± 0% 4.75ns ± 0% -2.44% (p=0.016 n=5+4) Cosh-16 4.86ns ± 0% 4.86ns ± 0% ~ (p=0.317 n=5+5) Erf-16 2.71ns ± 0% 2.25ns ± 0% -16.69% (p=0.008 n=5+5) Erfc-16 3.06ns ± 0% 2.67ns ± 0% -13.00% (p=0.016 n=5+4) Erfinv-16 3.88ns ± 0% 2.84ns ± 3% -26.83% (p=0.008 n=5+5) Erfcinv-16 4.08ns ± 0% 3.01ns ± 1% -26.27% (p=0.008 n=5+5) Exp-16 3.29ns ± 0% 3.37ns ± 2% +2.64% (p=0.016 n=4+5) ExpGo-16 8.44ns ± 0% 7.48ns ± 1% -11.37% (p=0.008 n=5+5) Expm1-16 4.46ns ± 0% 3.69ns ± 2% -17.26% (p=0.016 n=4+5) Exp2-16 8.20ns ± 0% 7.39ns ± 2% -9.94% (p=0.008 n=5+5) Exp2Go-16 8.26ns ± 0% 7.23ns ± 0% -12.49% (p=0.016 n=4+5) Abs-16 0.26ns ± 3% 0.22ns ± 1% -16.34% (p=0.008 n=5+5) Dim-16 0.38ns ± 1% 0.40ns ± 2% +5.02% (p=0.008 n=5+5) Floor-16 0.11ns ± 1% 0.17ns ± 4% +54.99% (p=0.008 n=5+5) Max-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.619 n=5+5) Min-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.484 n=5+5) Mod-16 13.4ns ± 1% 12.8ns ± 0% -4.21% (p=0.016 n=5+4) Frexp-16 1.70ns ± 0% 1.71ns ± 0% +0.46% (p=0.008 n=5+5) Gamma-16 3.97ns ± 0% 3.97ns ± 0% ~ (p=0.643 n=5+5) Hypot-16 2.11ns ± 0% 2.11ns ± 0% ~ (p=0.762 n=5+5) HypotGo-16 2.48ns ± 4% 2.26ns ± 0% -8.94% (p=0.008 n=5+5) Ilogb-16 1.67ns ± 0% 1.67ns ± 0% -0.07% (p=0.048 n=5+5) J0-16 19.8ns ± 0% 19.3ns ± 0% ~ (p=0.079 n=4+5) J1-16 19.4ns ± 0% 18.9ns ± 0% -2.63% (p=0.000 n=5+4) Jn-16 41.5ns ± 0% 40.6ns ± 0% -2.32% (p=0.016 n=4+5) Ldexp-16 2.26ns ± 0% 2.26ns ± 0% ~ (p=0.683 n=5+5) Lgamma-16 4.40ns ± 0% 4.21ns ± 0% -4.21% (p=0.008 n=5+5) Log-16 4.05ns ± 0% 4.05ns ± 0% ~ (all equal) Logb-16 1.69ns ± 0% 1.69ns ± 0% ~ (p=0.429 n=5+5) Log1p-16 5.00ns ± 0% 3.99ns ± 0% -20.14% (p=0.008 n=5+5) Log10-16 4.22ns ± 0% 4.21ns ± 0% -0.15% (p=0.008 n=5+5) Log2-16 2.27ns ± 0% 2.25ns ± 0% -0.94% (p=0.008 n=5+5) Modf-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.492 n=5+5) Nextafter32-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.079 n=4+5) Nextafter64-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.095 n=4+5) PowInt-16 10.8ns ± 0% 10.8ns ± 0% ~ (all equal) PowFrac-16 25.3ns ± 0% 25.3ns ± 0% -0.09% (p=0.000 n=5+4) Pow10Pos-16 0.52ns ± 1% 0.52ns ± 0% ~ (p=0.810 n=5+5) Pow10Neg-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.381 n=5+5) Round-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.056 n=5+5) RoundToEven-16 1.64ns ± 0% 1.64ns ± 0% ~ (all equal) Remainder-16 12.4ns ± 2% 12.0ns ± 0% -3.27% (p=0.008 n=5+5) Signbit-16 0.37ns ± 0% 0.37ns ± 0% -0.19% (p=0.008 n=5+5) Sin-16 4.04ns ± 0% 3.92ns ± 0% -3.13% (p=0.000 n=4+5) Sincos-16 5.99ns ± 0% 5.80ns ± 0% -3.03% (p=0.008 n=5+5) Sinh-16 5.22ns ± 0% 5.22ns ± 0% ~ (p=0.651 n=5+4) SqrtIndirect-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.333 n=4+5) SqrtLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=0.079 n=4+5) SqrtIndirectLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=1.000 n=5+5) SqrtGoLatency-16 30.1ns ± 0% 28.6ns ± 1% -4.84% (p=0.008 n=5+5) SqrtPrime-16 645ns ± 0% 645ns ± 0% ~ (p=0.095 n=5+4) Tan-16 4.21ns ± 0% 4.09ns ± 0% -2.76% (p=0.029 n=4+4) Tanh-16 5.36ns ± 0% 5.36ns ± 0% ~ (p=0.444 n=5+5) Trunc-16 0.12ns ± 6% 0.11ns ± 1% -6.79% (p=0.008 n=5+5) Y0-16 19.2ns ± 0% 18.7ns ± 0% -2.52% (p=0.000 n=5+4) Y1-16 19.1ns ± 0% 18.4ns ± 0% ~ (p=0.079 n=4+5) Yn-16 40.7ns ± 0% 39.5ns ± 0% -2.82% (p=0.008 n=5+5) Float64bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.603 n=5+5) Float64frombits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.984 n=4+5) Float32bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.778 n=4+5) Float32frombits-16 0.21ns ± 0% 0.20ns ± 0% ~ (p=0.397 n=5+5) FMA-16 0.82ns ± 0% 0.82ns ± 0% +0.02% (p=0.029 n=4+4) [Geo mean] 2.87ns 2.74ns -4.61% math/cmplx: name old time/op new time/op delta Abs-16 2.07ns ± 0% 2.05ns ± 0% -0.70% (p=0.016 n=5+4) Acos-16 36.5ns ± 0% 35.7ns ± 0% -2.33% (p=0.029 n=4+4) Acosh-16 37.0ns ± 0% 36.2ns ± 0% -2.20% (p=0.008 n=5+5) Asin-16 36.5ns ± 0% 35.7ns ± 0% -2.29% (p=0.008 n=5+5) Asinh-16 33.5ns ± 0% 31.6ns ± 0% -5.51% (p=0.008 n=5+5) Atan-16 15.5ns ± 0% 13.9ns ± 0% -10.61% (p=0.008 n=5+5) Atanh-16 15.0ns ± 0% 13.6ns ± 0% -9.73% (p=0.008 n=5+5) Conj-16 0.11ns ± 5% 0.11ns ± 1% ~ (p=0.421 n=5+5) Cos-16 12.3ns ± 0% 12.2ns ± 0% -0.60% (p=0.000 n=4+5) Cosh-16 12.1ns ± 0% 12.0ns ± 0% ~ (p=0.079 n=4+5) Exp-16 10.0ns ± 0% 9.8ns ± 0% -1.77% (p=0.008 n=5+5) Log-16 14.5ns ± 0% 13.7ns ± 0% -5.67% (p=0.008 n=5+5) Log10-16 14.5ns ± 0% 13.7ns ± 0% -5.55% (p=0.000 n=5+4) Phase-16 5.11ns ± 0% 4.25ns ± 0% -16.90% (p=0.008 n=5+5) Polar-16 7.12ns ± 0% 6.35ns ± 0% -10.90% (p=0.008 n=5+5) Pow-16 64.3ns ± 0% 63.7ns ± 0% -0.97% (p=0.008 n=5+5) Rect-16 5.74ns ± 0% 5.58ns ± 0% -2.73% (p=0.016 n=4+5) Sin-16 12.2ns ± 0% 12.2ns ± 0% -0.54% (p=0.000 n=4+5) Sinh-16 12.1ns ± 0% 12.0ns ± 0% -0.58% (p=0.000 n=5+4) Sqrt-16 5.30ns ± 0% 5.18ns ± 0% -2.36% (p=0.008 n=5+5) Tan-16 22.7ns ± 0% 22.6ns ± 0% -0.33% (p=0.008 n=5+5) Tanh-16 21.2ns ± 0% 20.9ns ± 0% -1.32% (p=0.008 n=5+5) [Geo mean] 11.3ns 10.8ns -3.97% Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b Reviewed-on: https://go-review.googlesource.com/c/go/+/646335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-02-13 12:34:33 -08:00
Keith Randall	a7e331e671	cmd/compile: implement signed loads from read-only memory In addition to unsigned loads which already exist. This helps code that does switches on strings to constant-fold the switch away when the string being switched on is constant. Fixes #71699 Change-Id: If3051af0f7255d2a573da6f96b153a987a7f159d Reviewed-on: https://go-review.googlesource.com/c/go/+/649295 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com>	2025-02-13 12:27:55 -08:00
Jakub Ciolek	8cb6d3b826	cmd/compile: mark PCMPEQB as commutative compilecmp linux/amd64: internal/runtime/maps internal/runtime/maps.(table).Delete changed internal/runtime/maps [cmd/compile] internal/runtime/maps.(Map).Delete changed internal/runtime/maps.(*table).Delete changed Change-Id: Ic3c95411c23cab7427e63105170de41e5766f809 Reviewed-on: https://go-review.googlesource.com/c/go/+/647935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-02-10 08:44:42 -08:00
Michael Pratt	dce30a1920	cmd/compile: intrinsify swissmap match calls with SIMD on amd64 Use similar SIMD operations to the ones used in Abseil. We still using 8-slot groups (even though the XMM registers could handle 16-slot groups) to keep the implementation simpler (no changes to the memory layout of maps). Still, the implementations of matchH2 and matchEmpty are shorter than the portable version using standard arithmetic operations. They also return a packed bitset, which avoids the need to shift in bitset.first. That said, the packed bitset is a downside in cognitive complexity, as we have to think about two different possible representations. This doesn't leak out of the API, but we do need to intrinsify bitset to switch to a compatible implementation. The compiler's intrinsics don't support intrinsifying methods, so the implementations move to free functions. This makes operations between 0-3% faster on my machine. e.g., MapGetHit/impl=runtimeMap/t=Int64/len=6-12 12.34n ± 1% 11.42n ± 1% -7.46% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=12-12 15.14n ± 2% 14.88n ± 1% -1.72% (p=0.009 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=18-12 15.04n ± 6% 14.66n ± 2% -2.53% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=24-12 15.80n ± 1% 15.48n ± 3% ~ (p=0.444 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=30-12 15.55n ± 4% 14.77n ± 3% -5.02% (p=0.004 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=64-12 15.26n ± 1% 15.05n ± 1% ~ (p=0.055 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=128-12 15.34n ± 1% 15.02n ± 2% -2.09% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=256-12 15.42n ± 1% 15.15n ± 1% -1.75% (p=0.001 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=512-12 15.48n ± 1% 15.18n ± 1% -1.94% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=1024-12 17.38n ± 1% 17.05n ± 1% -1.90% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=2048-12 17.96n ± 0% 17.59n ± 1% -2.06% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=4096-12 18.36n ± 1% 18.18n ± 1% -0.98% (p=0.013 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=8192-12 18.75n ± 0% 18.31n ± 1% -2.35% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=65536-12 26.25n ± 0% 25.95n ± 1% -1.14% (p=0.000 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=262144-12 44.24n ± 1% 44.06n ± 1% ~ (p=0.181 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=1048576-12 85.02n ± 0% 85.35n ± 0% +0.39% (p=0.032 n=25) MapGetHit/impl=runtimeMap/t=Int64/len=4194304-12 98.87n ± 1% 98.85n ± 1% ~ (p=0.799 n=25) For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-goamd64v3 Change-Id: Ic1b852f02744404122cb3672900fd95f4625905e Reviewed-on: https://go-review.googlesource.com/c/go/+/626277 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>	2024-11-20 22:41:14 +00:00
Michael Pratt	81c92352a7	runtime: move getcallerpc to internal/runtime/sys Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46 Reviewed-on: https://go-review.googlesource.com/c/go/+/613260 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2024-09-17 15:14:14 +00:00
Keith Randall	c18ff29295	cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64 Update #61395 Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00 Reviewed-on: https://go-review.googlesource.com/c/go/+/594738 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com>	2024-07-23 21:29:38 +00:00
Keith Randall	611706b171	cmd/compile: don't use BTS when OR works, add direct memory BTS operations Stop using BTSconst and friends when ORLconst can be used instead. OR can be issued by more function units than BTS can, so it could lead to better IPC. OR might take a few more bytes to encode, but not a lot more. Still use BTSconst for cases where the constant otherwise wouldn't fit and would require a separate movabs instruction to materialize the constant. This happens when setting bits 31-63 of 64-bit targets. Add BTS-to-memory operations so we don't need to load/bts/store. Fixes #61694 Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c Reviewed-on: https://go-review.googlesource.com/c/go/+/515755 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-08-04 16:40:24 +00:00
Keith Randall	319504ce43	cmd/compile: implement float min/max in hardware for amd64 and arm64 Update #59488 Change-Id: I89f5ea494cbcc887f6fae8560e57bcbd8749be86 Reviewed-on: https://go-review.googlesource.com/c/go/+/514596 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-08-01 20:03:31 +00:00
Keith Randall	67983c0f78	cmd/compile: add indexed SET* opcodes for amd64 Update #61356 Change-Id: I391af98563b1c068208784c80ea736c78c29639d Reviewed-on: https://go-review.googlesource.com/c/go/+/510435 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2023-07-26 17:19:57 +00:00
Keith Randall	21d82e6ac8	cmd/compile: batch write barrier calls Have the write barrier call return a pointer to a buffer into which the generated code records pointers that need write barrier treatment. Change-Id: I7871764298e0aa1513de417010c8d46b296b199e Reviewed-on: https://go-review.googlesource.com/c/go/+/447781 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-02-24 00:21:13 +00:00
Jorropo	5c67ebbb31	cmd/compile: AMD64v3 remove unnecessary TEST comparision in isPowerOfTwo With GOAMD64=V3 the canonical isPowerOfTwo function: func isPowerOfTwo(x uintptr) bool { return x&(x-1) == 0 } Used to compile to: temp := BLSR(x) // x&(x-1) flags = TEST(temp, temp) return flags.zf However the blsr instruction already set ZF according to the result. So we can remove the TEST instruction if we are just checking ZF. Such as in multiple pieces of code around memory allocations. This make the code smaller and faster. Change-Id: Ia12d5a73aa3cb49188c0b647b1eff7b56c5a7b58 Reviewed-on: https://go-review.googlesource.com/c/go/+/448255 Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-01-20 04:58:59 +00:00
Keith Randall	45dc81d856	cmd/compile: add memory argument to GetCallerSP We need to make sure that when we get the stack pointer, we get it at the right time. V = GetCallerSP Call() W = GetCallerSP If Call causes a stack growth, then we will be in a situation where V != W. So it matters when GetCallerSP operations get scheduled. Add a memory argument to GetCallerSP so it can't be reordered with things like calls. Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184 Reviewed-on: https://go-review.googlesource.com/c/go/+/453515 Reviewed-by: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-01-19 22:43:22 +00:00
Keith Randall	9e059630ad	cmd/compile: clean up amd64 opcode comments Put comments about what operations do per block of related opcodes instead of on each line. This is less repetitive and lets us be a bit more verbose in our descriptions. Doesn't change the generated code at all. Change-Id: I98fbd4029df6537b10aac2113a00df121d0fca1b Reviewed-on: https://go-review.googlesource.com/c/go/+/433736 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-12-05 22:22:11 +00:00
Keith Randall	5f7abeca5a	cmd/compile: teach regalloc about temporary registers Temporary registers are sometimes needed for an architecture backend which needs to use several machine instructions to implement a single SSA instruction. Mark such instructions so that regalloc can reserve the temporary register for it. That way we don't have to reserve a fixed register like we do now. Convert the temp-register-using instructions on amd64 to use this new mechanism. Other archs can follow as needed. Change-Id: I1d0c8588afdad5cd18b4398eb5a0f755be5dead7 Reviewed-on: https://go-review.googlesource.com/c/go/+/398556 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-11-17 18:53:13 +00:00
Russ Cox	164406ad93	cmd/compile: rename gen and builtin to _gen and _builtin These two directories are full of //go:build ignore files. We can ignore them more easily by putting an underscore at the start of the name. That also works around a bug in Go 1.17 that was not fixed until Go 1.17.3. Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead Reviewed-on: https://go-review.googlesource.com/c/go/+/435472 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-10-04 19:35:46 +00:00

18 commits