mirror of
https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
636 commits
3f94f3d4b2 |
test/codegen: fix shift tests on riscv64
These were broken by CL 721206, which changes Rsh to RshU for positive inputs. Change-Id: I9e38c3c428fb8aeb70cf51e7e76f4711c864f027 Reviewed-on: https://go-review.googlesource.com/c/go/+/723340 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
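As a rough illustration (not taken from the CL), this is the kind of function such codegen tests cover: once the shifted value is provably non-negative, a signed right shift (Rsh) can be compiled as an unsigned one (RshU), so the expected riscv64 instruction changes from an arithmetic to a logical shift.

func rshNonNegative(x int64) int64 {
    if x < 0 {
        x = 0
    }
    // x is provably >= 0 here, so the compiler may emit a logical right
    // shift (SRLI) instead of an arithmetic one (SRAI), and the asmcheck
    // patterns have to match the new instruction choice.
    return x >> 3
}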
4879151d1d |
cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead. Two append benchmarks below show promising results, executing up to ~2x faster and up to factor of ~3 memory reduction with this CL. The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.) An example target for this CL is we automatically free the intermediate memory for the appends in the loop in this function: func f1(input []int) []int { var s []int for _, x := range input { s = append(s, g(x)) // s cannot be aliased here if h(x) { s = append(s, x) // s cannot be aliased here } } return s // slice escapes at end } In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.) The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695. The high-level approach here is we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood). For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR: func f2() { var s []int for i := range 10 { s = append(s, i) // aliased due to next line alias = s } } For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it aliased for the outer loop: func f3() { for range 10 { var s []int for i := range 10 { s = append(s, i) // append using non-aliased slice } alias = s } } A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach. For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method. In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement. 
As the number of appends increases for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table. There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150) Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150) Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150) Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150) Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150) Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150) Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150) Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150) Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150) Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150) Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150) Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150) geomean 601.8n 455.0n -24.39% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150) Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150) Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150) Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150) geomean 1.991Ki 1.124Ki -43.54% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150) Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150) Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150) geomean 4.331 1.000 -76.91% The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. 
Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150) AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150) AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150) AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150) AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150) AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150) AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150) AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150) AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150) AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150) AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150) AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150) geomean 331.8n 246.4n -25.72% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150) AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150) AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150) geomean ² -42.81% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150) AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150) geomean ² -72.76% ² Of course, these are just microbenchmarks, but likely indicate there are some opportunities here. 
The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package). For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515. Updates #74299 Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221 Reviewed-on: https://go-review.googlesource.com/c/go/+/710015 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> |
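As a hedged sketch (the type and function names below are illustrative, not the actual benchmark source), the Append64Bytes benchmark described above has roughly this shape: each element is 64 bytes, the slice is never aliased inside the loop, and only the final backing array escapes, so every intermediate backing array freed by growsliceNoAlias is pure savings.

type elem64 [8]uint64 // 64 bytes per element

func buildSlice(count int) []elem64 {
    var s []elem64
    for i := 0; i < count; i++ {
        // s is provably unaliased here, so the compiler can grow it via
        // runtime.growsliceNoAlias and free the previous backing array
        // immediately after each growth.
        s = append(s, elem64{0: uint64(i)})
    }
    return s // only the final backing array escapes
}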
3c6bf6fbf3 |
cmd/compile: handle loops better during stack allocation of slices
Don't use the move2heap optimization if the move2heap is inside a loop deeper than the declaration of the slice. We really only want to do the move2heap operation once. Change-Id: I4a68d01609c2c9d4e0abe4580839e70059393a81 Reviewed-on: https://go-review.googlesource.com/c/go/+/722440 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
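A hypothetical illustration (not from the CL) of the case being avoided: the slice is declared outside the loop, but its escaping use sits one loop deeper, so a move2heap placed there could run on every iteration rather than once.

var sink []int

func f(n int) {
    var r []int // declared outside the loop
    for i := 0; i < n; i++ {
        r = append(r, i)
        sink = r // escaping use nested deeper than r's declaration
    }
}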
dda7c8253d |
cmd/compile,internal/bytealg: add MemEq intrinsic for runtime.memequal
Introduce a new MemEq SSA operation for runtime.memequal. The operation is initially implemented for arm64. The change adds opt rules (following the existing rules for calls to runtime.memequal) that work with MemEq, and a later op version, LoweredMemEq, which may be lowered differently for more constant-size cases in the future (for other targets as well as for arm64). The new MemEq SSA operation does not have a memory result, allowing CSE of load operations around it.
Code size difference (for arm64 linux):
Executable   Old .text   New .text   Change
-------------------------------------------
asm          1970420     1969668     -0.04%
cgo          1741220     1740212     -0.06%
compile      8956756     8959428     +0.03%
cover        1879332     1878772     -0.03%
link         2574116     2572660     -0.06%
preprofile    867124      866820     -0.04%
vet          2890404     2888596     -0.06%
Change-Id: I6ab507929b861884d17d5818cfbd152cf7879751 Reviewed-on: https://go-review.googlesource.com/c/go/+/686655 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
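As a small, hedged example (names are illustrative) of code that lowers to a runtime.memequal call, and can therefore use the new MemEq op on arm64: comparing values too large to compare inline word by word.

type key [64]byte

func sameKey(a, b key) bool {
    // Compiled as a memequal of 64 bytes; because MemEq carries no memory
    // result, loads around this comparison can still be CSE'd.
    return a == b
}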
220d73cc44 |
[dev.simd] all: merge master (8dd5b13) into dev.simd
Merge List: + 2025-11-24 |
feae743bdb |
cmd/compile: use 32x32->64 multiplies on loong64
Gets rid of some sign extensions, like arm64. Change-Id: I9fc37e15a82718bfcf53db8cab0c4e7baaa0a747 Reviewed-on: https://go-review.googlesource.com/c/go/+/721522 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
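For illustration, the pattern that benefits is a widening multiply of two 32-bit values; with a dedicated 32x32->64 multiply instruction the operands no longer need separate sign extensions.

func mulWiden(a, b int32) int64 {
    return int64(a) * int64(b) // one widening multiply, no explicit sign-extension instructions
}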
e7d47ac33d |
cmd/compile: simplify negative on multiplication
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ simplify_base │ simplify_new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 623.0n ± 0% 319.3n ± 1% -48.75% (p=0.000 n=10)
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ simplify.base │ simplify.new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 10.928µ ± 0% 6.432µ ± 0% -41.14% (p=0.000 n=10)
Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
e3d4645693 |
[dev.simd] all: merge master (ca37d24) into dev.simd
Conflicts: - src/cmd/compile/internal/typecheck/builtin.go Merge List: + 2025-11-20 |
32f5aadd2f |
cmd/compile: stack allocate backing stores during append
We can already stack allocate the backing store during append if the
resulting backing store doesn't escape. See CL 664299.
This CL enables us to often stack allocate the backing store during
append *even if* the result escapes. Typically, for code like:
func f(n int) []int {
var r []int
for i := range n {
r = append(r, i)
}
return r
}
the backing store for r escapes, but only by returning it.
Could we operate with r on the stack for most of its lifetime,
and only move it to the heap at the return point?
The current implementation of append will need to do an allocation
each time it calls growslice. This will happen on the 1st, 2nd, 4th,
8th, etc. append calls. The allocations done by all but the
last growslice call will then immediately be garbage.
We'd like to avoid doing some of those intermediate allocations
if possible. We rewrite the above code by introducing a move2heap
operation:
func f(n int) []int {
var r []int
for i := range n {
r = append(r, i)
}
r = move2heap(r)
return r
}
Using the move2heap runtime function, which does:
move2heap(r):
If r is already backed by heap storage, return r.
Otherwise, copy r to the heap and return the copy.
Now we can treat the backing store of r allocated at the
append site as not escaping. Previous stack allocation
optimizations now apply, which can use a fixed-size
stack-allocated backing store for r when appending.
See the description in cmd/compile/internal/slice/slice.go
for how we ensure that this optimization is safe.
Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/707755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
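A minimal sketch (hypothetical; exact counts depend on n and on how large a stack-allocated backing store the compiler chooses) of how the effect can be observed: with the move2heap rewrite, the intermediate growslice allocations for a small returned slice can disappear, leaving only the final copy to the heap.

package main

import (
    "fmt"
    "testing"
)

// build has the same shape as f(n) above: the backing store escapes only
// through the return value.
func build(n int) []int {
    var r []int
    for i := 0; i < n; i++ {
        r = append(r, i)
    }
    return r
}

func main() {
    allocs := testing.AllocsPerRun(1000, func() { _ = build(8) })
    fmt.Println("allocs/op:", allocs) // expected to drop with this CL
}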
ba634ca5c7 |
cmd/compile: fold boolean NOT into branches
Gets rid of an EOR $1 instruction. Change-Id: Ib032b0cee9ac484329c978af9b1305446f8d5dac Reviewed-on: https://go-review.googlesource.com/c/go/+/721501 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> |
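Illustrative example: branching on a negated bool no longer materializes the negation (the EOR $1 mentioned above); the branch condition is simply inverted.

func choose(b bool) int {
    if !b { // the NOT folds into the branch; no EOR $1 needed
        return 1
    }
    return 2
}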
e1a12c781f |
cmd/compile: use 32x32->64 multiplies on arm64
Gets rid of some sign extensions. Change-Id: Ie67ef36b4ca1cd1a2cd9fa5d84578db553578a22 Reviewed-on: https://go-review.googlesource.com/c/go/+/721241 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> |
934dbcea1a |
[dev.simd] simd: update CPU feature APIs
This CL also updates the internal uses of these APIs and fixes an unstable output issue left by previous CLs. Change-Id: Ibc38361d35e2af0c4943a48578f3c610b74ed14d Reviewed-on: https://go-review.googlesource.com/c/go/+/720020 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
2cdcc4150b |
cmd/compile: fold negation into multiplication
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ /root/mul.base.log │ /root/mul.new.log │
│ sec/op │ sec/op vs base │
MulNeg 6.426µ ± 0% 4.501µ ± 0% -29.96% (p=0.000 n=10)
Mul2Neg 9.000µ ± 0% 6.431µ ± 0% -28.54% (p=0.000 n=10)
Mul2 1.263µ ± 0% 1.263µ ± 0% ~ (p=1.000 n=10)
MulNeg2 1.577µ ± 0% 1.577µ ± 0% ~ (p=0.211 n=10)
geomean 3.276µ 2.756µ -15.89%
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ /root/base │ /root/new │
│ sec/op │ sec/op vs base │
MulNeg 691.9n ± 1% 319.4n ± 0% -53.83% (p=0.000 n=10)
Mul2Neg 630.0n ± 0% 629.6n ± 0% -0.07% (p=0.000 n=10)
Mul2 438.1n ± 0% 438.1n ± 0% ~ (p=0.728 n=10)
MulNeg2 439.3n ± 0% 439.4n ± 0% ~ (p=0.656 n=10)
geomean 538.2n 443.6n -17.58%
Change-Id: Ice8e6c8d1e8e3009ba8a0b1b689205174e199019
Reviewed-on: https://go-review.googlesource.com/c/go/+/720180
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
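Hypothetical shapes matching the benchmark names above (the exact patterns are inferred from the names, not taken from the CL): a negation feeding a multiply can be absorbed into the multiplication instead of being computed with a separate negate instruction. The related "simplify negative on multiplication" change earlier in this log targets the same family of expressions.

func mulNeg(x, y int64) int64  { return -x * y }  // MulNeg-like shape
func mul2Neg(x, y int64) int64 { return -x * -y } // Mul2Neg-like shape: the negations cancel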
0a569528ea |
cmd/compile: optimize comparisons with single bit difference
Optimize comparisons with constants that only differ by 1 bit (i.e.
a power of 2). For example:
x == 4 || x == 6 -> x|2 == 6
x != 1 && x != 5 -> x|4 != 5
Change-Id: Ic61719e5118446d21cf15652d9da22f7d95b2a15
Reviewed-on: https://go-review.googlesource.com/c/go/+/719420
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
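Worked example of the first rewrite above: 4 is 0b100 and 6 is 0b110, so they differ only in bit 1 (value 2); OR-ing that bit into x maps both values to 6 and a single comparison suffices.

func isFourOrSix(x uint64) bool {
    return x == 4 || x == 6 // rewritable to x|2 == 6
}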
d50a571ddf |
test: fix tests to work with sizespecializedmalloc turned off
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-nosizespecializedmalloc,gotip-linux-amd64-nosizespecializedmalloc,gotip-linux-arm64-nosizespecializedmalloc Change-Id: I6a6a696465004b939c989afc058c4c3e1fb7134f Reviewed-on: https://go-review.googlesource.com/c/go/+/720401 Auto-Submit: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@google.com> |
d7a0c45642 |
[dev.simd] all: merge master (57362e9) into dev.simd
Conflicts: - src/cmd/compile/internal/ir/symtab.go - src/cmd/compile/internal/ssa/prove.go - src/cmd/compile/internal/ssa/rewriteAMD64.go - src/cmd/compile/internal/ssagen/intrinsics.go - src/cmd/compile/internal/typecheck/builtin.go - src/internal/buildcfg/exp.go - src/internal/strconv/ftoa.go - test/codegen/stack.go Manually resolved some conflicts: - Use internal/strconv for simd.String, remove internal/ftoa - prove.go is just copied from the one on the main branch. We have cherry-picked the changes to prove.go to main branch, so our copy is identical to an old version of the one on the main branch. There are CLs landed after our cherry-picks. Just copy it over to adopt the new code. Merge List: + 2025-11-13 |
34aef89366 |
cmd/compile: use FCLASSD for subnormal checks on riscv64
Only implemented for 64-bit floating-point operations for now.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 154.1n ± 0% 154.1n ± 0% ~ (p=0.303 n=10)
Acosh 215.8n ± 6% 226.7n ± 0% ~ (p=0.439 n=10)
Asin 149.2n ± 1% 149.2n ± 0% ~ (p=0.700 n=10)
Asinh 262.1n ± 0% 258.5n ± 0% -1.37% (p=0.000 n=10)
Atan 99.48n ± 0% 99.49n ± 0% ~ (p=0.836 n=10)
Atanh 244.9n ± 0% 243.8n ± 0% -0.43% (p=0.002 n=10)
Atan2 158.2n ± 1% 153.3n ± 0% -3.10% (p=0.000 n=10)
Cbrt 186.8n ± 0% 181.1n ± 0% -3.03% (p=0.000 n=10)
Ceil 36.71n ± 1% 36.71n ± 0% ~ (p=0.434 n=10)
Copysign 6.531n ± 1% 6.526n ± 0% ~ (p=0.268 n=10)
Cos 98.19n ± 0% 95.40n ± 0% -2.84% (p=0.000 n=10)
Cosh 233.1n ± 0% 222.6n ± 0% -4.50% (p=0.000 n=10)
Erf 122.5n ± 0% 114.2n ± 0% -6.78% (p=0.000 n=10)
Erfc 126.0n ± 1% 116.6n ± 0% -7.46% (p=0.000 n=10)
Erfinv 138.8n ± 0% 138.6n ± 0% ~ (p=0.082 n=10)
Erfcinv 140.0n ± 0% 139.7n ± 0% ~ (p=0.359 n=10)
Exp 193.3n ± 0% 184.2n ± 0% -4.68% (p=0.000 n=10)
ExpGo 204.8n ± 0% 194.5n ± 0% -5.03% (p=0.000 n=10)
Expm1 152.5n ± 1% 145.0n ± 0% -4.92% (p=0.000 n=10)
Exp2 174.5n ± 0% 164.2n ± 0% -5.85% (p=0.000 n=10)
Exp2Go 184.4n ± 1% 175.4n ± 0% -4.88% (p=0.000 n=10)
Abs 4.912n ± 0% 4.914n ± 0% ~ (p=0.283 n=10)
Dim 15.50n ± 1% 15.52n ± 1% ~ (p=0.331 n=10)
Floor 36.89n ± 1% 36.76n ± 1% ~ (p=0.325 n=10)
Max 31.05n ± 1% 31.17n ± 1% ~ (p=0.628 n=10)
Min 31.01n ± 0% 31.06n ± 0% ~ (p=0.767 n=10)
Mod 294.1n ± 0% 245.6n ± 0% -16.52% (p=0.000 n=10)
Frexp 44.86n ± 1% 35.20n ± 0% -21.53% (p=0.000 n=10)
Gamma 195.8n ± 0% 185.4n ± 1% -5.29% (p=0.000 n=10)
Hypot 84.91n ± 0% 84.54n ± 1% -0.43% (p=0.006 n=10)
HypotGo 96.70n ± 0% 95.42n ± 1% -1.32% (p=0.000 n=10)
Ilogb 45.03n ± 0% 35.07n ± 1% -22.10% (p=0.000 n=10)
J0 634.5n ± 0% 627.2n ± 0% -1.16% (p=0.000 n=10)
J1 644.5n ± 0% 636.9n ± 0% -1.18% (p=0.000 n=10)
Jn 1.357µ ± 0% 1.344µ ± 0% -0.92% (p=0.000 n=10)
Ldexp 49.89n ± 0% 39.96n ± 0% -19.90% (p=0.000 n=10)
Lgamma 186.6n ± 0% 184.3n ± 0% -1.21% (p=0.000 n=10)
Log 150.4n ± 0% 141.1n ± 0% -6.15% (p=0.000 n=10)
Logb 46.70n ± 0% 35.89n ± 0% -23.15% (p=0.000 n=10)
Log1p 164.1n ± 0% 163.9n ± 0% ~ (p=0.122 n=10)
Log10 153.1n ± 0% 143.5n ± 0% -6.24% (p=0.000 n=10)
Log2 58.83n ± 0% 49.75n ± 0% -15.43% (p=0.000 n=10)
Modf 40.82n ± 1% 40.78n ± 0% ~ (p=0.239 n=10)
Nextafter32 49.15n ± 0% 48.93n ± 0% -0.44% (p=0.011 n=10)
Nextafter64 43.33n ± 0% 43.23n ± 0% ~ (p=0.228 n=10)
PowInt 269.4n ± 0% 243.8n ± 0% -9.49% (p=0.000 n=10)
PowFrac 618.0n ± 0% 571.7n ± 0% -7.48% (p=0.000 n=10)
Pow10Pos 13.09n ± 0% 13.05n ± 0% -0.31% (p=0.003 n=10)
Pow10Neg 30.99n ± 1% 30.99n ± 0% ~ (p=0.173 n=10)
Round 23.73n ± 0% 23.65n ± 0% -0.36% (p=0.011 n=10)
RoundToEven 27.87n ± 0% 27.73n ± 0% -0.48% (p=0.003 n=10)
Remainder 282.1n ± 0% 249.6n ± 0% -11.52% (p=0.000 n=10)
Signbit 11.46n ± 0% 11.42n ± 0% -0.39% (p=0.003 n=10)
Sin 115.2n ± 0% 113.2n ± 0% -1.74% (p=0.000 n=10)
Sincos 140.6n ± 0% 138.6n ± 0% -1.39% (p=0.000 n=10)
Sinh 252.0n ± 0% 241.4n ± 0% -4.21% (p=0.000 n=10)
SqrtIndirect 4.909n ± 0% 4.893n ± 0% -0.34% (p=0.021 n=10)
SqrtLatency 19.57n ± 1% 19.57n ± 0% ~ (p=0.087 n=10)
SqrtIndirectLatency 19.64n ± 0% 19.57n ± 0% -0.36% (p=0.025 n=10)
SqrtGoLatency 198.1n ± 0% 197.4n ± 0% -0.35% (p=0.014 n=10)
SqrtPrime 5.733µ ± 0% 5.725µ ± 0% ~ (p=0.116 n=10)
Tan 149.1n ± 0% 146.8n ± 0% -1.54% (p=0.000 n=10)
Tanh 248.2n ± 1% 238.1n ± 0% -4.05% (p=0.000 n=10)
Trunc 36.86n ± 0% 36.70n ± 0% -0.43% (p=0.029 n=10)
Y0 638.2n ± 0% 633.6n ± 0% -0.71% (p=0.000 n=10)
Y1 641.8n ± 0% 636.1n ± 0% -0.87% (p=0.000 n=10)
Yn 1.358µ ± 0% 1.345µ ± 0% -0.92% (p=0.000 n=10)
Float64bits 5.721n ± 0% 5.709n ± 0% -0.22% (p=0.044 n=10)
Float64frombits 4.905n ± 0% 4.893n ± 0% ~ (p=0.266 n=10)
Float32bits 12.27n ± 0% 12.23n ± 0% ~ (p=0.122 n=10)
Float32frombits 4.909n ± 0% 4.893n ± 0% -0.32% (p=0.024 n=10)
FMA 6.556n ± 0% 6.526n ± 0% ~ (p=0.283 n=10)
geomean 86.82n 83.75n -3.54%
Change-Id: I522297a79646d76543d516accce291f5a3cea337
Reviewed-on: https://go-review.googlesource.com/c/go/+/717560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
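A hedged example of the kind of subnormal test involved (the helper below is illustrative, not from the math package): a nonzero value whose magnitude is below the smallest normal float64, which FCLASS.D can classify in a single instruction.

package floatclass

import "math"

const smallestNormalFloat64 = 0x1p-1022 // smallest positive normal float64

func isSubnormal(x float64) bool {
    return x != 0 && math.Abs(x) < smallestNormalFloat64
}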
86b4fe31d9 |
[dev.simd] cmd/compile: add masked merging ops and optimizations
This CL generates optimizations for the masked variants of AVX512 instructions for patterns of the form x.Op(y).Merge(z, mask) => OpMasked(z, x, y, mask), where OpMasked is resultInArg0. Change-Id: Ife7ccc9ddbf76ae921a085bd6a42b965da9bc179 Reviewed-on: https://go-review.googlesource.com/c/go/+/718160 Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Junyang Shao <shaojunyang@google.com> |
771a1dc216 |
[dev.simd] cmd/compile: add peepholes for all masked ops and bug fixes
For 512-bit ops the rules are unchanged. This CL adds the optimization rules for 128/256-bit ops under a feature check. It also fixes a bug in the masked-load variants of instructions and makes them zeroing by default as well. Change-Id: I6fe395541c0cd509984a81841420e71c3af732f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/717822 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
c7ccbddf22 |
cmd/compile/internal/ssa: more aggressive on dead auto elim
Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
func contains(m map[string][16]int, k string) bool {
_, ok := m[k]
return ok
}
These are the benchmark results followed by the benchmark code:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Map1Access2Ok-8 9.582n ± 2% 9.226n ± 0% -3.72% (p=0.000 n=20)
Map2Access2Ok-8 13.79n ± 1% 10.24n ± 1% -25.77% (p=0.000 n=20)
Map3Access2Ok-8 68.68n ± 1% 12.65n ± 1% -81.58% (p=0.000 n=20)
package main_test
import "testing"
var (
m1 = map[int]int{}
m2 = map[int][16]int{}
m3 = map[int][256]int{}
)
func init() {
for i := range 1000 {
m1[i] = i
m2[i] = [16]int{15:i}
m3[i] = [256]int{255:i}
}
}
func BenchmarkMap1Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m1[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap2Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m2[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap3Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m3[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
Fixes #75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
6e165b4d17 |
cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.
goos: wasip1
goarch: wasm
pkg: internal/strconv
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal 192.8n ± 1% 194.6n ± 0% +0.91% (p=0.000 n=10)
AppendFloat/Float 328.6n ± 0% 279.6n ± 0% -14.93% (p=0.000 n=10)
AppendFloat/Exp 335.6n ± 1% 289.2n ± 1% -13.80% (p=0.000 n=10)
AppendFloat/NegExp 336.0n ± 0% 289.1n ± 1% -13.97% (p=0.000 n=10)
AppendFloat/LongExp 332.4n ± 0% 285.2n ± 1% -14.20% (p=0.000 n=10)
AppendFloat/Big 348.2n ± 0% 300.1n ± 0% -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp 137.4n ± 0% 138.2n ± 0% +0.55% (p=0.001 n=10)
AppendFloat/32Integer 193.3n ± 1% 196.5n ± 0% +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction 283.3n ± 0% 268.9n ± 1% -5.08% (p=0.000 n=10)
AppendFloat/32Point 279.9n ± 0% 266.5n ± 0% -4.80% (p=0.000 n=10)
AppendFloat/32Exp 300.1n ± 0% 288.3n ± 1% -3.90% (p=0.000 n=10)
AppendFloat/32NegExp 288.2n ± 1% 277.9n ± 1% -3.59% (p=0.000 n=10)
AppendFloat/32Shortest 261.7n ± 0% 250.2n ± 0% -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard 173.3n ± 1% 158.9n ± 1% -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard 180.0n ± 0% 167.9n ± 2% -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1 167.1n ± 0% 149.6n ± 1% -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2 162.4n ± 1% 146.5n ± 0% -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5 165.5n ± 0% 149.4n ± 1% -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3 166.4n ± 1% 150.2n ± 0% -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4 163.7n ± 0% 149.6n ± 1% -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard 182.8n ± 1% 167.1n ± 1% -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12 222.2n ± 0% 208.8n ± 0% -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16 197.6n ± 1% 181.7n ± 0% -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard 194.5n ± 0% 181.0n ± 0% -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard 205.1n ± 1% 191.9n ± 0% -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard 6.269µ ± 0% 6.643µ ± 0% +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1 211.7n ± 1% 197.0n ± 0% -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2 189.4n ± 0% 174.2n ± 0% -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3 169.0n ± 0% 154.9n ± 0% -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64 321.2n ± 0% 274.2n ± 1% -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64 307.4n ± 1% 261.2n ± 0% -15.03% (p=0.000 n=10)
AppendInt 3.367µ ± 1% 3.376µ ± 0% ~ (p=0.517 n=10)
AppendUint 675.5n ± 0% 676.9n ± 0% ~ (p=0.196 n=10)
AppendIntSmall 28.13n ± 1% 28.17n ± 0% +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1 20.70n ± 0% 20.51n ± 1% -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2 20.43n ± 0% 20.27n ± 0% -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3 38.48n ± 0% 37.93n ± 0% -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4 41.10n ± 0% 38.78n ± 1% -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5 42.25n ± 1% 42.11n ± 0% -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6 45.40n ± 1% 43.14n ± 0% -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7 46.81n ± 1% 46.03n ± 0% -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8 48.88n ± 1% 46.59n ± 1% -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9 49.94n ± 2% 49.41n ± 1% -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10 57.28n ± 1% 56.92n ± 1% -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11 60.09n ± 1% 58.11n ± 2% -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12 62.22n ± 0% 61.85n ± 0% -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13 64.94n ± 0% 62.92n ± 0% -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14 65.42n ± 1% 65.19n ± 1% -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15 68.17n ± 0% 66.13n ± 0% -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16 70.21n ± 1% 70.09n ± 1% ~ (p=0.517 n=10)
AppendUintVarlen/digits=17 72.93n ± 0% 70.49n ± 0% -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18 73.01n ± 0% 72.75n ± 0% -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19 79.27n ± 1% 79.49n ± 1% ~ (p=0.671 n=10)
AppendUintVarlen/digits=20 82.18n ± 0% 80.43n ± 1% -2.14% (p=0.000 n=10)
geomean 143.4n 136.0n -5.20%
Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
1e5bb416d8 |
cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant divisions of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test
benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base
DivconstI64 ~ ~ ~ -49.66% -21.02%
ModconstI64 ~ ~ ~ -13.45% +14.52%
DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32%
DivisibleconstI64 ~ ~ ~ -20.01% -48.28%
DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74%
DivconstU64/3 ~ ~ ~ -13.82% -4.09%
DivconstU64/5 ~ ~ ~ -14.10% -3.54%
DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55%
DivconstU64/1234567 ~ ~ ~ -61.55% -56.93%
ModconstU64 ~ ~ ~ -6.25% ~
DivisibleconstU64 ~ ~ ~ -2.78% -7.82%
DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56%
pkg: math/bits
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Add ~ ~ ~ ~
Add32 +1.59% ~ ~ ~
Add64 ~ ~ ~ ~
Add64multiple ~ ~ ~ ~
Sub ~ ~ ~ ~
Sub32 ~ ~ ~ ~
Sub64 ~ ~ -9.20% ~
Sub64multiple ~ ~ ~ ~
Mul ~ ~ ~ ~
Mul32 ~ ~ ~ ~
Mul64 ~ ~ -41.58% -53.21%
Div ~ ~ ~ ~
Div32 ~ ~ ~ ~
Div64 ~ ~ ~ ~
pkg: strconv
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
ParseInt/Pos/7bit ~ ~ -11.08% -6.75%
ParseInt/Pos/26bit ~ ~ -13.65% -11.02%
ParseInt/Pos/31bit ~ ~ -14.65% -9.71%
ParseInt/Pos/56bit -1.80% ~ -17.97% -10.78%
ParseInt/Pos/63bit ~ ~ -13.85% -9.63%
ParseInt/Neg/7bit ~ ~ -12.14% -7.26%
ParseInt/Neg/26bit ~ ~ -14.18% -9.81%
ParseInt/Neg/31bit ~ ~ -14.51% -9.02%
ParseInt/Neg/56bit ~ ~ -15.79% -9.79%
ParseInt/Neg/63bit ~ ~ -15.68% -11.07%
AppendFloat/Decimal ~ ~ -7.25% -12.26%
AppendFloat/Float ~ ~ -15.96% -19.45%
AppendFloat/Exp ~ ~ -13.96% -17.76%
AppendFloat/NegExp ~ ~ -14.89% -20.27%
AppendFloat/LongExp ~ ~ -12.68% -17.97%
AppendFloat/Big ~ ~ -11.10% -16.64%
AppendFloat/BinaryExp ~ ~ ~ ~
AppendFloat/32Integer ~ ~ -10.05% -10.91%
AppendFloat/32ExactFraction ~ ~ -8.93% -13.00%
AppendFloat/32Point ~ ~ -10.36% -14.89%
AppendFloat/32Exp ~ ~ -9.88% -13.54%
AppendFloat/32NegExp ~ ~ -10.16% -14.26%
AppendFloat/32Shortest ~ ~ -11.39% -14.96%
AppendFloat/32Fixed8Hard ~ ~ ~ -2.31%
AppendFloat/32Fixed9Hard ~ ~ ~ -7.01%
AppendFloat/64Fixed1 ~ ~ -2.83% -8.23%
AppendFloat/64Fixed2 ~ ~ ~ -7.94%
AppendFloat/64Fixed3 ~ ~ -4.07% -7.22%
AppendFloat/64Fixed4 ~ ~ -7.24% -7.62%
AppendFloat/64Fixed12 ~ ~ -6.57% -4.82%
AppendFloat/64Fixed16 ~ ~ -4.00% -5.81%
AppendFloat/64Fixed12Hard -2.22% ~ -4.07% -6.35%
AppendFloat/64Fixed17Hard -2.12% ~ ~ -3.79%
AppendFloat/64Fixed18Hard -1.89% ~ +2.48% ~
AppendFloat/Slowpath64 -1.85% ~ -14.49% -18.21%
AppendFloat/SlowpathDenormal64 ~ ~ -13.08% -19.41%
pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Mul/P224 ~ ~ -29.95% -39.60%
Mul/P384 ~ ~ -37.11% -63.33%
Mul/P521 ~ ~ -26.62% -12.42%
Square/P224 +1.46% ~ -40.62% -49.18%
Square/P384 ~ ~ -45.51% -69.68%
Square/P521 +90.37% ~ -25.26% -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
EncodingDecoding ~ ~ -34.67% -35.75%
ScalarBaseMult ~ ~ -31.25% -30.29%
ScalarMult ~ ~ -33.45% -32.54%
VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
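Two small examples of code that benefits on 32-bit targets, per the description above: a constant 64-bit division like i/999, now expanded directly via a full 64x64->128 multiply, and a direct call to bits.Mul64, now intrinsified on 32-bit systems.

package main

import (
    "fmt"
    "math/bits"
)

func div999(i uint64) uint64 {
    return i / 999 // expanded with a 64x64->128 multiply, even on 386/arm/mips
}

func main() {
    hi, lo := bits.Mul64(1<<40, 1<<40) // intrinsified on 32-bit systems too
    fmt.Println(div999(123456), hi, lo)
}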
9bbda7c99d |
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing. The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls. The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites. Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values. https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove. One short example, in code that does x[i%len(x)]: < runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero --- > runtime/mfinal.go:131:34: Proved Div64 is unsigned > runtime/mfinal.go:131:38: Proved IsInBounds A longer example: < crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero --- > crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds These diffs are due to the smaller opt being better and taking work away from prove: < image/jpeg/dct.go:307:5: Proved IsInBounds < image/jpeg/dct.go:308:5: Proved IsInBounds ... < image/jpeg/dct.go:442:5: Proved IsInBounds In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8]. 
Specifically, computing the length, opt can now do: (Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) -> (Add 8 (Sub (Mul 8 i) (Mul 8 i))) -> (Add 8 (Mul 8 (Sub i i))) -> (Add 8 (Mul 8 0)) -> (Add 8 0) -> 8 The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)), Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove. Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way. The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems and DivisbleconstU64 on 32-bit systems. Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that. There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that. benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm vs base vs base vs base vs base vs base vs base vs base vs base LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~ ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50% Modify ~ ~ ~ ~ ~ ~ ~ -1.53% MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05% ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~ BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06% BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06% BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~ BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49% BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40% BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77% MulNeg ~ ~ ~ ~ ~ ~ ~ ~ Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~ DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50% ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32% DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~ DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63% DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60% DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~ DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~ DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~ DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12% ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56% DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72% DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17% DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67% ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30% DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22% DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03% DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01% DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~ ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~ DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~ DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07% DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47% ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06% DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17% DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64% DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06% 
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~ ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23% DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06% DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06% DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~ ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~ DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~ DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~ DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~ DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88% ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~ DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~ DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01% Mul2 ~ ~ ~ ~ ~ ~ ~ ~ MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02% EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53% TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04% Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~ Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39% Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35% Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57% Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55% Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11% Div64 ~ ~ ~ ~ ~ ~ ~ +0.11% Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~ Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~ Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28% Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93% Mod64Small ~ ~ ~ ~ ~ ~ ~ ~ Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44% Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37% Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22% Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11% Mod64 ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02% Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02% MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37% MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21% MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02% MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03% MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03% MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04% MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13% MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~ MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30% MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~ MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~ MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~ MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~ MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31% MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47% MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03% MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03% MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03% MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22% MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64% MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14% MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04% MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~ MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~ ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31% Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~ Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79% Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~ Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44% SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31% SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~ SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~ SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62% SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75% SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08% Fixes #63110. Fixes #75954. 
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0 Reviewed-on: https://go-review.googlesource.com/c/go/+/714160 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
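As an illustration (a sketch, not from the CL) of the sort of code the new passes and prove changes help: once the operands are known non-negative, a signed / or % can be treated as unsigned, compiling to cheaper code and letting bounds checks be proven away.

func half(x []byte) []byte {
    return x[:len(x)/2] // len(x) >= 0, so the division is a simple unsigned shift
}

func pick(x []int, i int) int {
    if len(x) == 0 || i < 0 {
        return 0
    }
    // i >= 0 and len(x) > 0 here, so i%len(x) is an unsigned mod and the
    // index is provably in bounds.
    return x[i%len(x)]
}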
915c1839fe |
test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas. Many patterns end in comma (like "MOV [$]123,") so separating patterns by comma is not great; they're already quoted, so spaces are fine. Also replace all tabs in the assembly lines with spaces before matching. Finally, replace \$ or \\$ with [$] as the matching idiom. The effect of all these is to make the patterns look like: // amd64:"BSFQ" "ORQ [$]256" instead of the old: // amd64:"BSFQ","ORQ\t\\$256" Update all tests as well. Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807 Reviewed-on: https://go-review.googlesource.com/c/go/+/716060 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> |
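Illustrative only (the function and the expected instruction are hypothetical): a codegen test comment written in the new style, with space-separated quoted patterns and [$] standing in for a literal dollar sign.

func addConst(x int64) int64 {
    // amd64:"ADDQ [$]42"
    return x + 42
}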
73d7635fae |
cmd/compile: add generic rules to remove bool → int → bool roundtrips
Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab Reviewed-on: https://go-review.googlesource.com/c/go/+/715200 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
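A sketch of the kind of roundtrip these rules target: a bool encoded as 0/1 and then tested again can be simplified back to the original bool.

func roundTrip(b bool) bool {
    n := 0
    if b {
        n = 1
    }
    return n != 0 // same value as b; the bool -> int -> bool detour can be removed
}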
d7a52f9369 |
cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
The original Const64F used AUIPC + LD + FMVDX to load a float64 constant; we can use AUIPC + FLD instead, the same as Const32F. Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/703215 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
7056c71d32 |
cmd/compile: disable use of new saturating float-to-int conversions
The new conversions can be activated (or bisected) with -gcflags=all=-d=converthash=PATTERN where PATTERN is either a hash string or n, qn, y, qy for no, quietly no, yes, quietly yes. This CL makes the default pattern be "qn" instead of the default-default which is an efficient encoding of "qy". Updates #75834 Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a Reviewed-on: https://go-review.googlesource.com/c/go/+/711845 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Bypass: David Chase <drchase@google.com> |
9b8742f2e7 |
cmd/compile: don't depend on arch-dependent conversions in the compiler
Leave those constant foldings for runtime, similar to how we do it for NaN generation. These are the only instances I could find in cmd/compile/..., using objdump -d ../pkg/tool/darwin_arm64/compile| egrep "(fcvtz|>:)" | grep -B1 fcvt (There are instances in other places, like runtime and reflect, but I don't think those places would affect compiler output.) Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27 Reviewed-on: https://go-review.googlesource.com/c/go/+/711281 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> |
19a30ea3f2 |
cmd/compile: call generated size-specialized malloc functions directly
This change creates calls to size-specialized malloc functions instead of calls to newObject when we know the size of the allocation at compilation time. Most of it is a matter of calling the newObject function (which will create calls to the size-specialized functions) rather than the newObjectNonSpecialized function (which won't). In the newHeapaddr, small, non-pointer case, we'll create a non-specialized newObject and transform that into the appropriate size-specialized function when we produce the mallocgc in flushPendingHeapAllocations. We have to update some of the rewrites in generic.rules to also apply to the size-specialized functions when they apply to newObject. The messiest thing is that we have to adjust the offset we use to save the memory profiler stack, because the depth of the call to profilealloc is two frames fewer in the size-specialized malloc functions compared to when newObject calls mallocgc. A bunch of tests have been adjusted to account for that. Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/707856 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Matloob <matloob@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> |
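For illustration (a hypothetical example, not from the CL): an escaping allocation whose size is known at compile time, which can now be compiled as a direct call to the matching size-specialized malloc function instead of going through the generic newObject path.

type point struct{ x, y, z float64 } // size (24 bytes) known at compile time

func newPoint() *point {
    return &point{} // escapes; the allocation size is a compile-time constant
}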
97fd6bdecc |
cmd/compile: fuse NaN checks with other comparisons
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.
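Concretely (a sketch): every comparison involving a NaN is false, so the NaN test can be fused into the inverted comparison.

package nanfuse

import "math"

func positiveOrNaN(x float64) bool {
    // If x is NaN, x <= 0 is false, so !(x <= 0) is true, matching IsNaN(x) || x > 0.
    return math.IsNaN(x) || x > 0 // can compile to a single !(x <= 0) test
}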
goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
│ sec/op │ sec/op vs base │
Acos 4.315n ± 0% 4.314n ± 0% ~ (p=0.642 n=10)
Acosh 8.398n ± 0% 7.779n ± 0% -7.37% (p=0.000 n=10)
Asin 4.203n ± 0% 4.211n ± 0% +0.20% (p=0.001 n=10)
Asinh 10.150n ± 0% 9.562n ± 0% -5.79% (p=0.000 n=10)
Atan 2.363n ± 0% 2.363n ± 0% ~ (p=0.801 n=10)
Atanh 8.192n ± 2% 7.685n ± 0% -6.20% (p=0.000 n=10)
Atan2 4.013n ± 0% 4.010n ± 0% ~ (p=0.073 n=10)
Cbrt 4.858n ± 0% 4.755n ± 0% -2.12% (p=0.000 n=10)
Cos 4.596n ± 0% 4.357n ± 0% -5.20% (p=0.000 n=10)
Cosh 5.071n ± 0% 5.071n ± 0% ~ (p=0.585 n=10)
Erf 2.802n ± 1% 2.788n ± 0% -0.54% (p=0.002 n=10)
Erfc 3.087n ± 1% 3.071n ± 0% ~ (p=0.320 n=10)
Erfinv 3.981n ± 0% 3.965n ± 0% -0.41% (p=0.000 n=10)
Erfcinv 3.985n ± 0% 3.977n ± 0% -0.20% (p=0.000 n=10)
ExpGo 8.721n ± 2% 8.252n ± 0% -5.38% (p=0.000 n=10)
Expm1 4.378n ± 0% 4.228n ± 0% -3.43% (p=0.000 n=10)
Exp2 8.313n ± 0% 7.855n ± 0% -5.52% (p=0.000 n=10)
Exp2Go 8.498n ± 2% 7.921n ± 0% -6.79% (p=0.000 n=10)
Mod 15.16n ± 4% 12.20n ± 1% -19.58% (p=0.000 n=10)
Frexp 1.780n ± 2% 1.496n ± 0% -15.96% (p=0.000 n=10)
Gamma 4.378n ± 1% 4.013n ± 0% -8.35% (p=0.000 n=10)
HypotGo 2.655n ± 5% 2.427n ± 1% -8.57% (p=0.000 n=10)
Ilogb 1.912n ± 5% 1.749n ± 0% -8.53% (p=0.000 n=10)
J0 22.43n ± 9% 20.46n ± 0% -8.76% (p=0.000 n=10)
J1 21.03n ± 4% 19.96n ± 0% -5.09% (p=0.000 n=10)
Jn 45.40n ± 1% 42.59n ± 0% -6.20% (p=0.000 n=10)
Ldexp 2.312n ± 1% 1.944n ± 0% -15.94% (p=0.000 n=10)
Lgamma 4.617n ± 1% 4.584n ± 0% -0.73% (p=0.000 n=10)
Log 4.226n ± 0% 4.213n ± 0% -0.31% (p=0.001 n=10)
Logb 1.771n ± 0% 1.775n ± 0% ~ (p=0.097 n=10)
Log1p 5.102n ± 2% 5.001n ± 0% -1.97% (p=0.000 n=10)
Log10 4.407n ± 0% 4.408n ± 0% ~ (p=1.000 n=10)
Log2 2.416n ± 1% 2.138n ± 0% -11.51% (p=0.000 n=10)
Modf 1.669n ± 2% 1.611n ± 0% -3.50% (p=0.000 n=10)
Nextafter32 2.186n ± 0% 2.185n ± 0% ~ (p=0.051 n=10)
Nextafter64 2.182n ± 0% 2.184n ± 0% +0.09% (p=0.016 n=10)
PowInt 11.39n ± 6% 10.68n ± 2% -6.24% (p=0.000 n=10)
PowFrac 26.60n ± 2% 26.12n ± 0% -1.80% (p=0.000 n=10)
Pow10Pos 0.5067n ± 4% 0.5003n ± 1% -1.27% (p=0.001 n=10)
Pow10Neg 0.8552n ± 0% 0.8552n ± 0% ~ (p=0.928 n=10)
Round 1.181n ± 0% 1.182n ± 0% +0.08% (p=0.001 n=10)
RoundToEven 1.709n ± 0% 1.710n ± 0% ~ (p=0.053 n=10)
Remainder 12.54n ± 5% 11.99n ± 2% -4.46% (p=0.000 n=10)
Sin 3.933n ± 5% 3.926n ± 0% -0.17% (p=0.000 n=10)
Sincos 5.672n ± 0% 5.522n ± 0% -2.65% (p=0.000 n=10)
Sinh 5.447n ± 1% 5.444n ± 0% -0.06% (p=0.029 n=10)
Tan 4.061n ± 0% 4.058n ± 0% -0.07% (p=0.005 n=10)
Tanh 5.599n ± 0% 5.595n ± 0% -0.06% (p=0.042 n=10)
Y0 20.75n ± 5% 19.73n ± 1% -4.92% (p=0.000 n=10)
Y1 20.87n ± 2% 19.78n ± 1% -5.20% (p=0.000 n=10)
Yn 44.50n ± 2% 42.04n ± 2% -5.53% (p=0.000 n=10)
geomean 4.989n 4.791n -3.96%
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 159.9n ± 0% ~ (p=0.269 n=10)
Acosh 244.7n ± 0% 235.0n ± 0% -3.98% (p=0.000 n=10)
Asin 159.9n ± 0% 159.9n ± 0% ~ (p=0.154 n=10)
Asinh 270.8n ± 0% 261.1n ± 0% -3.60% (p=0.000 n=10)
Atan 119.1n ± 0% 119.1n ± 0% ~ (p=0.347 n=10)
Atanh 260.2n ± 0% 261.8n ± 4% ~ (p=0.459 n=10)
Atan2 186.8n ± 0% 186.8n ± 0% ~ (p=0.487 n=10)
Cbrt 203.5n ± 0% 198.2n ± 0% -2.60% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.714 n=10)
Copysign 4.894n ± 0% 4.893n ± 0% ~ (p=0.161 n=10)
Cos 107.6n ± 0% 103.6n ± 0% -3.76% (p=0.000 n=10)
Cosh 259.0n ± 0% 252.8n ± 0% -2.39% (p=0.000 n=10)
Erf 133.7n ± 0% 133.7n ± 0% ~ (p=0.720 n=10)
Erfc 137.9n ± 0% 137.8n ± 0% -0.04% (p=0.033 n=10)
Erfinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Erfcinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Exp 215.3n ± 0% 208.1n ± 0% -3.34% (p=0.000 n=10)
ExpGo 226.7n ± 0% 220.6n ± 0% -2.69% (p=0.000 n=10)
Expm1 164.8n ± 0% 159.0n ± 0% -3.52% (p=0.000 n=10)
Exp2 185.0n ± 0% 182.7n ± 0% -1.22% (p=0.000 n=10)
Exp2Go 198.9n ± 0% 196.5n ± 0% -1.21% (p=0.000 n=10)
Abs 4.894n ± 0% 4.893n ± 0% ~ (p=0.262 n=10)
Dim 16.31n ± 0% 16.31n ± 0% ~ (p=1.000 n=10)
Floor 31.81n ± 0% 31.81n ± 0% ~ (p=0.067 n=10)
Max 26.11n ± 0% 26.10n ± 0% ~ (p=0.080 n=10)
Min 26.10n ± 0% 26.10n ± 0% ~ (p=0.095 n=10)
Mod 337.7n ± 0% 291.9n ± 0% -13.56% (p=0.000 n=10)
Frexp 50.57n ± 0% 42.41n ± 0% -16.13% (p=0.000 n=10)
Gamma 206.3n ± 0% 198.1n ± 0% -4.00% (p=0.000 n=10)
Hypot 94.62n ± 0% 94.61n ± 0% ~ (p=0.437 n=10)
HypotGo 109.3n ± 0% 109.3n ± 0% ~ (p=1.000 n=10)
Ilogb 44.05n ± 0% 44.04n ± 0% -0.02% (p=0.025 n=10)
J0 663.1n ± 0% 663.9n ± 0% +0.13% (p=0.002 n=10)
J1 663.9n ± 0% 666.4n ± 0% +0.38% (p=0.000 n=10)
Jn 1.404µ ± 0% 1.407µ ± 0% +0.21% (p=0.000 n=10)
Ldexp 57.10n ± 0% 48.93n ± 0% -14.30% (p=0.000 n=10)
Lgamma 185.1n ± 0% 187.6n ± 0% +1.32% (p=0.000 n=10)
Log 182.7n ± 0% 170.1n ± 0% -6.87% (p=0.000 n=10)
Logb 46.49n ± 0% 46.49n ± 0% ~ (p=0.675 n=10)
Log1p 184.3n ± 0% 179.4n ± 0% -2.63% (p=0.000 n=10)
Log10 184.3n ± 0% 171.2n ± 0% -7.08% (p=0.000 n=10)
Log2 66.05n ± 0% 57.90n ± 0% -12.34% (p=0.000 n=10)
Modf 34.25n ± 0% 34.24n ± 0% ~ (p=0.163 n=10)
Nextafter32 49.33n ± 1% 48.93n ± 0% -0.81% (p=0.002 n=10)
Nextafter64 43.64n ± 0% 43.23n ± 0% -0.93% (p=0.000 n=10)
PowInt 267.6n ± 0% 251.2n ± 0% -6.11% (p=0.000 n=10)
PowFrac 672.9n ± 0% 637.9n ± 0% -5.19% (p=0.000 n=10)
Pow10Pos 13.87n ± 0% 13.87n ± 0% ~ (p=1.000 n=10)
Pow10Neg 19.58n ± 62% 19.59n ± 62% ~ (p=0.355 n=10)
Round 23.65n ± 0% 23.65n ± 0% ~ (p=1.000 n=10)
RoundToEven 27.73n ± 0% 27.73n ± 0% ~ (p=0.635 n=10)
Remainder 309.9n ± 0% 280.5n ± 0% -9.49% (p=0.000 n=10)
Signbit 13.05n ± 0% 13.05n ± 0% ~ (p=1.000 n=10) ¹
Sin 120.7n ± 0% 120.7n ± 0% ~ (p=1.000 n=10) ¹
Sincos 148.4n ± 0% 143.5n ± 0% -3.30% (p=0.000 n=10)
Sinh 275.6n ± 0% 267.5n ± 0% -2.94% (p=0.000 n=10)
SqrtIndirect 3.262n ± 0% 3.262n ± 0% ~ (p=0.263 n=10)
SqrtLatency 19.57n ± 0% 19.57n ± 0% ~ (p=0.582 n=10)
SqrtIndirectLatency 19.57n ± 0% 19.57n ± 0% ~ (p=1.000 n=10)
SqrtGoLatency 203.2n ± 0% 197.6n ± 0% -2.78% (p=0.000 n=10)
SqrtPrime 4.952µ ± 0% 4.952µ ± 0% -0.01% (p=0.025 n=10)
Tan 153.3n ± 0% 153.3n ± 0% ~ (p=1.000 n=10)
Tanh 280.5n ± 0% 272.4n ± 0% -2.91% (p=0.000 n=10)
Trunc 31.81n ± 0% 31.81n ± 0% ~ (p=1.000 n=10)
Y0 680.1n ± 0% 664.8n ± 0% -2.25% (p=0.000 n=10)
Y1 684.2n ± 0% 669.6n ± 0% -2.14% (p=0.000 n=10)
Yn 1.444µ ± 0% 1.410µ ± 0% -2.35% (p=0.000 n=10)
Float64bits 5.709n ± 0% 5.708n ± 0% ~ (p=0.573 n=10)
Float64frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.734 n=10)
Float32bits 12.23n ± 0% 12.23n ± 0% ~ (p=0.628 n=10)
Float32frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.971 n=10)
FMA 4.893n ± 0% 4.893n ± 0% ~ (p=0.736 n=10)
geomean 88.96n 87.05n -2.15%
¹ all samples are equal
Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
|
||
|
|
1d62e92567 |
test/codegen: make sure assignment results are used.
Some tests make assignments to an argument without reading it. With CL 708865, they are treated as dead stores and are removed. Make sure the results are used. Fixes #75745. Fixes #75746. Change-Id: I05580beb1006505ec1550e5fa245b54dcefd10b9 Reviewed-on: https://go-review.googlesource.com/c/go/+/708916 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
38b26f29f1 |
cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read. We don't do that for arguments. But arguments and locals are essentially the same. Arguments are passed by value, and are not expected to be read in the caller's frame. So we can remove the writes to them as well. One exception is the cgo_unsafe_arg directive, which makes all the arguments effectively address-taken. cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions' arguments. Cherry-picked from the dev.simd branch. This CL is not necessarily SIMD specific. Apply early to reduce risk. Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8 Reviewed-on: https://go-review.googlesource.com/c/go/+/705815 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/708865 |
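As a hedged illustration (not from the CL), the kind of write this now removes: the parameter is assigned but never read again, and since arguments are passed by value the caller never observes the store.

package main

//go:noinline
func scale(n int) int {
	r := 2 * n
	n = 0 // dead store to a parameter; now removable like a dead store to a local
	return r
}

func main() {
	println(scale(21)) // 42
}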
||
|
|
fb1749a3fe |
[dev.simd] all: merge master (adce7f1) into dev.simd
Conflicts: - src/internal/goexperiment/flags.go - src/runtime/export_test.go Merge List: + 2025-10-03 |
||
|
|
4ff8a457db |
test/codegen: codify handling of floating point constants on arm64
While here, reorder Float32ConstantStore/Float64ConstantStore for consistency. Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/703756 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
97da068774 |
cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary is never nil, but it gets a nil check because the dict arg is treated as a slice during construction. cmp.Compare[go.shape.int] was:
00006 (+41) TESTB AX, (AX)
00007 (+52) CMPQ CX, BX
00008 (52) JGT 14
00009 (+55) JGE 12
00010 (+56) MOVL $1, AX
00011 (56) RET
00012 (+58) XORL AX, AX
00013 (58) RET
00014 (+53) MOVQ $-1, AX
00015 (53) RET
Note how the function begins with a TESTB that loads the dict to perform the nil check. This CL eliminates that nil check. For most generic functions, this doesn't matter too much, but generic functions are not infrequently written that never actually use the dictionary (like cmp.Compare), so I suspect this might help in hot code to avoid repeatedly touching the dictionary in memory, and in cases where the generic function is not inlined (and thus the dict dropped). compilecmp shows these changes (deduped): cmp.Compare[go.shape.float64] 73 -> 72 (-1.37%) cmp.Compare[go.shape.int] 26 -> 24 (-7.69%) cmp.Compare[go.shape.int32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.int64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.string] 142 -> 141 (-0.70%) cmp.Compare[go.shape.uint16] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uint64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint8] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uintptr] 26 -> 24 (-7.69%) cmp.Less[go.shape.float64] 35 -> 34 (-2.86%) cmp.Less[go.shape.int32] 8 -> 6 (-25.00%) cmp.Less[go.shape.int64] 9 -> 7 (-22.22%) cmp.Less[go.shape.int] 9 -> 7 (-22.22%) cmp.Less[go.shape.string] 112 -> 110 (-1.79%) cmp.Less[go.shape.uint16] 9 -> 7 (-22.22%) cmp.Less[go.shape.uint32] 8 -> 6 (-25.00%) cmp.Less[go.shape.uint64] 9 -> 7 (-22.22%) internal/synctest.Associate[go.shape.struct 114 -> 113 (-0.88%) internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791 (-1.74%) internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852 (-0.70%) main.(*gState[go.shape.int64]).stop 2111 -> 2085 (-1.23%) main.(*gState[go.shape.int64]).unblock 941 -> 923 (-1.91%) runtime.fmax[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmax[go.shape.float64] 89 -> 87 (-2.25%) runtime.fmin[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmin[go.shape.float64] 89 -> 87 (-2.25%) slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337 (-2.60%) slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453 (-1.95%) slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181 (-2.69%) slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219 (-1.79%) 
slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42 (-4.55%) slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170 (-1.60%) slices.medianCmpFunc[go.shape.struct 1118 -> 1113 (-0.45%) slices.medianCmpFunc[go.shape.struct 1214 -> 1209 (-0.41%) slices.medianCmpFunc[go.shape.struct 889 -> 887 (-0.22%) slices.medianCmpFunc[go.shape.struct 901 -> 874 (-3.00%) slices.order2Ordered[go.shape.float64] 89 -> 87 (-2.25%) slices.order2Ordered[go.shape.uint16] 75 -> 70 (-6.67%) slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110 (-0.45%) slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352 (-1.68%) slices.partitionEqualOrdered[go.shape.int] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.int32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.int64] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203 (-2.40%) slices.partitionOrdered[go.shape.float64] 538 -> 533 (-0.93%) slices.partitionOrdered[go.shape.int] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.int64] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.uint16] 447 -> 442 (-1.12%) slices.partitionOrdered[go.shape.uint64] 437 -> 427 (-2.29%) slices.rotateCmpFunc[go.shape.struct 1045 -> 1029 (-1.53%) slices.rotateCmpFunc[go.shape.struct 1205 -> 1163 (-3.49%) slices.rotateCmpFunc[go.shape.struct 1226 -> 1176 (-4.08%) slices.rotateCmpFunc[go.shape.struct 1322 -> 1272 (-3.78%) slices.rotateCmpFunc[go.shape.struct 1419 -> 1400 (-1.34%) slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538 (-2.00%) slices.rotateLeft[go.shape.string] 603 -> 588 (-2.49%) slices.rotateLeft[go.shape.uint8] 255 -> 250 (-1.96%) slices.siftDownOrdered[go.shape.int] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int64] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.string] 614 -> 592 (-3.58%) slices.siftDownOrdered[go.shape.uint32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.uint64] 181 -> 171 (-5.52%) time.parseRFC3339[go.shape.string] 1774 -> 1758 (-0.90%) unique.(*canonMap[go.shape.struct 280 -> 276 (-1.43%) unique.clone[go.shape.struct 311 -> 293 (-5.79%) weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134 (-1.47%) weak.Make[go.shape.struct 136 -> 134 (-1.47%) weak.Make[go.shape.uint8] 136 -> 134 (-1.47%) Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/706655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
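For context, a hedged sketch (not from the CL) of a generic function that never touches its dictionary, the case this change is aimed at:

package main

import "cmp"

// firstSmaller compiles to a plain comparison for each shape; the implicit
// dictionary argument is never read, so the entry nil check was pure overhead.
func firstSmaller[T cmp.Ordered](a, b T) bool {
	return a < b
}

func main() {
	println(firstSmaller(1, 2), firstSmaller("a", "b"))
}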
||
|
|
af6999e60d |
cmd/compile: implement jump table on loong64
Following CL 357330, use jump tables on Loong64.
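A dense integer switch of the kind that can be lowered to a jump table (illustrative sketch only; the exact thresholds are up to the compiler):

package main

//go:noinline
func name(x int) string {
	switch x {
	case 0:
		return "zero"
	case 1:
		return "one"
	case 2:
		return "two"
	case 3:
		return "three"
	case 4:
		return "four"
	case 5:
		return "five"
	case 6:
		return "six"
	case 7:
		return "seven"
	}
	return "other"
}

func main() {
	println(name(5))
}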
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
Switch8Predictable 2.352n ± 0% 2.101n ± 0% -10.65% (p=0.000 n=10)
Switch8Unpredictable 11.99n ± 0% 10.25n ± 0% -14.51% (p=0.000 n=10)
Switch32Predictable 3.153n ± 0% 1.887n ± 1% -40.14% (p=0.000 n=10)
Switch32Unpredictable 12.47n ± 0% 10.22n ± 0% -18.00% (p=0.000 n=10)
SwitchStringPredictable 3.162n ± 0% 3.352n ± 0% +6.01% (p=0.000 n=10)
SwitchStringUnpredictable 14.70n ± 0% 13.31n ± 0% -9.46% (p=0.000 n=10)
SwitchTypePredictable 3.702n ± 0% 2.201n ± 0% -40.55% (p=0.000 n=10)
SwitchTypeUnpredictable 16.18n ± 0% 14.48n ± 0% -10.51% (p=0.000 n=10)
SwitchInterfaceTypePredictable 7.654n ± 0% 9.680n ± 0% +26.47% (p=0.000 n=10)
SwitchInterfaceTypeUnpredictable 22.04n ± 0% 22.44n ± 0% +1.81% (p=0.000 n=10)
geomean 7.441n 6.469n -13.07%
Change-Id: Id6f30fa73349c60fac17670084daee56973a955f
Reviewed-on: https://go-review.googlesource.com/c/go/+/705396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
||
|
|
578777bf7c |
[dev.simd] cmd/compile: make condition of CanSSA smarter for SIMD fields
This CL tries to improve a situation pointed out by https://github.com/golang/go/issues/73787#issuecomment-3305494947. Change-Id: Ic23c80fe71344fc25383ab238ad6631e0f0cd22e Reviewed-on: https://go-review.googlesource.com/c/go/+/705416 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
2b50ffe172 |
[dev.simd] cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read. We don't do that for arguments. But arguments and locals are essentially the same. Arguments are passed by value, and are not expected to be read in the caller's frame. So we can remove the writes to them as well. One exception is the cgo_unsafe_arg directive, which makes all the arguments effectively address-taken. cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions' arguments. Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8 Reviewed-on: https://go-review.googlesource.com/c/go/+/705815 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> |
||
|
|
2d8cb80d7c |
[dev.simd] all: merge master (9b2d39b) into dev.simd
Conflicts: - src/internal/buildcfg/exp.go Merge List: + 2025-09-22 |
||
|
|
e34ad6de42 |
[dev.simd] cmd/compile: optimize VPTEST for 2-operand cases
Change-Id: Ica2d5ee48082c69e86b12b519ba8df7a2556392f Reviewed-on: https://go-review.googlesource.com/c/go/+/704355 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
78ef487a6f |
cmd/compile: fix the issue of shift amount exceeding the valid range
Fixes #75479 Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00 Reviewed-on: https://go-review.googlesource.com/c/go/+/704135 Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
|
2469e92d8c |
cmd/compile: combine doubling with shift on riscv64
Change-Id: I4bee2770fedf97e35b5a5b9187a8ba3c41f9ec2e Reviewed-on: https://go-review.googlesource.com/c/go/+/702697 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> |
||
|
|
e5ee1f2600 |
test/codegen: check zerobase for newobject on 0-sized types
This CL also adds riscv64 checks Change-Id: I693e4e606f470615f6b49085592d6d5ca61473d3 Reviewed-on: https://go-review.googlesource.com/c/go/+/703716 Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
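For reference, a hedged sketch of what the codegen test checks: allocating a zero-sized value needs no real allocation, so the compiler can hand back the address of runtime.zerobase directly.

package main

//go:noinline
func newEmpty() *struct{} {
	return new(struct{}) // expected to load &runtime.zerobase, no malloc call
}

func main() {
	a, b := newEmpty(), newEmpty()
	// With the gc toolchain both typically point at runtime.zerobase, though the
	// spec does not guarantee that pointers to zero-sized values compare equal.
	println(a == b)
}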
||
|
|
dc960d0bfe |
cmd/compile, reflect: further allow inlining of TypeFor
Previous CLs optimized direct use of abi.Type, but reflect.Type is
indirected, so was not benefiting.
For TypeFor, we can use toRType directly without a nil check because the
types are statically known.
Normally, I'd think SSA would remove the nil check, but an oddity
(specifically, late fuse is required to remove the nil check, and opt
doesn't run that late) means that the nil check persists and gets in
the way.
Manually writing the code in this instance seems to fix the problem.
It also exposed another problem; depending on the ordering, writeType
could get to a type symbol before SSA, thereby preventing Extra from
being created on the symbol for later lookups that don't go through
TypeLinksym directly. In writeType, for non-shape types, call
TypeLinksym to ensure that the type is set up for later callers. That
change itself passed toolstash -cmp.
All up, this stack put through compilecmp shows a lot of improvement in
various reflect-using packages, and reflect itself. It is too big to fit
in the commit message but here's some info:
compilecmp master -> HEAD
master (
|
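For reference, the call shape this stack optimizes (a usage sketch, not code from the CL): reflect.TypeFor on a statically known type, which can now reduce to loading a constant type pointer instead of going through a nil check.

package main

import (
	"fmt"
	"reflect"
)

func main() {
	t := reflect.TypeFor[map[string]int]() // statically known type argument
	fmt.Println(t)                         // map[string]int
}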
||
|
|
80a2aae922 |
Revert "cmd/compile: improve stp merging for non-sequent cases"
This reverts commit
|
||
|
|
a5fa5ea51c |
cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.
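A hedged sketch of code that benefits: HasPrefix against a short constant prefix reduces to a fixed-length memequal, which the compiler can now expand inline for lengths 3, 5, 6, and 7 as well.

package main

import "strings"

//go:noinline
func hasErrPrefix(s string) bool {
	return strings.HasPrefix(s, "err: ") // 5-byte constant prefix
}

func main() {
	println(hasErrPrefix("err: boom"), hasErrPrefix("ok"))
}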
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old │ new │
│ sec/op │ sec/op vs base │
StringPrefix3-8 11.125n ± 2% 8.539n ± 1% -23.25% (p=0.000 n=20)
StringPrefix5-8 11.170n ± 2% 8.700n ± 1% -22.11% (p=0.000 n=20)
StringPrefix6-8 11.190n ± 2% 8.655n ± 1% -22.65% (p=0.000 n=20)
StringPrefix7-8 11.095n ± 1% 8.878n ± 1% -19.98% (p=0.000 n=20)
Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
||
|
|
4c63d798cb |
cmd/compile: improve stp merging for non-sequent cases
The original algorithm merges stores with the first mergeable store in the chain, but it misses some cases. Additionally reordering the stores in the chain in increasing order of memory access allows merging in these cases.
Fixes #71987
Below are the results of the sweet benchmarks and the difference in the sizes of the .text sections.
│ old.results │ new.results │
│ sec/op │ sec/op vs base │
BleveIndexBatch100-4 7.614 ± 2% 7.548 ± 1% ~ (p=0.190 n=10)
ESBuildThreeJS-4 821.3m ± 0% 819.0m ± 1% ~ (p=0.165 n=10)
ESBuildRomeTS-4 206.2m ± 1% 204.4m ± 1% -0.90% (p=0.023 n=10)
EtcdPut-4 64.89m ± 1% 64.94m ± 2% ~ (p=0.684 n=10)
EtcdSTM-4 318.4m ± 0% 319.2m ± 1% ~ (p=0.631 n=10)
GoBuildKubelet-4 157.4 ± 0% 157.6 ± 0% ~ (p=0.105 n=10)
GoBuildKubeletLink-4 12.42 ± 2% 12.41 ± 1% ~ (p=0.529 n=10)
GoBuildIstioctl-4 124.4 ± 0% 124.4 ± 0% ~ (p=0.579 n=10)
GoBuildIstioctlLink-4 8.700 ± 1% 8.693 ± 1% ~ (p=0.912 n=10)
GoBuildFrontend-4 46.52 ± 0% 46.50 ± 0% ~ (p=0.971 n=10)
GoBuildFrontendLink-4 2.282 ± 1% 2.272 ± 1% ~ (p=0.529 n=10)
GoBuildTsgo-4 75.02 ± 1% 75.31 ± 1% ~ (p=0.436 n=10)
GoBuildTsgoLink-4 1.229 ± 1% 1.219 ± 1% -0.82% (p=0.035 n=10)
GopherLuaKNucleotide-4 34.77 ± 5% 34.31 ± 1% -1.33% (p=0.015 n=10)
MarkdownRenderXHTML-4 286.6m ± 0% 285.7m ± 1% ~ (p=0.315 n=10)
Tile38QueryLoad-4 657.2µ ± 1% 660.3µ ± 0% ~ (p=0.436 n=10)
geomean 2.570 2.563 -0.24%
Executable Old .text New .text Change
-------------------------------------------------------
benchmark 6504820 6504020 -0.01%
bleve-index-bench 3903860 3903636 -0.01%
esbuild 4801012 4801172 +0.00%
esbuild-bench 1256404 1256340 -0.01%
etcd 9188148 9187076 -0.01%
etcd-bench 6462228 6461524 -0.01%
go 5924468 5923892 -0.01%
go-build-bench 1282004 1281940 -0.00%
gopher-lua-bench 1639540 1639348 -0.01%
markdown-bench 1478452 1478356 -0.01%
tile38-bench 2753524 2753300 -0.01%
tile38-server 10241380 10240068 -0.01%
Change-Id: Ieb4fdfd656aca458f65fc45938de70550632bd13
Reviewed-on: https://go-review.googlesource.com/c/go/+/698097
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
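A hedged illustration (assumed, not from the CL) of stores written out of address order; after sorting the store chain by offset, neighbouring 8-byte stores can be paired into arm64 STP instructions:

package main

type header struct {
	a, b, c, d uint64
}

//go:noinline
func fill(h *header) {
	h.c = 3 // source order is not address order...
	h.a = 1
	h.d = 4
	h.b = 2 // ...but (a,b) and (c,d) are still mergeable once reordered
}

func main() {
	var h header
	fill(&h)
	println(h.a + h.b + h.c + h.d) // 10
}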
||
|
|
f5b20689e9 |
cmd/compile: optimize loads from readonly globals into constants on loong64
Ref: CL 141118 Update #26498 Change-Id: I9c4ad2bedc4d50bd273bbe9119a898d4fca95e45 Reviewed-on: https://go-review.googlesource.com/c/go/+/700875 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
3492e4262b |
cmd/compile: simplify specific addition operations using the ADDV16 instruction
On loong64, the addi.d instruction can only directly handle 12-bit immediate numbers. If a larger immediate number needs to be processed, it must first be placed in a register, and then the add.d instruction is used to complete the processing of the larger immediate number. If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0, then the ADDV16 instruction can be used to complete the addition operation. Removes 164 instructions from the go binary on loong64. Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a Reviewed-on: https://go-review.googlesource.com/c/go/+/700536 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> |
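A hedged example of a constant that meets the stated condition (fits in 32 bits, low 16 bits all zero), so the addition no longer needs the constant materialized in a register first:

package main

//go:noinline
func bump(x int64) int64 {
	return x + 1<<20 // 0x100000: is32Bit and the low 16 bits are zero
}

func main() {
	println(bump(1)) // 1048577
}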
||
|
|
df29038486 |
cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> |