This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove
whether or not the slice is aliased.
If proven unaliased, the compiler automatically inserts a call to a new
runtime function introduced with this CL, runtime.growsliceNoAlias,
which frees the old backing memory immediately after slice growth is
complete and the old storage is logically dead.
Two append benchmarks below show promising results: with this CL,
they execute up to ~2x faster and allocate up to ~3x less memory.
The approach works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be escaping,
such as in a classic pattern of returning a slice from a function
after the slice is built. (The final slice memory is never freed with
this CL, though we have other work that tackles that.)
An example target for this CL is automatically freeing the
intermediate memory for the appends in the loop in this function:
func f1(input []int) []int {
	var s []int
	for _, x := range input {
		s = append(s, g(x)) // s cannot be aliased here
		if h(x) {
			s = append(s, x) // s cannot be aliased here
		}
	}
	return s // slice escapes at end
}
In this case, the compiler and the runtime collaborate so that
the heap-allocated backing memory for s is automatically freed after
a successful grow. (For the first grow, there is nothing to free,
but for the second and subsequent growths, the old heap memory is
freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented
by calling runtime.freegc, which we introduced in CL 673695.
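As a rough, user-level model of the grow-then-free shape (hypothetical
sketch only; the real growsliceNoAlias is a runtime function that works
on the raw slice header, and ordinary Go cannot free memory explicitly):

func appendNoAlias(old []int, x int) []int {
	s := append(old, x) // may allocate a larger backing array and copy
	// When a new backing array was allocated, runtime.growsliceNoAlias
	// calls runtime.freegc on old's array at this point, because the
	// compiler proved no other reference to it can exist.
	return s
}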
The high-level approach is to step through the IR starting
from a slice declaration and look for any operation that aliases
the slice or might do so. Any IR construct we don't specifically
handle is treated as a potential alias, so we conservatively fall
back to treating the slice as aliased whenever we encounter
something we don't understand.
For loops, some additional care is required. We arrange the analysis
so that an alias in the body of a loop causes all the appends in that
same loop body to be marked aliased, even if the aliasing occurs after
the append in the IR:
func f2() {
	var s []int
	for i := range 10 {
		s = append(s, i) // aliased due to next line
		alias = s
	}
}
For nested loops, we analyse the nesting appropriately so that,
for example, this append is still proven non-aliased in the
inner loop even though the slice is aliased in the outer loop:
func f3() {
	for range 10 {
		var s []int
		for i := range 10 {
			s = append(s, i) // append using non-aliased slice
		}
		alias = s
	}
}
A good starting point is the beginning of the test/escape_alias.go file,
which opens with ~10 introductory examples with brief comments that
illustrate the high-level approach.
For more details, see the new .../internal/escape/alias.go file,
especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from
nothing, where the slice elements are each 64 bytes. In the table below,
'count' is the number of appends. With 1 append, there is no opportunity
for this CL to free memory. Once there are 2 appends, the growth from
1 element to 2 elements means the compiler-inserted growsliceNoAlias
frees the 1-element array, and we see a ~33% reduction in memory use
and a small reported speed improvement.
As the number of appends increases, for example to 5, we see a
~20% speed improvement and ~45% memory reduction, and so on, until
at the end of the table we are ~40% faster with ~50% less memory
allocated.
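For reference, the benchmark has roughly the following shape (a
hypothetical sketch, not the exact source; elem64, sink64, and the
count list are illustrative, and the usual testing and fmt imports
are assumed):

type elem64 struct{ a [8]uint64 } // 64 bytes, pointer-free

var sink64 []elem64

func BenchmarkAppend64Bytes(b *testing.B) {
	for _, count := range []int{1, 2, 3, 4, 5, 6, 10, 20, 50, 100, 200, 400} {
		b.Run(fmt.Sprintf("count=%d", count), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				var s []elem64
				for j := 0; j < count; j++ {
					s = append(s, elem64{}) // never aliased within this loop
				}
				sink64 = s // final slice escapes, as in f1 above
			}
		})
	}
}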
There can be variation in the reported numbers based on -randlayout, so
this table is for 30 different values of -randlayout with a total
n=150. (Even so, there is still some variation, so we probably should
not read too much into small changes.) This is with GOAMD64=v3 on
a VM that gcc reports is cascadelake.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150)
Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150)
Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150)
Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150)
Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150)
Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150)
Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150)
Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150)
Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150)
Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150)
Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150)
Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150)
geomean 601.8n 455.0n -24.39%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150)
Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150)
Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150)
Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150)
geomean 1.991Ki 1.124Ki -43.54%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150)
Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150)
Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150)
geomean 4.331 1.000 -76.91%
The second benchmark is similar, but instead uses an 8-byte integer
for the slice element. The first 4 appends in the loop never call into
the runtime thanks to the excellent CL 664299 introduced by Keith in
Go 1.25, which allows some <= 32 byte dynamically-sized slices to live
on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append
occurs at count=5, a grow happens via the runtime and heap allocates
as normal, but freegc does not yet have anything to free, so we see
a small ~1.4ns penalty reported there. Once the second growth
happens, the older heap memory is automatically freed by freegc,
and we start to see memory reductions and speed improvements:
a tiny speed improvement (close to a wash, or maybe noise) by the
second growth before count=10, building up to ~2x faster with ~68%
fewer allocated bytes reported.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150)
AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150)
AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150)
AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150)
AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150)
AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150)
AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150)
AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150)
AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150)
AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150)
AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150)
AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150)
geomean 331.8n 246.4n -25.72%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150)
AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150)
AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150)
geomean ² -42.81%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150)
AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150)
geomean ² -72.76% ²
Of course, these are just microbenchmarks, but they likely indicate
there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get
runtime.freegc working automatically with iterators such as those used
by slices.Collect, which can then automatically free the intermediate
memory from its repeated appends (earlier in this work, that required
a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without
pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
We can already stack allocate the backing store during append if the
resulting backing store doesn't escape. See CL 664299.
This CL enables us to often stack allocate the backing store during
append *even if* the result escapes. Typically, for code like:
func f(n int) []int {
	var r []int
	for i := range n {
		r = append(r, i)
	}
	return r
}
the backing store for r escapes, but only by returning it.
Could we operate with r on the stack for most of its lifetime,
and only move it to the heap at the return point?
The current implementation of append will need to do an allocation
each time it calls growslice. This will happen on the 1st, 2nd, 4th,
8th, etc. append calls. The allocations done by all but the
last growslice call will then immediately be garbage.
We'd like to avoid doing some of those intermediate allocations
if possible. We rewrite the above code by introducing a move2heap
operation:
func f(n int) []int {
	var r []int
	for i := range n {
		r = append(r, i)
	}
	r = move2heap(r)
	return r
}
Using the move2heap runtime function, which does:

	move2heap(r):
		If r is already backed by heap storage, return r.
		Otherwise, copy r to the heap and return the copy.
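In ordinary Go terms, the semantics are roughly the following
(hypothetical, element-type-specific sketch; the real move2heap is a
runtime builtin that works on the raw slice header and knows whether
the backing store is currently on the stack):

func move2heapInts(r []int, heapBacked bool) []int {
	if heapBacked {
		return r // already backed by heap storage: return r unchanged
	}
	m := make([]int, len(r), cap(r)) // heap-allocated copy
	copy(m, r)
	return m
}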
Now we can treat the backing store of r allocated at the
append site as not escaping. Previous stack allocation
optimizations now apply, which can use a fixed-size
stack-allocated backing store for r when appending.
See the description in cmd/compile/internal/slice/slice.go
for how we ensure that this optimization is safe.
Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/707755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Found by github.com/mdempsky/unconvert
Change-Id: Ib78cceb718146509d96dbb6da87b27dbaeba1306
GitHub-Last-Rev: dedf354811
GitHub-Pull-Request: golang/go#74771
Reviewed-on: https://go-review.googlesource.com/c/go/+/690735
Reviewed-by: Mark Freeman <mark@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Ignored these linknames which have not worked for a while:
github.com/xtls/xray-core:
context.newCancelCtx removed in CL 463999 (Feb 2023)
github.com/u-root/u-root:
funcPC removed in CL 513837 (Jul 2023)
tinygo.org/x/drivers:
net.useNetdev never existed
For #67401.
Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/587576
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Note that this depends on the revert of CL 581395 to move zeroVal back.
For #67401.
Change-Id: I507c27c2404ad1348aabf1ffa3740e6b1957495b
Reviewed-on: https://go-review.googlesource.com/c/go/+/587217
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Currently bulkBarrierPreWrite follows a fairly slow path wherein it
calls typePointersOf, which ends up calling into fastForward. This does
some fairly heavy computation to move the iterator forward without any
assumptions about where it lands at all. It needs to be completely
general to support splitting at arbitrary boundaries, for example for
scanning oblets.
This means that copying objects during the GC mark phase is fairly
expensive, and is a regression from before allocheaders.
However, in almost all cases bulkBarrierPreWrite and
bulkBarrierPreWriteSrcOnly have perfect type information. We can do a
lot better in these cases because we're starting on a type-size
boundary, which is exactly what the iterator is built around.
This change adds the typePointersOfType method which produces a
typePointers iterator from a pointer and a type. This change
significantly improves the performance of these bulk write barriers,
eliminating some performance regressions that were noticed on the perf
dashboard.
There are still just a couple cases where we have to use the more
general typePointersOf calls, but they're fairly rare; most bulk
barriers have perfect type information.
This change is tested by the GCInfo tests in the runtime and the GCBits
tests in the reflect package via an additional check in getgcmask.
Results for tile38 before and after allocheaders. There was previously a
regression in the p90; now it's gone. Also, the overall win has been
boosted slightly.
tile38 $ benchstat noallocheaders.results allocheaders.results
name old time/op new time/op delta
Tile38QueryLoad 481µs ± 1% 468µs ± 1% -2.71% (p=0.000 n=10+10)
name old average-RSS-bytes new average-RSS-bytes delta
Tile38QueryLoad 6.32GB ± 1% 6.23GB ± 0% -1.38% (p=0.000 n=9+8)
name old peak-RSS-bytes new peak-RSS-bytes delta
Tile38QueryLoad 6.49GB ± 1% 6.40GB ± 1% -1.38% (p=0.002 n=10+10)
name old peak-VM-bytes new peak-VM-bytes delta
Tile38QueryLoad 7.72GB ± 1% 7.64GB ± 1% -1.07% (p=0.007 n=10+10)
name old p50-latency-ns new p50-latency-ns delta
Tile38QueryLoad 212k ± 1% 205k ± 0% -3.02% (p=0.000 n=10+9)
name old p90-latency-ns new p90-latency-ns delta
Tile38QueryLoad 622k ± 1% 616k ± 1% -1.03% (p=0.005 n=10+10)
name old p99-latency-ns new p99-latency-ns delta
Tile38QueryLoad 4.55M ± 2% 4.39M ± 2% -3.51% (p=0.000 n=10+10)
name old ops/s new ops/s delta
Tile38QueryLoad 12.5k ± 1% 12.8k ± 1% +2.78% (p=0.000 n=10+10)
Change-Id: I0a48f848eae8777d0fd6769c3a1fe449f8d9d0a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/542219
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.
As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.
Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.
Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.
The full implementation is behind GOEXPERIMENT=allocheaders.
The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.
See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)
Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This allows reusing the slice cap computation across
specialized growslice funcs.
Updates #49480
Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961
Reviewed-on: https://go-review.googlesource.com/c/go/+/495876
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Egon Elbre <egonelbre@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
This is a tiny optimization for growslice, which is probably too small to
measure easily.
Move the for loop to avoid multiple checks inside the loop.
Also, use >> 2 instead of /4, which generates fewer instructions.
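For context on the >> 2 change: for signed operands, x/4 must round
toward zero, so the compiler emits an extra adjustment for possibly
negative values, while x>>2 is a single shift and equals x/4 whenever
x >= 0, as it is here. A minimal illustration (not the growslice source):

func quarterDiv(x int) int { return x / 4 }  // extra fixup for negative x
func quarterShr(x int) int { return x >> 2 } // single shift; same result for x >= 0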
Change-Id: I9ab09bdccb56f98ab22073f23d9e102c252238c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/493795
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Egon Elbre <egonelbre@gmail.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
This touches a lot of files, which is bad, but it is also good,
since N copies of this information are commoned into 1.
The new files in internal/abi are copied from the end of the stack;
ultimately this will all end up being used.
Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Replace all uses of Ctz64/32/8 with TrailingZeros64/32/8, because they
are the same and there is no need for duplicates. Also rename the CtzXX
functions in 386 assembly code.
Change-Id: I19290204858083750f4be589bb0923393950ae6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/438935
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
The Grow method is like the proposed slices.Grow function
in that it ensures that the slice has enough capacity to append
n elements without allocating.
The implementation of Grow is a thin wrapper over runtime.growslice.
This also changes Append and AppendSlice to use growslice under the hood.
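Example use (assuming reflect is imported; the slice value must be
addressable, hence the pointer):

func ensureRoom(s *[]int, n int) {
	v := reflect.ValueOf(s).Elem()
	v.Grow(n) // cap(*s) is now at least len(*s)+n, so appending n more elements won't allocate
}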
Fixes #48000
Change-Id: I992a58584a2ff1448c1c2bc0877fe76073609111
Reviewed-on: https://go-review.googlesource.com/c/go/+/389635
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Instead of passing the original length and the new length, pass
the new length and the length increment. Also use the new length
in all the post-growslice calculations so that the original length
is dead and does not need to be spilled/restored around the growslice.
old: growslice(typ, oldPtr, oldLen, oldCap, newLen) (newPtr, newLen, newCap)
new: growslice(oldPtr, newLen, oldCap, inc, typ) (newPtr, newLen, newCap)
where inc = # of elements added = newLen-oldLen
Also move the element type to the end of the call. This makes register
allocation more efficient, as oldPtr and newPtr can often be in the
same register (e.g. AX on amd64) and thus the phi takes no instructions.
Makes the go binary 0.3% smaller.
Change-Id: I7295a60227dbbeecec2bf039eeef2950a72df760
Reviewed-on: https://go-review.googlesource.com/c/go/+/418554
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
[this is a retry of CL 407035 + its revert CL 422395. The content is unchanged]
Use just 1 bit per word to record the ptr/nonptr bitmap.
Use word-sized operations to manipulate the bitmap, so we can operate
on up to 64 ptr/nonptr bits at a time.
Use a separate bitmap, one bit per word of the ptr/nonptr bitmap,
to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr
bits at once, knowing the exact last pointer location is not necessary.
As a followon CL, we should make the gcdata bitmap an array of
uintptr instead of an array of byte, so we can load 64 bits of it at once.
Similarly for the processing of gc programs.
Change-Id: Ica5eb622f5b87e647be64f471d67b02732ef8be6
Reviewed-on: https://go-review.googlesource.com/c/go/+/422634
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
This reverts commit b589208c8c.
Reason for revert: Bug somewhere in this code, causing wasm and maybe linux/386 to fail.
Change-Id: I5e1e501d839584e0219271bb937e94348f83c11f
Reviewed-on: https://go-review.googlesource.com/c/go/+/422395
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
There's no need for a and b to match types. The typechecker already
ensured that a and b are both slices with the same base type, or
a and b are (possibly named) []byte and string.
The optimization to treat append(b, make([], ...)) as a zeroing
slice extension doesn't fire when there's an OCONVNOP wrapping the make.
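For reference, this is the zero-extension pattern in question; the walk
pass should rewrite it to extend b by n zeroed bytes without
materializing the temporary slice from make (the helper name here is
just for illustration):

func pad(b []byte, n int) []byte {
	return append(b, make([]byte, n)...)
}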
Fixes #53888
Change-Id: Ied871ed0bbb8e4a4b35d280c71acbab8103691bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/418475
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Use just 1 bit per word to record the ptr/nonptr bitmap.
Use word-sized operations to manipulate the bitmap, so we can operate
on up to 64 ptr/nonptr bits at a time.
Use a separate bitmap, one bit per word of the ptr/nonptr bitmap,
to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr
bits at once, knowing the exact last pointer location is not necessary.
This cleans up the bitmap implementation significantly, which will
hopefully make it faster. TODO: measure
As a followon CL, we should make the gcdata bitmap an array of
uintptr instead of an array of byte, so we can load 64 bits of it at once.
Similarly for the processing of gc programs.
Change-Id: I18151b1876d9543599800dec51e2a1b19df97d49
Reviewed-on: https://go-review.googlesource.com/c/go/+/407035
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
This prevents heavy runtime call overhead, and the compiler will have a
chance to optimize the bounds check.
With this optimization, changing runtime/stack.go to use unsafe.Slice
no longer negatively impacts stack copying performance:
name old time/op new time/op delta
StackCopyWithStkobj-8 16.3ms ± 6% 16.5ms ± 5% ~ (p=0.382 n=8+8)
name old alloc/op new alloc/op delta
StackCopyWithStkobj-8 17.0B ± 0% 17.0B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
StackCopyWithStkobj-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Fixes #48798
Change-Id: I731a9a4abd6dd6846f44eece7f86025b7bb1141b
Reviewed-on: https://go-review.googlesource.com/c/go/+/362934
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
The code was updated, the comments were not.
Change-Id: If387779f3abd5e8a1b487fe34c33dcf9ce5fa7ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/383495
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Add explicit address sanitizer instrumentation to the runtime and
syscall packages. The compiler does not instrument the runtime
package. It does instrument the syscall package, but we need to add
a couple of cases that it can't see.
Following the implementation of the asan malloc runtime library, this
patch also allocates extra memory as a redzone around the returned
memory region, and marks the redzone as unaddressable to detect
overflows and underflows.
Updates #44853.
Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798
Reviewed-on: https://go-review.googlesource.com/c/go/+/298614
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
This reduces the number of branches to bounds check non-empty slices
from 5 to 3. It does also increase the number of branches to handle
empty slices from 1 to 3; but for non-panicking calls, they should all
be predictable.
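For context, this is the kind of call whose generated checks are being
counted (a nil p with nonzero n, a negative n, or a size that overflows
the address space must panic; the wrapper function is just for
illustration):

func view(p *byte, n int) []byte {
	return unsafe.Slice(p, n) // this validation is now 3 branches for non-empty slices
}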
Updates #48798.
Change-Id: I3ffa66857096486f4dee417e1a66eb8fdf7a3777
Reviewed-on: https://go-review.googlesource.com/c/go/+/355490
Trust: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Allow the user to construct slices that are larger than the Go heap as
long as they don't overflow the address space.
Updates #48798.
Change-Id: I659c8334d04676e1f253b9c3cd499eab9b9f989a
Reviewed-on: https://go-review.googlesource.com/c/go/+/355489
Trust: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Instead of growing 2x for < 1024 elements and 1.25x for >= 1024 elements,
use a somewhat smoother formula for the growth factor. Start reducing
the growth factor after 256 elements, but slowly.
starting cap    growth factor
256             2.0
512             1.63
1024            1.44
2048            1.35
4096            1.30
(Note that the real growth factor, both before and now, is somewhat
larger because we round up to the next size class.)
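A sketch of the growth computation behind this table (simplified from
growslice: it omits the case where the requested length already exceeds
double the old capacity, as well as the final round-up to a size class):

func nextCap(oldCap, newLen int) int {
	const threshold = 256
	if oldCap < threshold {
		return oldCap * 2 // small slices still double
	}
	newcap := oldCap
	for newcap < newLen {
		// Grow by ~2x near the threshold, transitioning smoothly
		// toward ~1.25x as the capacity gets large.
		newcap += (newcap + 3*threshold) / 4
	}
	return newcap
}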
This CL also makes the growth monotonic (larger initial capacities
make larger final capacities, which was not true before). See discussion
at https://groups.google.com/g/golang-nuts/c/UaVlMQ8Nz3o
256 was chosen as the threshold to roughly match the total number of
reallocations when appending to eventually make a very large slice.
(Compared to the old policy, we allocate smaller when appending at
capacities in [256,1024] and larger at capacities in [1024,...].)
Change-Id: I876df09fdc9ae911bb94e41cb62675229cb10512
Reviewed-on: https://go-review.googlesource.com/c/go/+/347917
Trust: Keith Randall <khr@golang.org>
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Conflicts:
- src/cmd/compile/internal/walk/builtin.go
On dev.typeparams, CL 330194 changed OCHECKNIL to not require manual
SetTypecheck(1) anymore; while on master, CL 331070 got rid of the
OCHECKNIL altogether by moving the check into the runtime support
functions.
- src/internal/buildcfg/exp.go
On master, CL 331109 refactored the logic for parsing the
GOEXPERIMENT string, so that it could be more easily reused by
cmd/go; while on dev.typeparams, several CLs tweaked the regabi
experiment defaults.
Merge List:
+ 2021-06-30 4711bf30e5 doc/go1.17: linkify "language changes" in the runtime section
+ 2021-06-30 ed56ea73e8 path/filepath: deflake TestEvalSymlinksAboveRoot on darwin
+ 2021-06-30 c080d0323b cmd/dist: pass -Wno-unknown-warning-option in swig_callback_lto
+ 2021-06-30 7d0e9e6e74 image/gif: fix typo in the comment (io.ReadByte -> io.ByteReader)
+ 2021-06-30 0fa3265fe1 os: change example to avoid deprecated function
+ 2021-06-30 d19a53338f image: add Uniform.RGBA64At and Rectangle.RGBA64At
+ 2021-06-30 c45e800e0c crypto/x509: don't fail on optional auth key id fields
+ 2021-06-29 f9d50953b9 net: fix failure of TestCVE202133195
+ 2021-06-29 e294b8a49e doc/go1.17: fix typo "MacOS" -> "macOS"
+ 2021-06-29 3463852b76 math/big: fix typo of comment (`BytesScanner` to `ByteScanner`)
+ 2021-06-29 fd4b587da3 cmd/compile: suppress details error for invalid variadic argument type
+ 2021-06-29 e2e05af6e1 cmd/internal/obj/arm64: fix an encoding error of CMPW instruction
+ 2021-06-28 4bb0847b08 cmd/compile,runtime: change unsafe.Slice((*T)(nil), 0) to return []T(nil)
+ 2021-06-28 1519271a93 spec: change unsafe.Slice((*T)(nil), 0) to return []T(nil)
+ 2021-06-28 5385e2386b runtime/internal/atomic: drop Cas64 pointer indirection in comments
+ 2021-06-28 956c81bfe6 cmd/go: add GOEXPERIMENT to `go env` output
+ 2021-06-28 a1d27269d6 cmd/go: prep for 'go env' refactoring
+ 2021-06-28 901510ed4e cmd/link/internal/ld: skip the windows ASLR test when CGO_ENABLED=0
+ 2021-06-28 361159c055 cmd/cgo: fix 'see gmp.go' to 'see doc.go'
+ 2021-06-27 c95464f0ea internal/buildcfg: refactor GOEXPERIMENT parsing code somewhat
+ 2021-06-25 ed01ceaf48 runtime/race: use race build tag on syso_test.go
+ 2021-06-25 d1916e5e84 go/types: in TestCheck/issues.src, import regexp/syntax instead of cmd/compile/internal/syntax
+ 2021-06-25 5160896c69 go/types: in TestStdlib, import from source instead of export data
+ 2021-06-25 d01bc571f7 runtime: make ncgocall a global counter
Change-Id: I1ce4a3b3ff7c824c67ad66dd27d9d5f1d25c0023
This CL removes the unconditional OCHECKNIL check added in
walkUnsafeSlice by instead passing it as a pointer to
runtime.unsafeslice, and hiding the check behind a `len == 0` check.
While here, this CL also implements checkptr functionality for
unsafe.Slice and disallows use of unsafe.Slice with //go:notinheap
types.
Updates #46742.
Change-Id: I743a445ac124304a4d7322a7fe089c4a21b9a655
Reviewed-on: https://go-review.googlesource.com/c/go/+/331070
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
At this point all funcPC references are ABIInternal functions.
Replace with the intrinsics.
Change-Id: I3ba7e485c83017408749b53f92877d3727a75e27
Reviewed-on: https://go-review.googlesource.com/c/go/+/321954
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We grow the backing store on append by 2x for small sizes and 1.25x
for large sizes. The threshold we use for picking the growth factor
used to depend on the old length, not the old capacity. That's kind of
unfortunate, because then doing append(s, 0, 0) and append(append(s,
0), 0) do different things. (If s has one more spot available, then
the former expression chooses its growth based on len(s) and the
latter on len(s)+1.) If we instead use the old capacity, we get more
consistent behavior. (Both expressions use len(s)+1 == cap(s) to
decide.)
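A small demonstration (illustrative only; the exact capacities depend on
the element size and size classes, and fmt is assumed to be imported):

func demo() {
	s := make([]int, 3, 4) // one spare slot
	a := append(s, 0, 0)
	b := append(append(s, 0), 0)
	// Before this CL, a's growth was decided from len(s) and b's from
	// len(s)+1, so cap(a) and cap(b) could differ; with the decision
	// based on cap(s), both grow the same way.
	fmt.Println(cap(a), cap(b))
}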
Fixes #41239
Change-Id: I40686471d256edd72ec92aef973a89b52e235d4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/257338
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Use a common runtime slicecopy function to copy strings or slices
into slices. This deduplicates similar code previously used in
reflect.slicecopy and runtime.stringslicecopy.
Change-Id: I09572ff0647a9e12bb5c6989689ce1c43f16b7f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/254658
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
If the dst slice length is zero in makeslicecopy then the called mallocgc
uses a fast path and only returns a pointer to runtime.zerobase.
There may be no heapBits for that address readable by
bulkBarrierPreWriteSrcOnly, which will cause a panic.
Protect against this by not calling bulkBarrierPreWriteSrcOnly when
there is nothing to copy, which covers all cases where the
length of the destination slice is zero.
runtime.growslice and runtime.typedslicecopy have fast paths that
do not call bulkBarrierPreWrite for zero copy lengths either.
Fixes #38929
Change-Id: I78ece600203a0a8d24de5b6c9eef56f605d44e99
Reviewed-on: https://go-review.googlesource.com/c/go/+/232800
Reviewed-by: Keith Randall <khr@golang.org>
match:
	m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
	m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
	m = makeslicecopy([]T, x, s)
This avoids memclear and shading of pointers in the newly created slice
before the copy.
With this CL "s" is only allowed to be a variable and not a more
complex expression. This restriction could be lifted in future versions
of this optimization when it can be proven that "s" is not referencing "m".
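A typical shape of code that now compiles to a single makeslicecopy call
(or, for a pointer-free element type like this one, to mallocgc plus
memmove):

func clone(s []byte) []byte {
	m := make([]byte, len(s))
	copy(m, s)
	return m
}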
Triggers 450 times during make.bash.
Reduces go binary size by ~8 kbyte.
name old time/op new time/op delta
MakeSliceCopy/mallocmove/Byte 71.1ns ± 1% 65.8ns ± 0% -7.49% (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int 71.2ns ± 1% 66.0ns ± 0% -7.27% (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr 104ns ± 4% 99ns ± 1% -5.13% (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte 70.3ns ± 0% 68.0ns ± 0% -3.22% (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int 70.3ns ± 0% 68.5ns ± 1% -2.59% (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr 102ns ± 0% 99ns ± 1% -2.97% (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte 75.4ns ± 0% 74.9ns ± 2% -0.63% (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int 75.6ns ± 0% 76.4ns ± 3% ~ (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr 107ns ± 0% 108ns ± 1% +0.93% (p=0.005 n=9+10)
Fixes #26252
Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71
Reviewed-on: https://go-review.googlesource.com/c/go/+/146719
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.
After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.
Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.
Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.
Updates #36890
file        before      after       Δ       %
compile     19655024    19655152    +128    +0.001%
cover       5244840     5236648     -8192   -0.156%
dist        3662376     3658280     -4096   -0.112%
link        6680056     6675960     -4096   -0.061%
pprof       14789844    14777556    -12288  -0.083%
test2json   2824744     2820648     -4096   -0.145%
trace       11647876    11639684    -8192   -0.070%
vet         8260472     8256376     -4096   -0.050%
total       115163736   115118808   -44928  -0.039%
Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78
Reviewed-on: https://go-review.googlesource.com/c/go/+/227163
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
In rare circumstances, this helps report a race which would
otherwise go undetected.
Fixes #36794
Change-Id: I8a3c9bd6fc34efa51516393f7ee72531c34fb073
Reviewed-on: https://go-review.googlesource.com/c/go/+/220685
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>