runtime: optimize bulkBarrierPreWrite with allocheaders

Currently bulkBarrierPreWrite follows a fairly slow path wherein it calls typePointersOf, which ends up calling into fastForward. This does some fairly heavy computation to move the iterator forward without any assumptions about where it lands at all. It needs to be completely general to support splitting at arbitrary boundaries, for example for scanning oblets. This means that copying objects during the GC mark phase is fairly expensive, and is a regression from before allocheaders. However, in almost all cases bulkBarrierPreWrite and bulkBarrierPreWriteSrcOnly have perfect type information. We can do a lot better in these cases because we're starting on a type-size boundary, which is exactly what the iterator is built around. This change adds the typePointersOfType method which produces a typePointers iterator from a pointer and a type. This change significantly improves the performance of these bulk write barriers, eliminating some performance regressions that were noticed on the perf dashboard. There are still just a couple cases where we have to use the more general typePointersOf calls, but they're fairly rare; most bulk barriers have perfect type information. This change is tested by the GCInfo tests in the runtime and the GCBits tests in the reflect package via an additional check in getgcmask. Results for tile38 before and after allocheaders. There was previous a regression in the p90, now it's gone. Also, the overall win has been boosted slightly. tile38 $ benchstat noallocheaders.results allocheaders.results name old time/op new time/op delta Tile38QueryLoad 481µs ± 1% 468µs ± 1% -2.71% (p=0.000 n=10+10) name old average-RSS-bytes new average-RSS-bytes delta Tile38QueryLoad 6.32GB ± 1% 6.23GB ± 0% -1.38% (p=0.000 n=9+8) name old peak-RSS-bytes new peak-RSS-bytes delta Tile38QueryLoad 6.49GB ± 1% 6.40GB ± 1% -1.38% (p=0.002 n=10+10) name old peak-VM-bytes new peak-VM-bytes delta Tile38QueryLoad 7.72GB ± 1% 7.64GB ± 1% -1.07% (p=0.007 n=10+10) name old p50-latency-ns new p50-latency-ns delta Tile38QueryLoad 212k ± 1% 205k ± 0% -3.02% (p=0.000 n=10+9) name old p90-latency-ns new p90-latency-ns delta Tile38QueryLoad 622k ± 1% 616k ± 1% -1.03% (p=0.005 n=10+10) name old p99-latency-ns new p99-latency-ns delta Tile38QueryLoad 4.55M ± 2% 4.39M ± 2% -3.51% (p=0.000 n=10+10) name old ops/s new ops/s delta Tile38QueryLoad 12.5k ± 1% 12.8k ± 1% +2.78% (p=0.000 n=10+10) Change-Id: I0a48f848eae8777d0fd6769c3a1fe449f8d9d0a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/542219 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-08 06:10:04 +00:00 · 2023-11-14 22:05:53 +00:00 · 2023-11-14 22:05:53 +00:00 · d6ef98b8fa
commit d6ef98b8fa
parent 17eb0a2bac
5 changed files with 237 additions and 26 deletions
--- a/src/runtime/slice.go
+++ b/src/runtime/slice.go
@ -64,7 +64,11 @@ func makeslicecopy(et *_type, tolen int, fromlen int, from unsafe.Pointer) unsaf
 		if copymem > 0 && writeBarrier.enabled {
 			// Only shade the pointers in old.array since we know the destination slice to
 			// only contains nil pointers because it has been cleared during alloc.
-			bulkBarrierPreWriteSrcOnly(uintptr(to), uintptr(from), copymem)
+			//
+			// It's safe to pass a type to this function as an optimization because
+			// from and to only ever refer to memory representing whole values of
+			// type et. See the comment on bulkBarrierPreWrite.
+			bulkBarrierPreWriteSrcOnly(uintptr(to), uintptr(from), copymem, et)
 		}
 	}

@ -247,7 +251,11 @@ func growslice(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice
 		if lenmem > 0 && writeBarrier.enabled {
 			// Only shade the pointers in oldPtr since we know the destination slice p
 			// only contains nil pointers because it has been cleared during alloc.
-			bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(oldPtr), lenmem-et.Size_+et.PtrBytes)
+			//
+			// It's safe to pass a type to this function as an optimization because
+			// from and to only ever refer to memory representing whole values of
+			// type et. See the comment on bulkBarrierPreWrite.
+			bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(oldPtr), lenmem-et.Size_+et.PtrBytes, et)
 		}
 	}
 	memmove(p, oldPtr, lenmem)