runtime: eliminate false positives in ctrlGroupMatchH2 on ARM64

The generic implementation of ctrlGroupMatchH2 uses a well-known
bit-parallel matching trick:

    v := g ^ (lsb * h)
    ((v - lsb) &^ v) & msb

This detects zero bytes in v, i.e. control bytes equal to h, but the
subtraction can borrow across byte boundaries and produce false
positives in rare cases. In particular, when a byte equals h and the
next higher byte differs from h only in its lowest bit (e.g. control
bytes 0x02, 0x03 with h = 0x02), the borrow out of the matching byte
causes both bytes to be reported as matches.
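
The borrow can be reproduced with a small standalone sketch. The names
matchGeneric, lsb and msb are illustrative rather than the runtime's own
identifiers, h is narrowed to uint8 for brevity, and the 0x80 filler
bytes are an arbitrary non-matching choice:

```go
package main

import "fmt"

const (
	lsb = 0x0101010101010101
	msb = 0x8080808080808080
)

// matchGeneric is the borrow-based formulation shown above.
// It sets the high bit of every byte of g that it believes equals h.
func matchGeneric(g uint64, h uint8) uint64 {
	v := g ^ (lsb * uint64(h))
	return ((v - lsb) &^ v) & msb
}

func main() {
	// Byte 0 is 0x02 (a real match for h = 0x02) and byte 1 is 0x03.
	// The other six bytes are 0x80, used here only as filler.
	g := uint64(0x8080808080800302)

	// v has byte 0 == 0x00 and byte 1 == 0x01. Subtracting lsb borrows
	// out of byte 0, so byte 1 becomes 0x01 - 0x01 - 1 = 0xff and its
	// high bit survives the &^ and & steps.
	fmt.Printf("%016x\n", matchGeneric(g, 0x02)) // prints 0000000000008080
}
```

Only byte 0 actually equals h, yet the high bits of both byte 0 and
byte 1 are set in the result.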

These false positives are benign (filtered by subsequent key comparison)
but introduce unnecessary probes.

This change rewrites ctrlGroupMatchH2 on ARM64 using an alternative
bit-parallel formulation that avoids cross-byte borrow propagation:

    v   = ~(g ^ (lsb * h))
    clr = v & 0x7f...7f
    msk = v & 0x80...80
    res = (clr + lsb) & msk

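This formulation operates on the complemented value ~(g ^ (lsb * h)), in
which a matching byte is 0xff rather than 0x00. Masking off the high bit
of each byte before the addition bounds every byte sum at
0x7f + 1 = 0x80, so no carry crosses a byte boundary. A byte's high bit
is set in res only if its low 7 bits were all ones (the addition carried
into bit 7) and its high bit in v was set, i.e. the byte was exactly
0xff. As a result, matches are computed without false positives.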
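
As a rough sanity check of this claim, the sketch below compares the
carry-free formulation with a plain byte-by-byte comparison. All
function and constant names are hypothetical, h is narrowed to uint8,
and only the two low bytes of the group are varied (the rest stay zero),
so this is an illustration rather than the runtime's test:

```go
package main

import "fmt"

const (
	lsb = 0x0101010101010101
	msb = 0x8080808080808080
	l7b = 0x7f7f7f7f7f7f7f7f
)

// matchCarryFree follows the four steps listed above.
func matchCarryFree(g uint64, h uint8) uint64 {
	v := ^(g ^ (lsb * uint64(h)))
	clr := v & l7b // low 7 bits of each byte
	msk := v & msb // high bit of each byte
	return (clr + lsb) & msk
}

// matchBytewise is a reference that compares each byte to h directly.
func matchBytewise(g uint64, h uint8) uint64 {
	var res uint64
	for i := 0; i < 8; i++ {
		if uint8(g>>(8*i)) == h {
			res |= uint64(0x80) << (8 * i)
		}
	}
	return res
}

// countMismatches tries every h and every pair of adjacent low bytes
// and counts inputs where the two functions disagree.
func countMismatches() int {
	n := 0
	for h := 0; h < 256; h++ {
		for a := 0; a < 256; a++ {
			for b := 0; b < 256; b++ {
				g := uint64(a) | uint64(b)<<8
				if matchCarryFree(g, uint8(h)) != matchBytewise(g, uint8(h)) {
					n++
				}
			}
		}
	}
	return n
}

func main() {
	// The 0x02, 0x03 pattern now reports only byte 0.
	fmt.Printf("%016x\n", matchCarryFree(0x8080808080800302, 0x02)) // prints 0000000000000080
	fmt.Println("mismatches:", countMismatches())                   // prints mismatches: 0
}
```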

On ARM64, under favorable instruction selection and scheduling, both
formulations compile to a similar number of instructions with comparable
latency, so the theoretical computational cost is unchanged.

Benchmark results (gomapbench) show small performance improvements in
most cases and minor regressions in a few, for a net positive effect.
Full benchmark data is available at:
https://gist.github.com/spy20051623/3227bea1520ea1871254eb0c219a0abb

No functional change; this only removes the redundant key comparisons
that false-positive matches cause during map lookups.

Change-Id: Ia85d0115a2431861d54aeeb4d2e6c9b3a69e72e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/759480
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Piaoyang Shu 2026-03-24 18:05:47 +08:00 committed by Gopher Robot
parent 816c1a79fb
commit f133609b75

@@ -24,6 +24,7 @@ const (
 	bitsetLSB = 0x0101010101010101
 	bitsetMSB = 0x8080808080808080
+	bitsetL7B = 0x7f7f7f7f7f7f7f7f
 	bitsetEmpty = bitsetLSB * uint64(ctrlEmpty)
 )
@@ -158,6 +159,11 @@ func (g ctrlGroup) matchH2(h uintptr) bitset {
 // Note: On AMD64, this is an intrinsic implemented with SIMD instructions. See
 // note on bitset about the packed intrinsified return value.
 func ctrlGroupMatchH2(g ctrlGroup, h uintptr) bitset {
+	v := uint64(g) ^ (bitsetLSB * uint64(h))
+	if goarch.IsArm64 == 1 {
+		v = ^v
+		return bitset((v&bitsetL7B + bitsetLSB) & (v & bitsetMSB))
+	}
 	// NB: This generic matching routine produces false positive matches when
 	// h is 2^N and the control bytes have a seq of 2^N followed by 2^N+1. For
 	// example: if ctrls==0x0302 and h=02, we'll compute v as 0x0100. When we
@@ -166,7 +172,6 @@ func ctrlGroupMatchH2(g ctrlGroup, h uintptr) bitset {
 	// just a rare inefficiency. Note that they only occur if there is a real
 	// match and never occur on ctrlEmpty, or ctrlDeleted. The subsequent key
 	// comparisons ensure that there is no correctness issue.
-	v := uint64(g) ^ (bitsetLSB * uint64(h))
 	return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
 }