mirror of https://github.com/golang/go.git, synced 2026-06-27 03:11:23 +00:00
runtime: eliminate false positives in ctrlGroupMatchH2 on ARM64

The generic implementation of ctrlGroupMatchH2 uses a well-known
bit-parallel matching trick:

    v := g ^ (lsb * h)
    ((v - lsb) &^ v) & msb

This detects zero bytes in (g ^ h), but can produce false positives in
rare cases due to cross-byte borrow during the subtraction. In
particular, when a byte equals h and the following byte differs from h
only in its lowest bit (e.g. 0x02 followed by 0x03 with h = 0x02), the
borrow out of the matching byte causes both bytes to be reported as
matches.

These false positives are benign (the subsequent key comparison filters
them out) but cause unnecessary probes.
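The false positive described above can be reproduced with a small standalone sketch. The constant names lsb and msb and the function matchGeneric are illustrative stand-ins for the runtime's identifiers, and the group value is chosen by hand (0x80 is assumed to be the empty control byte, as in the diff's bitsetEmpty).

```go
package main

import "fmt"

const (
	lsb = 0x0101010101010101
	msb = 0x8080808080808080
)

// matchGeneric is the borrow-based formulation quoted in the commit message.
// It sets the high bit of every byte of g that it considers equal to h.
func matchGeneric(g uint64, h uint8) uint64 {
	v := g ^ (lsb * uint64(h))
	return ((v - lsb) &^ v) & msb
}

func main() {
	// Control bytes, lowest byte first: 0x02, 0x03, then six empty bytes (0x80).
	g := uint64(0x8080808080800302)

	// Only byte 0 equals h, yet bytes 0 and 1 are both flagged:
	// the borrow out of byte 0 turns byte 1 (0x01 after the XOR) into 0xff.
	fmt.Printf("%#x\n", matchGeneric(g, 0x02)) // prints 0x8080
}
```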

This change rewrites ctrlGroupMatchH2 on ARM64 using an alternative
bit-parallel formulation that avoids cross-byte borrow propagation:

    v   = ~(g ^ (lsb * h))
    clr = v & 0x7f...7f
    msk = v & 0x80...80
    res = (clr + lsb) & msk
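
This formulation operates on the complemented value ~(g ^ h), in which a
matching byte is 0xff rather than 0x00. Because the low 7 bits and the
high bit of each byte are handled separately, each byte of clr is at most
0x7f, so adding lsb never carries into the next byte. The high bit of a
byte of (clr + lsb) is set only when all seven low bits of v are ones,
and msk keeps it only when the high bit of v is also set, so a byte is
reported exactly when it is 0xff in v. As a result, matches are computed
without false positives.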

On ARM64, with favorable instruction selection and scheduling, both
formulations compile to a similar number of instructions with comparable
latency, so the theoretical computational cost is unchanged.

Benchmark results (gomapbench) show small performance improvements in
most cases, with minor regressions in a few cases, resulting in an
overall net positive effect. Full benchmark data is available at:

    https://gist.github.com/spy20051623/3227bea1520ea1871254eb0c219a0abb

No functional change; this reduces unnecessary work in hashmap lookups
under certain control-byte patterns.
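To make the "unnecessary work" concrete, the following sketch decodes each match mask into the slot indices a lookup would go on to compare keys for. candidateSlots is a hypothetical helper, and it assumes the unpacked representation in which slot i is signalled by the high bit of byte i.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	lsb = 0x0101010101010101
	msb = 0x8080808080808080
	l7b = 0x7f7f7f7f7f7f7f7f
)

func matchGeneric(g uint64, h uint8) uint64 {
	v := g ^ (lsb * uint64(h))
	return ((v - lsb) &^ v) & msb
}

func matchCarryFree(g uint64, h uint8) uint64 {
	v := ^(g ^ (lsb * uint64(h)))
	return ((v & l7b) + lsb) & (v & msb)
}

// candidateSlots lists the slot indices flagged in a match mask, assuming
// one flag bit (bit 7) per control byte.
func candidateSlots(m uint64) []int {
	var slots []int
	for m != 0 {
		slots = append(slots, bits.TrailingZeros64(m)/8)
		m &= m - 1 // clear the lowest set bit
	}
	return slots
}

func main() {
	g := uint64(0x8080808080800302) // control bytes 0x02, 0x03, then empties
	fmt.Println(candidateSlots(matchGeneric(g, 0x02)))   // prints [0 1]
	fmt.Println(candidateSlots(matchCarryFree(g, 0x02))) // prints [0]
}
```

With the generic mask, slot 1 would cost one extra key comparison before being rejected; with the carry-free mask it is never visited.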

Change-Id: Ia85d0115a2431861d54aeeb4d2e6c9b3a69e72e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/759480
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
parent 816c1a79fb
commit f133609b75
1 changed file with 6 additions and 1 deletion
@@ -24,6 +24,7 @@ const (

 	bitsetLSB   = 0x0101010101010101
 	bitsetMSB   = 0x8080808080808080
+	bitsetL7B   = 0x7f7f7f7f7f7f7f7f
 	bitsetEmpty = bitsetLSB * uint64(ctrlEmpty)
 )

@@ -158,6 +159,11 @@ func (g ctrlGroup) matchH2(h uintptr) bitset {
 // Note: On AMD64, this is an intrinsic implemented with SIMD instructions. See
 // note on bitset about the packed intrinsified return value.
 func ctrlGroupMatchH2(g ctrlGroup, h uintptr) bitset {
+	v := uint64(g) ^ (bitsetLSB * uint64(h))
+	if goarch.IsArm64 == 1 {
+		v = ^v
+		return bitset((v&bitsetL7B + bitsetLSB) & (v & bitsetMSB))
+	}
 	// NB: This generic matching routine produces false positive matches when
 	// h is 2^N and the control bytes have a seq of 2^N followed by 2^N+1. For
 	// example: if ctrls==0x0302 and h=02, we'll compute v as 0x0100. When we
@@ -166,7 +172,6 @@ func ctrlGroupMatchH2(g ctrlGroup, h uintptr) bitset {
 	// just a rare inefficiency. Note that they only occur if there is a real
 	// match and never occur on ctrlEmpty, or ctrlDeleted. The subsequent key
 	// comparisons ensure that there is no correctness issue.
-	v := uint64(g) ^ (bitsetLSB * uint64(h))
 	return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
 }
