cmd/compile: optimize Ctz64 on 386

Compared with the Ctz32-based lowering that dec64.rules generates,
this halves the number of assembly instructions.

SwissMap uses TrailingZeros64 to find the first match in its control
group and may benefit from this CL on 386 architectures.
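
As a rough illustration of the SwissMap use mentioned above, the sketch
below assumes an 8-slot control group whose match result is a 64-bit
word with the high bit (0x80) of each matching byte set. The helper name
firstMatch and the sample bit pattern are hypothetical and are not taken
from the runtime source.

```go
package main

import (
	"fmt"
	"math/bits"
)

// firstMatch is a hypothetical helper: given a match word in which byte i
// has its 0x80 bit set when slot i matches, it returns the index of the
// lowest matching slot, or 8 when no slot matches.
func firstMatch(matches uint64) int {
	// Each slot occupies 8 bits, so dividing the trailing-zero count
	// by 8 converts a bit position into a slot index.
	return bits.TrailingZeros64(matches) >> 3
}

func main() {
	// Assumed example pattern: slots 2 and 5 match.
	m := uint64(0x0000_8000_0080_0000)
	fmt.Println(firstMatch(m))
	fmt.Println(firstMatch(0))
}
```

On a 32-bit target such as 386, the 64-bit trailing-zero count in this
kind of loop is the operation the CL makes cheaper.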

goos: linux
goarch: 386
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                   │   old.txt    │               new.txt                │
                   │    sec/op    │    sec/op     vs base                │
TrailingZeros64-20   0.8828n ± 1%   0.6299n ± 1%  -28.65% (p=0.000 n=20)

Change-Id: Iba08a3f4e13efd3349715dfb7fcd5fd470286cd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/624376
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Youlin Feng 2024-10-22 17:18:11 +08:00 committed by Gopher Robot
parent bea9b91f0f
commit 0140aae6d0
7 changed files with 87 additions and 4 deletions

@@ -229,6 +229,7 @@ var genericOps = []opData{
{name: "Ctz16", argLength: 1}, // Count trailing (low order) zeroes (returns 0-16)
{name: "Ctz32", argLength: 1}, // Count trailing (low order) zeroes (returns 0-32)
{name: "Ctz64", argLength: 1}, // Count trailing (low order) zeroes (returns 0-64)
{name: "Ctz64On32", argLength: 2}, // Count trailing (low order) zeroes (returns 0-64) in arg[1]<<32+arg[0]
{name: "Ctz8NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-7
{name: "Ctz16NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-15
{name: "Ctz32NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-31