[dev.simd] cmd/compile, simd: change DotProductQuadruple and add peepholes

This CL addresses some API decisions from the API audit.
Instead of exposing the Intel instruction format, we hide the add part
of the instructions behind a peephole optimization, and rename the API
to DotProductQuadruple.

Change-Id: I471c0a755174bc15dd83bdc0f757d6356b92d835
Reviewed-on: https://go-review.googlesource.com/c/go/+/721420
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Junyang Shao 2025-11-17 23:19:56 +00:00
parent be9c50c6a0
commit 896f293a25
14 changed files with 441 additions and 235 deletions

@@ -346,40 +346,6 @@ func (x Uint64x4) Add(y Uint64x4) Uint64x4
// Asm: VPADDQ, CPU Feature: AVX512
func (x Uint64x8) Add(y Uint64x8) Uint64x8
/* AddDotProductQuadruple */
// AddDotProductQuadruple performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (x Int8x16) AddDotProductQuadruple(y Uint8x16, z Int32x4) Int32x4
// AddDotProductQuadruple performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (x Int8x32) AddDotProductQuadruple(y Uint8x32, z Int32x8) Int32x8
// AddDotProductQuadruple performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSD, CPU Feature: AVX512VNNI
func (x Int8x64) AddDotProductQuadruple(y Uint8x64, z Int32x16) Int32x16
/* AddDotProductQuadrupleSaturated */
// AddDotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (x Int8x16) AddDotProductQuadrupleSaturated(y Uint8x16, z Int32x4) Int32x4
// AddDotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (x Int8x32) AddDotProductQuadrupleSaturated(y Uint8x32, z Int32x8) Int32x8
// AddDotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y and then adds z.
//
// Asm: VPDPBUSDS, CPU Feature: AVX512VNNI
func (x Int8x64) AddDotProductQuadrupleSaturated(y Uint8x64, z Int32x16) Int32x16
/* AddPairs */
// AddPairs horizontally adds adjacent pairs of elements.
@@ -2228,6 +2194,46 @@ func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16
// Asm: VPMADDUBSW, CPU Feature: AVX512
func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32
/* DotProductQuadruple */
// DotProductQuadruple performs dot products on groups of 4 elements of x and y.
// DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (x Int8x16) DotProductQuadruple(y Uint8x16) Int32x4
// DotProductQuadruple performs dot products on groups of 4 elements of x and y.
// DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (x Int8x32) DotProductQuadruple(y Uint8x32) Int32x8
// DotProductQuadruple performs dot products on groups of 4 elements of x and y.
// DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSD, CPU Feature: AVX512VNNI
func (x Int8x64) DotProductQuadruple(y Uint8x64) Int32x16
/* DotProductQuadrupleSaturated */
// DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
// DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (x Int8x16) DotProductQuadrupleSaturated(y Uint8x16) Int32x4
// DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
// DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (x Int8x32) DotProductQuadrupleSaturated(y Uint8x32) Int32x8
// DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
// DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
//
// Asm: VPDPBUSDS, CPU Feature: AVX512VNNI
func (x Int8x64) DotProductQuadrupleSaturated(y Uint8x64) Int32x16
/* Equal */
// Equal compares for equality.