go/src/crypto/internal/bigmod/nat_wasm.go

crypto/internal/bigmod: optimize addMulVVW on Wasm

The current implementation of addMulVVW makes heavy use of
64x64->128 bit multiplications and 64-bit add-with-carry, which
are compiler intrinsics and are very efficient on many
architectures. However, those are not supported on Wasm. Here we
implement it with 32x32->64 bit operations, which is more
efficient on Wasm.

crypto/rsa benchmarks with Node:

                      │   old.txt   │              new.txt               │
                      │   sec/op    │   sec/op     vs base               │
DecryptPKCS1v15/2048    7.726m ± 1%   4.895m ± 2%  -36.65% (p=0.000 n=35)
DecryptPKCS1v15/3072    23.52m ± 1%   15.33m ± 1%  -34.83% (p=0.000 n=35)
DecryptPKCS1v15/4096    52.64m ± 2%   35.40m ± 1%  -32.75% (p=0.000 n=35)
EncryptPKCS1v15/2048    264.2µ ± 1%   176.9µ ± 1%  -33.02% (p=0.000 n=35)
DecryptOAEP/2048        7.608m ± 1%   4.911m ± 1%  -35.45% (p=0.000 n=35)
EncryptOAEP/2048        266.2µ ± 0%   183.3µ ± 2%  -31.15% (p=0.000 n=35)
SignPKCS1v15/2048       7.836m ± 1%   5.009m ± 2%  -36.08% (p=0.000 n=35)
VerifyPKCS1v15/2048     262.9µ ± 1%   176.3µ ± 1%  -32.94% (p=0.000 n=35)
SignPSS/2048            7.814m ± 0%   5.020m ± 1%  -35.76% (p=0.000 n=35)
VerifyPSS/2048          267.0µ ± 1%   183.8µ ± 1%  -31.17% (p=0.000 n=35)
geomean                 2.718m        1.794m       -34.01%

With wazero:

                      │   old.txt    │              new.txt               │
                      │   sec/op     │   sec/op     vs base               │
DecryptPKCS1v15/2048    13.445m ± 0%   6.528m ± 0%  -51.45% (p=0.000 n=25)
DecryptPKCS1v15/3072     41.07m ± 0%   18.85m ± 0%  -54.10% (p=0.000 n=25)
DecryptPKCS1v15/4096     91.84m ± 1%   39.66m ± 0%  -56.81% (p=0.000 n=25)
EncryptPKCS1v15/2048     461.3µ ± 0%   197.2µ ± 0%  -57.25% (p=0.000 n=25)
DecryptOAEP/2048        13.438m ± 0%   6.577m ± 0%  -51.06% (p=0.000 n=25)
EncryptOAEP/2048         471.5µ ± 0%   207.7µ ± 0%  -55.95% (p=0.000 n=25)
SignPKCS1v15/2048       13.739m ± 0%   6.687m ± 0%  -51.33% (p=0.000 n=25)
VerifyPKCS1v15/2048      461.3µ ± 1%   196.8µ ± 0%  -57.35% (p=0.000 n=25)
SignPSS/2048            13.765m ± 0%   6.686m ± 0%  -51.43% (p=0.000 n=25)
VerifyPSS/2048           470.8µ ± 0%   208.9µ ± 1%  -55.64% (p=0.000 n=25)
geomean                  4.769m        2.179m       -54.31%

Change-Id: I97f37d8cf1e3e9756a4e03ab4e681bf04152925f
Reviewed-on: https://go-review.googlesource.com/c/go/+/626957
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-11 10:04:17 -05:00
// Copyright 2024 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
crypto/internal/bigmod: apply wasm-specific implementation for only sized addMulVVW

Restore generic addMulVVW for wasm (and therefore for all
architectures). Apply wasm-specific implementation for only the
explicitly sized functions (addMulVVW1024 etc.).

Also, for the sized functions, use unsafe pointer calculations
directly, without converting them back to slices. (This is what
the assembly code does on other architectures.) This results in a
bit more speedup for crypto/rsa benchmarks on Wasm:

pkg: crypto/rsa
                      │   old.txt   │              new.txt               │
                      │   sec/op    │   sec/op     vs base               │
DecryptPKCS1v15/2048    4.906m ± 0%   4.221m ± 1%  -13.96% (p=0.000 n=25)
DecryptPKCS1v15/3072    15.18m ± 0%   13.57m ± 0%  -10.64% (p=0.000 n=25)
DecryptPKCS1v15/4096    35.49m ± 0%   32.64m ± 1%   -8.04% (p=0.000 n=25)
EncryptPKCS1v15/2048    177.1µ ± 0%   162.3µ ± 0%   -8.35% (p=0.000 n=25)
DecryptOAEP/2048        4.900m ± 1%   4.233m ± 0%  -13.61% (p=0.000 n=25)
EncryptOAEP/2048        181.8µ ± 0%   166.8µ ± 0%   -8.24% (p=0.000 n=25)
SignPKCS1v15/2048       5.026m ± 1%   4.341m ± 0%  -13.63% (p=0.000 n=25)
VerifyPKCS1v15/2048     177.2µ ± 0%   161.3µ ± 1%   -8.97% (p=0.000 n=25)
SignPSS/2048            5.020m ± 0%   4.344m ± 1%  -13.47% (p=0.000 n=25)
VerifyPSS/2048          182.2µ ± 1%   166.6µ ± 0%   -8.52% (p=0.000 n=25)
geomean                 1.791m        1.598m       -10.78%

Change-Id: I89775c46a0bbe29380889047ba393c6cfc093ff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/628255
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-14 18:36:20 -05:00
//go:build !purego
package bigmod
import "unsafe"
// The generic implementation relies on 64x64->128 bit multiplication and
// 64-bit add-with-carry, which are compiler intrinsics on many architectures.
// Wasm doesn't support those. Here we implement it with 32x32->64 bit
// operations, which is more efficient on Wasm.
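// idx returns a pointer to the i-th word of the array starting at x.
// It assumes 8-byte words, which holds for uint on wasm.
func idx(x *uint, i uintptr) *uint {
	return (*uint)(unsafe.Pointer(uintptr(unsafe.Pointer(x)) + i*8))
}

// addMulVVWWasm computes z += x * y over n words and returns the final carry.
func addMulVVWWasm(z, x *uint, y uint, n uintptr) (carry uint) {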
	const mask32 = 1<<32 - 1
	y0 := y & mask32
	y1 := y >> 32
	for i := range n {
		xi := *idx(x, i)
		x0 := xi & mask32
		x1 := xi >> 32
		zi := *idx(z, i)
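		z0 := zi & mask32
		z1 := zi >> 32
		c0 := carry & mask32
		c1 := carry >> 32

		// Accumulate the four 32x32->64 partial products of xi*y together
		// with zi and the incoming carry. Each sum fits in 64 bits.
		w00 := x0*y0 + z0 + c0
		l00 := w00 & mask32
		h00 := w00 >> 32
		w01 := x0*y1 + z1 + h00
		l01 := w01 & mask32
		h01 := w01 >> 32
		w10 := x1*y0 + c1 + l01
		h10 := w10 >> 32
		// The high 64 bits become the carry into the next word.
		carry = x1*y1 + h10 + h01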
		*idx(z, i) = w10<<32 + l00
	}
	return carry
}
func addMulVVW1024(z, x *uint, y uint) (c uint) {
	return addMulVVWWasm(z, x, y, 1024/_W)
}

func addMulVVW1536(z, x *uint, y uint) (c uint) {
	return addMulVVWWasm(z, x, y, 1536/_W)
}

func addMulVVW2048(z, x *uint, y uint) (c uint) {
	return addMulVVWWasm(z, x, y, 2048/_W)
}
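
The per-word arithmetic in addMulVVWWasm can be checked in isolation. The following is a minimal standalone sketch, not part of the package: it restates the loop body on explicit uint64 values and compares it with a reference built on math/bits. The names mulAddWord32 and mulAddWord64 are hypothetical helpers introduced here for illustration only.

```go
package main

import (
	"fmt"
	"math/bits"
)

const mask32 = 1<<32 - 1

// mulAddWord32 returns x*y + z + c as a 128-bit value (hi, lo), using only
// 32x32->64 bit multiplications. It mirrors the loop body of addMulVVWWasm,
// with hi playing the role of the outgoing carry.
func mulAddWord32(x, y, z, c uint64) (hi, lo uint64) {
	x0, x1 := x&mask32, x>>32
	y0, y1 := y&mask32, y>>32
	z0, z1 := z&mask32, z>>32
	c0, c1 := c&mask32, c>>32

	w00 := x0*y0 + z0 + c0
	l00, h00 := w00&mask32, w00>>32
	w01 := x0*y1 + z1 + h00
	l01, h01 := w01&mask32, w01>>32
	w10 := x1*y0 + c1 + l01
	h10 := w10 >> 32

	hi = x1*y1 + h10 + h01
	lo = w10<<32 + l00
	return hi, lo
}

// mulAddWord64 is a reference implementation of the same computation using
// the 64x64->128 multiply and add-with-carry helpers from math/bits.
func mulAddWord64(x, y, z, c uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(x, y)
	var carry uint64
	lo, carry = bits.Add64(lo, z, 0)
	hi += carry
	lo, carry = bits.Add64(lo, c, 0)
	hi += carry
	return hi, lo
}

func main() {
	max := ^uint64(0)
	cases := [][4]uint64{
		{0, 0, 0, 0},
		{1, 1, 1, 1},
		{max, max, max, max},
		{0x123456789abcdef0, 0xfedcba9876543210, 7, 9},
	}
	for _, v := range cases {
		h32, l32 := mulAddWord32(v[0], v[1], v[2], v[3])
		h64, l64 := mulAddWord64(v[0], v[1], v[2], v[3])
		fmt.Printf("x=%#x y=%#x match=%v\n", v[0], v[1], h32 == h64 && l32 == l64)
	}
}
```

The sketch uses uint64 rather than uint so that it behaves the same on any architecture. No intermediate sum can overflow, because a product of two 32-bit values plus two further 32-bit values is at most 2^64 - 1.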