Mirror of https://git.ffmpeg.org/ffmpeg.git, synced 2026-06-05 23:10:29 +00:00
This adds a NEON-optimized function for computing the 32x32 Sum of Absolute Differences (SAD) on AArch64, closing a gap where x86 had SSE2/AVX2 implementations but AArch64 had no equivalent.

The implementation mirrors the existing sad8 and sad16 NEON functions. It uses a 4-row unrolled loop with UABAL and UABAL2 instructions for efficient load-compute interleaving, and four 8x16-bit accumulators to handle the wider 32-byte rows.

Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:

- sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
- sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
- sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)

Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>

| File | | |
|---|---|---|
| asm.S | ||
| cpu.c | ||
| cpu.h | ||
| cpu_sme.S | ||
| cpu_sve.S | ||
| crc.h | ||
| crc.S | ||
| float_dsp_init.c | ||
| float_dsp_neon.S | ||
| intreadwrite.h | ||
| Makefile | ||
| neontest.h | ||
| pixelutils.h | ||
| pixelutils_neon.S | ||
| timer.h | ||
| tx_float_init.c | ||
| tx_float_neon.S | ||
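
The commit description above mentions four 8x16-bit accumulators for the 32-byte rows. As a rough illustration only, the following portable C sketch models that layout in scalar code and checks it against a plain reference. The function names `sad32x32_ref` and `sad32x32_lanes` are hypothetical, and the byte-to-lane mapping is an assumption made for clarity; the actual assembly in `pixelutils_neon.S` may order loads and lanes differently. The argument order follows the `av_pixelutils_sad_fn` prototype from libavutil.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain scalar reference: sum of |a - b| over a 32x32 block. */
static int sad32x32_ref(const uint8_t *src1, ptrdiff_t stride1,
                        const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

/*
 * Scalar model of four accumulators with eight 16-bit lanes each.
 * Assumed mapping (illustrative, not taken from the assembly):
 * bytes 0-7 of a row feed acc[0], 8-15 acc[1], 16-23 acc[2], 24-31 acc[3].
 * Each lane receives exactly one byte difference per row, so after 32 rows
 * a lane holds at most 32 * 255 = 8160, which fits in 16 bits.
 */
static int sad32x32_lanes(const uint8_t *src1, ptrdiff_t stride1,
                          const uint8_t *src2, ptrdiff_t stride2)
{
    uint16_t acc[4][8] = {{0}};

    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            acc[x / 8][x % 8] += (uint16_t)abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }

    /* Final horizontal reduction of all 32 lanes into one 32-bit sum. */
    uint32_t sum = 0;
    for (int a = 0; a < 4; a++) {
        for (int l = 0; l < 8; l++) {
            assert(acc[a][l] <= 32 * 255); /* no 16-bit overflow possible */
            sum += acc[a][l];
        }
    }
    return (int)sum;
}
```

Both functions take two block pointers with their row strides and return the SAD as an `int`. The point of the sketch is the headroom argument: because 32 lanes cover the 32 bytes of a row, widening to 32 bits can be deferred until the final reduction.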