ffmpeg/libswscale/aarch64
David Christle 2c7fe8d8ad swscale/aarch64: add NEON rgb32tobgr24 and rgb24tobgr32 conversions
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and
ld3/st4 structure operations. Both use a 2-register sliding-window
tbl with post-indexed addressing. Instruction scheduling targets
narrow in-order cores (e.g. Cortex-A55) while remaining
performance-neutral on wide out-of-order cores.
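
The sliding-window tbl idea for alpha drop can be sketched in portable C. This is only one plausible reading of the description above, not the committed assembly: the helper names (tbl2, drop_alpha_16px) are hypothetical, the 16-pixel block size is an assumption, and a real implementation would load the index vectors as constants once rather than compute them per block. The sketch also assumes the alpha byte is the fourth byte of each pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates a two-register NEON tbl: each index selects one byte from a
 * 32-byte table; out-of-range indices produce 0. (Hypothetical helper.) */
static void tbl2(uint8_t out[16], const uint8_t table[32],
                 const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 32 ? table[idx[i]] : 0;
}

/* Drops every 4th byte from 16 four-byte pixels (64 bytes in, 48 out).
 * Output register w is produced from input registers w and w+1, i.e. a
 * 2-register window that slides by one register per output register. */
static void drop_alpha_16px(const uint8_t src[64], uint8_t dst[48])
{
    assert(src && dst);
    for (int w = 0; w < 3; w++) {
        uint8_t idx[16];
        for (int i = 0; i < 16; i++) {
            int o = 16 * w + i;             /* output byte position        */
            int s = o + o / 3;              /* source byte, skipping alpha */
            idx[i] = (uint8_t)(s - 16 * w); /* relative to window start    */
        }
        tbl2(dst + 16 * w, src + 16 * w, idx);
    }
}
```

In this sketch every index stays within 0..30, so each output register only ever needs two adjacent input registers, which is what makes a 2-register table sufficient.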

Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha
drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel
instruction count. Independent instructions are placed between loads and
their dependent operations to hide load-use latency on in-order cores.
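
As a rough illustration of the coalesced scalar tails, the C sketch below mirrors the load/store widths named above. It is not the committed code: the function names are hypothetical, it assumes a little-endian target, and it assumes that the inserted alpha byte is 0xFF in the fourth byte position. The comments map loosely to the instructions and are not an instruction-for-instruction translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alpha drop, one pixel per iteration: a single 32-bit load, then a
 * 16-bit store and an 8-bit store instead of three byte copies. */
static void drop_alpha_tail(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(src && dst);
    for (size_t i = 0; i < pixels; i++, src += 4, dst += 3) {
        uint32_t px;
        uint16_t lo;
        memcpy(&px, src, 4);           /* ldr        */
        lo = (uint16_t)px;
        memcpy(dst, &lo, 2);           /* strh       */
        dst[2] = (uint8_t)(px >> 16);  /* lsr + strb */
    }
}

/* Alpha insert, one pixel per iteration: a 16-bit and an 8-bit load are
 * merged with the assumed 0xFF alpha and written with one 32-bit store. */
static void insert_alpha_tail(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(src && dst);
    for (size_t i = 0; i < pixels; i++, src += 3, dst += 4) {
        uint16_t lo;
        uint32_t px;
        memcpy(&lo, src, 2);                                /* ldrh       */
        px = lo | ((uint32_t)src[2] << 16) | 0xFF000000u;   /* ldrb + orr */
        memcpy(dst, &px, 4);                                /* str        */
    }
}
```

The memcpy calls stand in for unaligned loads and stores without relying on undefined behavior; compilers typically lower them to single load/store instructions.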

checkasm --bench on Apple M3 Max (decicycles, 1920px):
  rgb32tobgr24_c:    114.4 ( 1.00x)
  rgb32tobgr24_neon:  64.3 ( 1.78x)
  rgb24tobgr32_c:    128.9 ( 1.00x)
  rgb24tobgr32_neon:  80.9 ( 1.59x)

The C baseline is auto-vectorized by clang, so the speedup is measured
against compiler-generated NEON code.

Signed-off-by: David Christle <dev@christle.is>
2026-03-04 10:30:08 +00:00
asm-offsets.h swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
hscale.S all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
input.S swscale/aarch64: dotprod implementation of rgba32_to_Y 2025-03-04 10:16:44 +02:00
Makefile swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
output.S swscale/output: Implement yuv2nv12cx neon assembly 2025-08-12 09:05:00 +00:00
range_convert_neon.S swscale/aarch64: add neon {lum,chr}ConvertRange16 2024-12-05 21:10:29 +01:00
rgb2rgb.c swscale/aarch64: add NEON rgb32tobgr24 and rgb24tobgr32 conversions 2026-03-04 10:30:08 +00:00
rgb2rgb_neon.S swscale/aarch64: add NEON rgb32tobgr24 and rgb24tobgr32 conversions 2026-03-04 10:30:08 +00:00
swscale.c swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
swscale_unscaled.c swscale/aarch64: add NEON YUV420P/YUV422P/YUVA420P to RGB conversion 2026-03-02 13:14:07 +00:00
swscale_unscaled_neon.S swscale/aarch64: cosmetics fix (spaces inside curly braces) 2024-08-26 11:07:49 +02:00
xyz2rgb_neon.S swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
yuv2rgb_neon.S swscale/aarch64: add NEON YUV420P/YUV422P/YUVA420P to RGB conversion 2026-03-02 13:14:07 +00:00