ffmpeg/libavcodec/aarch64
Jun Zhao cfa3ceac7a lavc/hevc: add aarch64 NEON for angular modes 10 and 26
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.

Mode 10 (Horizontal):
- Broadcasts left[y] to fill each row using ld2r/ld4r for efficiency
- Applies edge smoothing for luma blocks smaller than 32x32

Mode 26 (Vertical):
- Copies top reference row to all output rows
- Applies edge smoothing for luma blocks smaller than 32x32
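
For reference, the behaviour described for both modes can be modelled in
scalar C. This is only a sketch, not the code from this commit: the names
pred_mode10_ref, pred_mode26_ref, clip_u8 and half_floor are hypothetical,
and the sketch assumes 8-bit samples with top[-1] and left[-1] both
holding the top-left corner sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* floor(d / 2), written so it does not right-shift a negative int. */
static int half_floor(int d)
{
    return d >= 0 ? d / 2 : -((1 - d) / 2);
}

/* Hypothetical scalar model of mode 10 (pure horizontal), 8-bit. */
static void pred_mode10_ref(uint8_t *dst, ptrdiff_t stride,
                            const uint8_t *top, const uint8_t *left,
                            int size, int c_idx)
{
    assert(size == 4 || size == 8 || size == 16 || size == 32);

    /* Every row y is filled with the left neighbour left[y]. */
    for (int y = 0; y < size; y++)
        memset(dst + y * stride, left[y], (size_t)size);

    /* Edge smoothing: luma only, blocks smaller than 32x32, first row. */
    if (c_idx == 0 && size < 32)
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(left[0] + half_floor(top[x] - top[-1]));
}

/* Hypothetical scalar model of mode 26 (pure vertical), 8-bit. */
static void pred_mode26_ref(uint8_t *dst, ptrdiff_t stride,
                            const uint8_t *top, const uint8_t *left,
                            int size, int c_idx)
{
    assert(size == 4 || size == 8 || size == 16 || size == 32);

    /* Every row is a copy of the top reference row. */
    for (int y = 0; y < size; y++)
        memcpy(dst + y * stride, top, (size_t)size);

    /* Edge smoothing: luma only, blocks smaller than 32x32, first column. */
    if (c_idx == 0 && size < 32)
        for (int y = 0; y < size; y++)
            dst[y * stride] = clip_u8(top[0] + half_floor(left[y] - left[-1]));
}
```

A model like this is mainly useful as an oracle when comparing an
optimized implementation against expected output for small blocks.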

Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit, avoiding widening to 16-bit intermediates.
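
Why the 8-bit sequence is sufficient can be checked with a scalar model
of the two instructions. The sketch below is an illustration, not code
from this commit: uhsub8, usqadd8, edge_filter_8bit and edge_filter_wide
are hypothetical names, and the operand order is an assumption. UHSUB
yields the low 8 bits of (a - b) >> 1, which always fits a signed 8-bit
value, and USQADD adds that signed value to an unsigned base with
saturation to [0, 255], which is exactly the clip step.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of UHSUB on bytes: low 8 bits of floor((a - b) / 2). */
static uint8_t uhsub8(uint8_t a, uint8_t b)
{
    int d = (int)a - (int)b;                      /* in [-255, 255] */
    int h = d >= 0 ? d / 2 : -((1 - d) / 2);      /* floor(d / 2)   */
    assert(h >= -128 && h <= 127);                /* fits in int8   */
    return (uint8_t)(h & 0xff);
}

/* One lane of USQADD on bytes: unsigned acc plus signed s, saturated. */
static uint8_t usqadd8(uint8_t acc, uint8_t s)
{
    int v   = s < 128 ? (int)s : (int)s - 256;    /* reinterpret as int8 */
    int sum = (int)acc + v;
    return (uint8_t)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
}

/* Edge filter computed entirely with 8-bit lanes. */
static uint8_t edge_filter_8bit(uint8_t base, uint8_t a, uint8_t b)
{
    return usqadd8(base, uhsub8(a, b));
}

/* The same filter with a widened intermediate and an explicit clip. */
static uint8_t edge_filter_wide(uint8_t base, uint8_t a, uint8_t b)
{
    int d = (int)a - (int)b;
    int v = (int)base + (d >= 0 ? d / 2 : -((1 - d) / 2));
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Exhaustively compare both models; returns 1 if they always agree. */
static int edge_filter_models_agree(void)
{
    for (int base = 0; base < 256; base++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                if (edge_filter_8bit((uint8_t)base, (uint8_t)a, (uint8_t)b) !=
                    edge_filter_wide((uint8_t)base, (uint8_t)a, (uint8_t)b))
                    return 0;
    return 1;
}
```

The exhaustive loop covers all 2^24 input combinations, so agreement
between the two models does not depend on hand-picked test vectors.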

The C pred_angular wrappers are made non-static and given an ff_ prefix
so the NEON dispatch can fall back to C for modes not yet optimized.
This will be reverted once all angular modes are implemented.

Note: since pred_angular[] holds one function pointer per block size
(not per mode), checkasm benchmarks will show '_neon' for all 33 modes
even though only modes 10 and 26 are actually accelerated. Unoptimized
modes show ~1.0x speedup because they pass through the NEON wrapper to
the C fallback with negligible overhead.
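
The per-size dispatch with a C fallback can be pictured roughly as
follows. This is a sketch under assumptions, not the code from this
commit: the function pointer signature is assumed, and every name here
(pred_angular_fn, angular_c_stub, mode10_stub, mode26_stub,
pred_angular_8x8_wrapper, init_pred_table) is hypothetical. The stubs
only record which path was taken so the routing can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of one per-size entry in the prediction table. */
typedef void (*pred_angular_fn)(uint8_t *src, const uint8_t *top,
                                const uint8_t *left, ptrdiff_t stride,
                                int c_idx, int mode);

enum pred_path { PATH_NONE, PATH_C, PATH_MODE10, PATH_MODE26 };
static enum pred_path last_path = PATH_NONE;

/* Stand-in for the generic C implementation (handles every mode). */
static void angular_c_stub(uint8_t *src, const uint8_t *top,
                           const uint8_t *left, ptrdiff_t stride,
                           int c_idx, int mode)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx; (void)mode;
    last_path = PATH_C;
}

/* Stand-ins for the two accelerated kernels. */
static void mode10_stub(uint8_t *src, const uint8_t *top,
                        const uint8_t *left, ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_MODE10;
}

static void mode26_stub(uint8_t *src, const uint8_t *top,
                        const uint8_t *left, ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_MODE26;
}

/* Wrapper for one block size: route modes 10 and 26, fall back otherwise. */
static void pred_angular_8x8_wrapper(uint8_t *src, const uint8_t *top,
                                     const uint8_t *left, ptrdiff_t stride,
                                     int c_idx, int mode)
{
    assert(mode >= 2 && mode <= 34);   /* the 33 angular modes */
    if (mode == 10)
        mode10_stub(src, top, left, stride, c_idx);
    else if (mode == 26)
        mode26_stub(src, top, left, stride, c_idx);
    else
        angular_c_stub(src, top, left, stride, c_idx, mode);
}

/* Fill a 4-entry table (one slot per block size); only slot 1 is wrapped. */
static void init_pred_table(pred_angular_fn table[4])
{
    for (int i = 0; i < 4; i++)
        table[i] = angular_c_stub;
    table[1] = pred_angular_8x8_wrapper;
}
```

Because the table is indexed by size only, a caller (or a benchmark)
that goes through the wrapped slot sees the same entry point for every
mode, which is why the unoptimized modes are reported under the same
name as the accelerated ones.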

Speedup over C on Apple M4 (checkasm --bench, 15-run average):

  Mode 10 (Horizontal):
    4x4: 4.66x    8x8: 5.80x    16x16: 16.86x    32x32: 24.89x

  Mode 26 (Vertical):
    4x4: 1.16x    8x8: 1.83x    16x16: 2.45x    32x32: 4.50x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-06-07 23:29:33 +00:00
h26x aarch64: Add PAC sign/validation of the link register 2026-03-20 13:16:06 +02:00
vvc aarch64/vvc: Optimisations of put_chroma_hv() functions for 10/12-bit 2026-04-27 20:10:57 +00:00
aacencdsp_init.c
aacencdsp_neon.S
aacpsdsp_init_aarch64.c
aacpsdsp_neon.S
ac3dsp_init_aarch64.c
ac3dsp_neon.S avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon 2025-03-02 01:17:53 +02:00
cabac.h
dcadsp_init_aarch64.c avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
dcadsp_neon.S avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
fdct.h
fdctdsp_init_aarch64.c
fdctdsp_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names 2026-01-25 22:53:25 +01:00
h264dsp_neon.S
h264idct_neon.S
h264pred_init.c aarch64/h264pred: disable inefficient functions 2026-02-04 09:06:37 +00:00
h264pred_neon.S lavc/aarch64: Fix addp overflow in ff_pred16x16_plane_neon_10 2025-10-24 15:32:35 +00:00
h264qpel_init_aarch64.c
h264qpel_neon.S
hevcdsp_deblock_neon.S aarch64: hevcdsp: Make returns match the call site 2026-03-17 20:37:53 +00:00
hevcdsp_dequant_neon.S lavc/hevc: add aarch64 neon for 12-bit dequant 2026-01-25 06:55:26 +00:00
hevcdsp_idct_neon.S aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12 2025-03-04 17:01:58 +08:00
hevcdsp_init_aarch64.c lavc/hevc: reorder aarch64 NEON pel function assignments 2026-03-13 21:43:37 +00:00
hevcpred_init_aarch64.c lavc/hevc: add aarch64 NEON for angular modes 10 and 26 2026-06-07 23:29:33 +00:00
hevcpred_neon.S lavc/hevc: add aarch64 NEON for angular modes 10 and 26 2026-06-07 23:29:33 +00:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S aarch64/hpeldsp_neon: fix out-of-bounds read 2026-01-04 03:22:55 +00:00
huffyuvdsp_init_aarch64.c libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
huffyuvdsp_neon.S libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
idct.h
idctdsp_init_aarch64.c
idctdsp_neon.S
Makefile avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
me_cmp_init_aarch64.c avcodec/mpegvideoenc: Add MPVEncContext 2025-03-26 04:08:33 +01:00
me_cmp_neon.S aarch64: Add PAC sign/validation of the link register 2026-03-20 13:16:06 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
mpegvideoencdsp_init.c
mpegvideoencdsp_neon.S
neon.S
neontest.c
opusdsp_init.c
opusdsp_neon.S avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon 2025-02-10 14:55:16 +02:00
pixblockdsp_init_aarch64.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
pixblockdsp_neon.S
pngdsp_init.c avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
pngdsp_neon.S avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S aarch64/sbrdsp: unroll sum64x5 to 16 floats/iter 2026-06-03 10:40:20 +00:00
simple_idct_neon.S
synth_filter_init.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
vc1dsp_neon.S
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h
vp8dsp_init_aarch64.c
vp8dsp_neon.S
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9itxfm_16bpp_neon.S
vp9itxfm_neon.S
vp9lpf_16bpp_neon.S
vp9lpf_neon.S
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S