mirror of
https://git.ffmpeg.org/ffmpeg.git
synced 2026-06-14 11:30:39 +00:00
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.
Mode 10 (Horizontal):
- Broadcasts left[y] to fill each row using ld2r/ld4r for efficiency
- Applies edge smoothing for luma blocks smaller than 32x32
Mode 26 (Vertical):
- Copies top reference row to all output rows
- Applies edge smoothing for luma blocks smaller than 32x32
Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit, avoiding widening to 16-bit intermediates.
The C pred_angular wrappers are made non-static with ff_ prefix to
allow the NEON dispatch to fall back to C for modes not yet optimized.
This will be reverted once all angular modes are implemented.
Note: since pred_angular[] is a per-size function pointer (not
per-mode), checkasm benchmarks will show '_neon' for all 33 modes
even though only modes 10/26 are truly accelerated; unoptimized
modes show ~1.0x speedup as they pass through the NEON wrapper to
the C fallback with negligible overhead.
Speedup over C on Apple M4 (checkasm --bench, 15-run average):
Mode 10 (Horizontal):
4x4: 4.66x 8x8: 5.80x 16x16: 16.86x 32x32: 24.89x
Mode 26 (Vertical):
4x4: 1.16x 8x8: 1.83x 16x16: 2.45x 32x32: 4.50x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
|
||
|---|---|---|
| .. | ||
| h26x | ||
| vvc | ||
| aacencdsp_init.c | ||
| aacencdsp_neon.S | ||
| aacpsdsp_init_aarch64.c | ||
| aacpsdsp_neon.S | ||
| ac3dsp_init_aarch64.c | ||
| ac3dsp_neon.S | ||
| cabac.h | ||
| dcadsp_init_aarch64.c | ||
| dcadsp_neon.S | ||
| fdct.h | ||
| fdctdsp_init_aarch64.c | ||
| fdctdsp_neon.S | ||
| fmtconvert_init.c | ||
| fmtconvert_neon.S | ||
| h264chroma_init_aarch64.c | ||
| h264cmc_neon.S | ||
| h264dsp_init_aarch64.c | ||
| h264dsp_neon.S | ||
| h264idct_neon.S | ||
| h264pred_init.c | ||
| h264pred_neon.S | ||
| h264qpel_init_aarch64.c | ||
| h264qpel_neon.S | ||
| hevcdsp_deblock_neon.S | ||
| hevcdsp_dequant_neon.S | ||
| hevcdsp_idct_neon.S | ||
| hevcdsp_init_aarch64.c | ||
| hevcpred_init_aarch64.c | ||
| hevcpred_neon.S | ||
| hpeldsp_init_aarch64.c | ||
| hpeldsp_neon.S | ||
| huffyuvdsp_init_aarch64.c | ||
| huffyuvdsp_neon.S | ||
| idct.h | ||
| idctdsp_init_aarch64.c | ||
| idctdsp_neon.S | ||
| Makefile | ||
| me_cmp_init_aarch64.c | ||
| me_cmp_neon.S | ||
| mpegaudiodsp_init.c | ||
| mpegaudiodsp_neon.S | ||
| mpegvideoencdsp_init.c | ||
| mpegvideoencdsp_neon.S | ||
| neon.S | ||
| neontest.c | ||
| opusdsp_init.c | ||
| opusdsp_neon.S | ||
| pixblockdsp_init_aarch64.c | ||
| pixblockdsp_neon.S | ||
| pngdsp_init.c | ||
| pngdsp_neon.S | ||
| rv40dsp_init_aarch64.c | ||
| sbrdsp_init_aarch64.c | ||
| sbrdsp_neon.S | ||
| simple_idct_neon.S | ||
| synth_filter_init.c | ||
| synth_filter_neon.S | ||
| vc1dsp_init_aarch64.c | ||
| vc1dsp_neon.S | ||
| videodsp.S | ||
| videodsp_init.c | ||
| vorbisdsp_init.c | ||
| vorbisdsp_neon.S | ||
| vp8dsp.h | ||
| vp8dsp_init_aarch64.c | ||
| vp8dsp_neon.S | ||
| vp9dsp_init.h | ||
| vp9dsp_init_10bpp_aarch64.c | ||
| vp9dsp_init_12bpp_aarch64.c | ||
| vp9dsp_init_16bpp_aarch64_template.c | ||
| vp9dsp_init_aarch64.c | ||
| vp9itxfm_16bpp_neon.S | ||
| vp9itxfm_neon.S | ||
| vp9lpf_16bpp_neon.S | ||
| vp9lpf_neon.S | ||
| vp9mc_16bpp_neon.S | ||
| vp9mc_aarch64.S | ||
| vp9mc_neon.S | ||