ffmpeg/libavcodec/aarch64
Zhao Zhili 200914853d aarch64/sbrdsp: unroll sum64x5 to 16 floats/iter
The C version is faster than the previous asm with clang and gcc > 12 on
rpi5, since the compiler basically does the same unroll.

sum64x5_neon:             before          after
  Cortex-A76 (gcc 12.4):  72.3 (3.63x)    47.4 (5.56x)
  Cortex-A76 (gcc 14.2):  72.3 (0.69x)    47.4 (1.05x)
  Apple M1 (clang 16):     0.2 (0.98x)     0.2 (0.99x)

Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>
2026-06-03 10:40:20 +00:00
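
For readers unfamiliar with the routine, the unroll described in the commit message can be sketched in plain C. This is only an illustration, not the code in sbrdsp_neon.S: it assumes that sum64x5 sums five consecutive rows of 64 floats in place into the first row, and the function names `sum64x5_ref` and `sum64x5_unroll16` are hypothetical. The second function processes 16 output floats per outer iteration, which is the shape the commit title refers to and roughly what a vectorizing compiler may produce on its own.

```c
#include <stddef.h>

/* Assumed reference semantics: z holds 5 rows of 64 floats (320 total);
 * row 0 receives the element-wise sum of all five rows. */
static void sum64x5_ref(float *z)
{
    for (size_t k = 0; k < 64; k++)
        z[k] = z[k] + z[k + 64] + z[k + 128] + z[k + 192] + z[k + 256];
}

/* Hypothetical unrolled form: 16 output floats per outer iteration.
 * On AArch64 the 16-float accumulator maps naturally onto four
 * 128-bit NEON registers of four floats each. */
static void sum64x5_unroll16(float *z)
{
    for (size_t k = 0; k < 64; k += 16) {
        float acc[16];

        /* Start with row 0 + row 1, same association order as the reference. */
        for (size_t i = 0; i < 16; i++)
            acc[i] = z[k + i] + z[k + i + 64];

        /* Accumulate rows 2, 3 and 4. */
        for (size_t row = 2; row < 5; row++)
            for (size_t i = 0; i < 16; i++)
                acc[i] += z[k + i + 64 * row];

        /* Store the 16 results back into row 0. */
        for (size_t i = 0; i < 16; i++)
            z[k + i] = acc[i];
    }
}
```

Both functions add the rows in the same order, so they should give identical results on the same input. Whether the hand-written version beats the compiler's output depends on the toolchain, as the benchmark figures above indicate.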
..
h26x aarch64: Add PAC sign/validation of the link register 2026-03-20 13:16:06 +02:00
vvc aarch64/vvc: Optimisations of put_chroma_hv() functions for 10/12-bit 2026-04-27 20:10:57 +00:00
aacencdsp_init.c avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacencdsp_neon.S avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacpsdsp_init_aarch64.c Include attributes.h directly 2021-04-19 14:34:10 +02:00
aacpsdsp_neon.S aarch64: Reindent all assembly to 8/24 column indentation 2023-10-21 23:25:54 +03:00
ac3dsp_init_aarch64.c avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON 2024-04-08 13:36:40 +03:00
ac3dsp_neon.S avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon 2025-03-02 01:17:53 +02:00
cabac.h
dcadsp_init_aarch64.c avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
dcadsp_neon.S avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
fdct.h lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_init_aarch64.c lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_neon.S lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fmtconvert_init.c avcodec/fmtconvert: Remove unused AVCodecContext parameter 2022-09-21 20:26:40 +02:00
fmtconvert_neon.S
h264chroma_init_aarch64.c avcodec/h264chroma: Constify src in h264_chroma_mc_func 2022-08-05 03:02:13 +02:00
h264cmc_neon.S aarch64: Lowercase UXTW/SXTW and similar flags 2023-10-21 23:25:23 +03:00
h264dsp_init_aarch64.c avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names 2026-01-25 22:53:25 +01:00
h264dsp_neon.S aarch64: Make the indentation more consistent 2023-10-21 23:25:29 +03:00
h264idct_neon.S aarch64: Lowercase UXTW/SXTW and similar flags 2023-10-21 23:25:23 +03:00
h264pred_init.c aarch64/h264pred: disable inefficient functions 2026-02-04 09:06:37 +00:00
h264pred_neon.S lavc/aarch64: Fix addp overflow in ff_pred16x16_plane_neon_10 2025-10-24 15:32:35 +00:00
h264qpel_init_aarch64.c lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions 2023-12-07 23:20:14 +02:00
h264qpel_neon.S lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions 2023-12-07 23:20:14 +02:00
hevcdsp_deblock_neon.S aarch64: hevcdsp: Make returns match the call site 2026-03-17 20:37:53 +00:00
hevcdsp_dequant_neon.S lavc/hevc: add aarch64 neon for 12-bit dequant 2026-01-25 06:55:26 +00:00
hevcdsp_idct_neon.S aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12 2025-03-04 17:01:58 +08:00
hevcdsp_init_aarch64.c lavc/hevc: reorder aarch64 NEON pel function assignments 2026-03-13 21:43:37 +00:00
hevcpred_init_aarch64.c lavc/hevc: add aarch64 NEON for reference sample filtering 2026-04-21 07:50:49 +00:00
hevcpred_neon.S lavc/hevc: add aarch64 NEON for reference sample filtering 2026-04-21 07:50:49 +00:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S aarch64/hpeldsp_neon: fix out-of-bounds read 2026-01-04 03:22:55 +00:00
huffyuvdsp_init_aarch64.c libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
huffyuvdsp_neon.S libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
idct.h avcodec/aarch64/idct: Add missing stddef 2022-02-21 13:10:04 +01:00
idctdsp_init_aarch64.c lavc/aarch64: fix include for cpu.h 2024-05-13 14:50:38 +02:00
idctdsp_neon.S avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths 2022-04-01 10:03:34 +03:00
Makefile avcodec/aarch64: add NEON DCA LFE FIR filter functions 2026-04-27 20:13:23 +00:00
me_cmp_init_aarch64.c avcodec/mpegvideoenc: Add MPVEncContext 2025-03-26 04:08:33 +01:00
me_cmp_neon.S aarch64: Add PAC sign/validation of the link register 2026-03-20 13:16:06 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d 2023-11-28 15:54:49 +02:00
mpegvideoencdsp_init.c avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
mpegvideoencdsp_neon.S avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
neon.S aarch64: Consistently use lowercase for vector element specifiers 2023-10-21 23:25:18 +03:00
neontest.c avcodec: Remove deprecated old encode/decode APIs 2021-04-27 10:43:12 -03:00
opusdsp_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
opusdsp_neon.S avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon 2025-02-10 14:55:16 +02:00
pixblockdsp_init_aarch64.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
pixblockdsp_neon.S
pngdsp_init.c avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
pngdsp_neon.S avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
rv40dsp_init_aarch64.c avcodec/h264chroma: Constify src in h264_chroma_mc_func 2022-08-05 03:02:13 +02:00
sbrdsp_init_aarch64.c
sbrdsp_neon.S aarch64/sbrdsp: unroll sum64x5 to 16 floats/iter 2026-06-03 10:40:20 +00:00
simple_idct_neon.S aarch64: Consistently use lowercase for vector element specifiers 2023-10-21 23:25:18 +03:00
synth_filter_init.c avcodec: Remove DCT, FFT, MDCT and RDFT 2023-10-01 02:25:09 +02:00
synth_filter_neon.S avcodec: Remove DCT, FFT, MDCT and RDFT 2023-10-01 02:25:09 +02:00
vc1dsp_init_aarch64.c avcodec/h264chroma: Constify src in h264_chroma_mc_func 2022-08-05 03:02:13 +02:00
vc1dsp_neon.S avcodec/vc1: Arm 64-bit NEON unescape fast path 2022-04-01 10:03:34 +03:00
videodsp.S lavc/aarch64: fix relocation out of range error 2021-09-25 21:55:29 +03:00
videodsp_init.c avcodec/videodsp: Constify buf in VideoDSPContext.prefetch 2022-07-31 03:14:34 +02:00
vorbisdsp_init.c lavc/vorbisdsp: use ptrdiff_t rather than intptr_t 2022-09-19 13:51:00 -03:00
vorbisdsp_neon.S
vp8dsp.h avcodec/vp8dsp: Constify src in vp8_mc_func 2022-09-11 20:57:51 +02:00
vp8dsp_init_aarch64.c
vp8dsp_neon.S aarch64: Make the indentation more consistent 2023-10-21 23:25:29 +03:00
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9itxfm_16bpp_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
vp9itxfm_neon.S aarch64: Implement stack spilling in a consistent way. 2022-10-11 09:12:02 +02:00
vp9lpf_16bpp_neon.S aarch64: Implement stack spilling in a consistent way. 2022-10-11 09:12:02 +02:00
vp9lpf_neon.S aarch64: Implement stack spilling in a consistent way. 2022-10-11 09:12:02 +02:00
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter 2025-01-03 17:53:46 -05:00