ffmpeg/libavcodec/aarch64
Georgii Zagoruiko c1be2107c9 aarch64/vvc: Optimisations of put_luma_h() functions for 10/12-bit
RPi4:
put_chroma_h_10_2x2_c:                                  63.4 ( 1.00x)
put_chroma_h_10_4x4_c:                                 151.4 ( 1.00x)
put_chroma_h_10_8x8_c:                                 555.1 ( 1.00x)
put_chroma_h_10_8x8_neon:                              113.9 ( 4.88x)
put_chroma_h_10_16x16_c:                              1068.5 ( 1.00x)
put_chroma_h_10_16x16_neon:                            439.4 ( 2.43x)
put_chroma_h_10_32x32_c:                              3432.6 ( 1.00x)
put_chroma_h_10_32x32_neon:                           1878.3 ( 1.83x)
put_chroma_h_10_64x64_c:                             12872.2 ( 1.00x)
put_chroma_h_10_64x64_neon:                           7868.2 ( 1.64x)
put_chroma_h_10_128x128_c:                           45612.2 ( 1.00x)
put_chroma_h_10_128x128_neon:                        28742.1 ( 1.59x)
put_chroma_h_12_2x2_c:                                  63.7 ( 1.00x)
put_chroma_h_12_4x4_c:                                 151.5 ( 1.00x)
put_chroma_h_12_8x8_c:                                 555.2 ( 1.00x)
put_chroma_h_12_8x8_neon:                              114.2 ( 4.86x)
put_chroma_h_12_16x16_c:                              1068.1 ( 1.00x)
put_chroma_h_12_16x16_neon:                            438.8 ( 2.43x)
put_chroma_h_12_32x32_c:                              3419.7 ( 1.00x)
put_chroma_h_12_32x32_neon:                           1878.7 ( 1.82x)
put_chroma_h_12_64x64_c:                             12862.2 ( 1.00x)
put_chroma_h_12_64x64_neon:                           7868.2 ( 1.63x)
put_chroma_h_12_128x128_c:                           45613.5 ( 1.00x)
put_chroma_h_12_128x128_neon:                        28743.3 ( 1.59x)

Apple M4:
put_chroma_h_10_2x2_c:                                   2.5 ( 1.00x)
put_chroma_h_10_4x4_c:                                   6.5 ( 1.00x)
put_chroma_h_10_8x8_c:                                  17.8 ( 1.00x)
put_chroma_h_10_8x8_neon:                                6.8 ( 2.60x)
put_chroma_h_10_16x16_c:                                53.3 ( 1.00x)
put_chroma_h_10_16x16_neon:                             30.4 ( 1.75x)
put_chroma_h_10_32x32_c:                               181.8 ( 1.00x)
put_chroma_h_10_32x32_neon:                            116.2 ( 1.56x)
put_chroma_h_10_64x64_c:                               684.2 ( 1.00x)
put_chroma_h_10_64x64_neon:                            470.3 ( 1.45x)
put_chroma_h_10_128x128_c:                            2567.6 ( 1.00x)
put_chroma_h_10_128x128_neon:                         1879.3 ( 1.37x)
put_chroma_h_12_2x2_c:                                   1.9 ( 1.00x)
put_chroma_h_12_4x4_c:                                   7.0 ( 1.00x)
put_chroma_h_12_8x8_c:                                  16.8 ( 1.00x)
put_chroma_h_12_8x8_neon:                                7.9 ( 2.12x)
put_chroma_h_12_16x16_c:                                55.0 ( 1.00x)
put_chroma_h_12_16x16_neon:                             29.0 ( 1.90x)
put_chroma_h_12_32x32_c:                               182.5 ( 1.00x)
put_chroma_h_12_32x32_neon:                            116.9 ( 1.56x)
put_chroma_h_12_64x64_c:                               666.8 ( 1.00x)
put_chroma_h_12_64x64_neon:                            474.5 ( 1.41x)
put_chroma_h_12_128x128_c:                            2588.1 ( 1.00x)
put_chroma_h_12_128x128_neon:                         1912.2 ( 1.35x)
2026-03-10 12:48:54 +00:00
h26x lavc/hevc: optimize qpel H-pass for width>=16 with byte-domain widening multiply 2026-03-03 12:04:14 +00:00
vvc aarch64/vvc: Optimisations of put_luma_h() functions for 10/12-bit 2026-03-10 12:48:54 +00:00
aacencdsp_init.c avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacencdsp_neon.S avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacpsdsp_init_aarch64.c
aacpsdsp_neon.S aarch64: Reindent all assembly to 8/24 column indentation 2023-10-21 23:25:54 +03:00
ac3dsp_init_aarch64.c avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON 2024-04-08 13:36:40 +03:00
ac3dsp_neon.S avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon 2025-03-02 01:17:53 +02:00
cabac.h
fdct.h lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_init_aarch64.c lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_neon.S lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S aarch64: Lowercase UXTW/SXTW and similar flags 2023-10-21 23:25:23 +03:00
h264dsp_init_aarch64.c avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names 2026-01-25 22:53:25 +01:00
h264dsp_neon.S aarch64: Make the indentation more consistent 2023-10-21 23:25:29 +03:00
h264idct_neon.S aarch64: Lowercase UXTW/SXTW and similar flags 2023-10-21 23:25:23 +03:00
h264pred_init.c aarch64/h264pred: disable inefficient functions 2026-02-04 09:06:37 +00:00
h264pred_neon.S lavc/aarch64: Fix addp overflow in ff_pred16x16_plane_neon_10 2025-10-24 15:32:35 +00:00
h264qpel_init_aarch64.c lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions 2023-12-07 23:20:14 +02:00
h264qpel_neon.S lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions 2023-12-07 23:20:14 +02:00
hevcdsp_deblock_neon.S avcodec/aarch64/hevc: add luma deblock NEON 2024-02-28 10:14:58 +01:00
hevcdsp_dequant_neon.S lavc/hevc: add aarch64 neon for 12-bit dequant 2026-01-25 06:55:26 +00:00
hevcdsp_idct_neon.S aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12 2025-03-04 17:01:58 +08:00
hevcdsp_init_aarch64.c lavc/hevc: add aarch64 neon for 12-bit dequant 2026-01-25 06:55:26 +00:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S aarch64/hpeldsp_neon: fix out-of-bounds read 2026-01-04 03:22:55 +00:00
huffyuvdsp_init_aarch64.c libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
huffyuvdsp_neon.S libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
idct.h
idctdsp_init_aarch64.c lavc/aarch64: fix include for cpu.h 2024-05-13 14:50:38 +02:00
idctdsp_neon.S
Makefile libavcodec/huffyuvdsp: Add NEON optimization for the add_int16 function 2026-03-04 22:31:19 +00:00
me_cmp_init_aarch64.c avcodec/mpegvideoenc: Add MPVEncContext 2025-03-26 04:08:33 +01:00
me_cmp_neon.S all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d 2023-11-28 15:54:49 +02:00
mpegvideoencdsp_init.c avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
mpegvideoencdsp_neon.S avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
neon.S aarch64: Consistently use lowercase for vector element specifiers 2023-10-21 23:25:18 +03:00
neontest.c
opusdsp_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
opusdsp_neon.S avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon 2025-02-10 14:55:16 +02:00
pixblockdsp_init_aarch64.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
pixblockdsp_neon.S
pngdsp_init.c avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
pngdsp_neon.S avcodec/aarch64: add pngdsp 2026-02-04 12:05:35 +08:00
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S aarch64: Consistently use lowercase for vector element specifiers 2023-10-21 23:25:18 +03:00
simple_idct_neon.S aarch64: Consistently use lowercase for vector element specifiers 2023-10-21 23:25:18 +03:00
synth_filter_init.c avcodec: Remove DCT, FFT, MDCT and RDFT 2023-10-01 02:25:09 +02:00
synth_filter_neon.S avcodec: Remove DCT, FFT, MDCT and RDFT 2023-10-01 02:25:09 +02:00
vc1dsp_init_aarch64.c
vc1dsp_neon.S
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h
vp8dsp_init_aarch64.c
vp8dsp_neon.S aarch64: Make the indentation more consistent 2023-10-21 23:25:29 +03:00
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9itxfm_16bpp_neon.S
vp9itxfm_neon.S
vp9lpf_16bpp_neon.S
vp9lpf_neon.S
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter 2025-01-03 17:53:46 -05:00