ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-15 15:30:23 +00:00

Krzysztof Pyrkosz f9b8f30680	avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}	2025-03-07 15:51:20 +02:00

This patch replaces integer widening with halving addition, and multi-step "emulated" rounding shift with a single asm instruction doing exactly that.

Benchmarks before and after:

A78
Before:
avg_8_64x64_neon:      2686.2 ( 6.12x)
avg_8_128x128_neon:   10734.2 ( 5.88x)
avg_10_64x64_neon:     2536.8 ( 5.40x)
avg_10_128x128_neon:  10079.0 ( 5.22x)
avg_12_64x64_neon:     2548.2 ( 5.38x)
avg_12_128x128_neon:  10133.8 ( 5.19x)
After:
avg_8_64x64_neon:       897.8 (18.26x)
avg_8_128x128_neon:    3608.5 (17.37x)
avg_10_32x32_neon:      444.2 ( 8.51x)
avg_10_64x64_neon:     1711.8 ( 8.00x)
avg_12_64x64_neon:     1706.2 ( 8.02x)
avg_12_128x128_neon:   7010.0 ( 7.46x)

A72
Before:
avg_8_64x64_neon:      5823.4 ( 3.88x)
avg_8_128x128_neon:   17430.5 ( 4.73x)
avg_10_64x64_neon:     5228.1 ( 3.71x)
avg_10_128x128_neon:  16722.2 ( 4.17x)
avg_12_64x64_neon:     5379.1 ( 3.51x)
avg_12_128x128_neon:  16715.7 ( 4.17x)
After:
avg_8_64x64_neon:      2006.5 (10.61x)
avg_8_128x128_neon:    9158.7 ( 8.96x)
avg_10_64x64_neon:     3357.7 ( 5.60x)
avg_10_128x128_neon:  12411.7 ( 5.56x)
avg_12_64x64_neon:     3317.5 ( 5.67x)
avg_12_128x128_neon:  12358.5 ( 5.58x)

A53
Before:
avg_8_64x64_neon:      8327.8 ( 5.18x)
avg_8_128x128_neon:   31631.3 ( 5.34x)
avg_10_64x64_neon:     8783.5 ( 4.98x)
avg_10_128x128_neon:  32617.0 ( 5.25x)
avg_12_64x64_neon:     8686.0 ( 5.06x)
avg_12_128x128_neon:  32487.5 ( 5.25x)
After:
avg_8_64x64_neon:      6032.3 ( 7.17x)
avg_8_128x128_neon:   22008.5 ( 7.69x)
avg_10_64x64_neon:     7738.0 ( 5.68x)
avg_10_128x128_neon:  27813.8 ( 6.14x)
avg_12_64x64_neon:     7844.5 ( 5.60x)
avg_12_128x128_neon:  26999.5 ( 6.34x)

Signed-off-by: Martin Storsjö <martin@martin.st>
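
The arithmetic idea behind the commit message above can be illustrated with a small model. This is only a sketch under stated assumptions, not the patch itself: it assumes the average has the form `(a + b + offset) >> shift` with `offset = 1 << (shift - 1)`, and it models a halving add as `(a + b) >> 1` and a rounding shift as "add half, then shift". The function names are hypothetical, the listing does not say which instructions the patch actually uses, and the final clipping to the pixel range is left out.

```python
def avg_widening(a, b, shift):
    """Reference form: add at full width, add the rounding offset, then shift."""
    offset = 1 << (shift - 1)
    return (a + b + offset) >> shift


def halving_add(a, b):
    """Model of a truncating halving add: (a + b) >> 1.

    In hardware this fits in the input width; Python integers are
    unbounded, so the plain expression is enough for the model.
    """
    return (a + b) >> 1


def rounding_shift_right(x, shift):
    """Model of a rounding right shift: add half of the divisor, then shift."""
    return (x + (1 << (shift - 1))) >> shift


def avg_halving(a, b, shift):
    """Halving add followed by a rounding shift by one bit less (needs shift >= 2)."""
    return rounding_shift_right(halving_add(a, b), shift - 1)


if __name__ == "__main__":
    # Assumed example: shift = 7 (what 15 - bit_depth would give for 8-bit input).
    print(avg_widening(100, 29, 7), avg_halving(100, 29, 7))
```

The two forms agree for any `shift >= 2` because the rounding offset `1 << (shift - 1)` is even, so halving before adding half of it loses nothing that the final shift would have kept.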
h26x	aarch64: h26x: Fix the indentation of one function	2024-09-26 13:42:11 +03:00
vvc	avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}	2025-03-07 15:51:20 +02:00
aacencdsp_init.c	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
aacencdsp_neon.S	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
aacpsdsp_init_aarch64.c	Include attributes.h directly	2021-04-19 14:34:10 +02:00
aacpsdsp_neon.S	aarch64: Reindent all assembly to 8/24 column indentation	2023-10-21 23:25:54 +03:00
ac3dsp_init_aarch64.c	avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON	2024-04-08 13:36:40 +03:00
ac3dsp_neon.S	avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon	2025-03-02 01:17:53 +02:00
cabac.h
fdct.h	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fdctdsp_init_aarch64.c	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fdctdsp_neon.S	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fmtconvert_init.c	avcodec/fmtconvert: Remove unused AVCodecContext parameter	2022-09-21 20:26:40 +02:00
fmtconvert_neon.S
h264chroma_init_aarch64.c	avcodec/h264chroma: Constify src in h264_chroma_mc_func	2022-08-05 03:02:13 +02:00
h264cmc_neon.S	aarch64: Lowercase UXTW/SXTW and similar flags	2023-10-21 23:25:23 +03:00
h264dsp_init_aarch64.c	lavc/aarch64: h264, add chroma loop filters for 10bit	2021-08-21 00:06:26 +03:00
h264dsp_neon.S	aarch64: Make the indentation more consistent	2023-10-21 23:25:29 +03:00
h264idct_neon.S	aarch64: Lowercase UXTW/SXTW and similar flags	2023-10-21 23:25:23 +03:00
h264pred_init.c	lavc/aarch64: add pred functions for 10-bit	2021-08-21 00:06:26 +03:00
h264pred_neon.S	lavc/aarch64: Fix ff_pred16x16_plane_neon_10	2024-12-17 14:50:29 +02:00
h264qpel_init_aarch64.c	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions	2023-12-07 23:20:14 +02:00
h264qpel_neon.S	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions	2023-12-07 23:20:14 +02:00
hevcdsp_deblock_neon.S	avcodec/aarch64/hevc: add luma deblock NEON	2024-02-28 10:14:58 +01:00
hevcdsp_idct_neon.S	aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12	2025-03-04 17:01:58 +08:00
hevcdsp_init_aarch64.c	aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12	2025-03-04 17:01:58 +08:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
idct.h	avcodec/aarch64/idct: Add missing stddef	2022-02-21 13:10:04 +01:00
idctdsp_init_aarch64.c	lavc/aarch64: fix include for cpu.h	2024-05-13 14:50:38 +02:00
idctdsp_neon.S	avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths	2022-04-01 10:03:34 +03:00
Makefile	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
me_cmp_init_aarch64.c	avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16	2024-08-17 15:31:48 +02:00
me_cmp_neon.S	avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16	2024-08-17 15:31:48 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d	2023-11-28 15:54:49 +02:00
mpegvideoencdsp_init.c	avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t	2024-09-01 13:42:30 +02:00
mpegvideoencdsp_neon.S	avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t	2024-09-01 13:42:30 +02:00
neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
neontest.c	avcodec: Remove deprecated old encode/decode APIs	2021-04-27 10:43:12 -03:00
opusdsp_init.c	lavc/opus*: move to opus/ subdir	2024-09-02 11:56:53 +02:00
opusdsp_neon.S	avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon	2025-02-10 14:55:16 +02:00
pixblockdsp_init_aarch64.c	libavcodec: aarch64: Add a NEON implementation of pixblockdsp	2020-05-15 23:37:55 +03:00
pixblockdsp_neon.S	libavcodec: aarch64: Add a NEON implementation of pixblockdsp	2020-05-15 23:37:55 +03:00
rv40dsp_init_aarch64.c	avcodec/h264chroma: Constify src in h264_chroma_mc_func	2022-08-05 03:02:13 +02:00
sbrdsp_init_aarch64.c
sbrdsp_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
simple_idct_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
synth_filter_init.c	avcodec: Remove DCT, FFT, MDCT and RDFT	2023-10-01 02:25:09 +02:00
synth_filter_neon.S	avcodec: Remove DCT, FFT, MDCT and RDFT	2023-10-01 02:25:09 +02:00
vc1dsp_init_aarch64.c	avcodec/h264chroma: Constify src in h264_chroma_mc_func	2022-08-05 03:02:13 +02:00
vc1dsp_neon.S	avcodec/vc1: Arm 64-bit NEON unescape fast path	2022-04-01 10:03:34 +03:00
videodsp.S	lavc/aarch64: fix relocation out of range error	2021-09-25 21:55:29 +03:00
videodsp_init.c	avcodec/videodsp: Constify buf in VideoDSPContext.prefetch	2022-07-31 03:14:34 +02:00
vorbisdsp_init.c	lavc/vorbisdsp: use ptrdiff_t rather than intptr_t	2022-09-19 13:51:00 -03:00
vorbisdsp_neon.S
vp8dsp.h	avcodec/vp8dsp: Constify src in vp8_mc_func	2022-09-11 20:57:51 +02:00
vp8dsp_init_aarch64.c
vp8dsp_neon.S	aarch64: Make the indentation more consistent	2023-10-21 23:25:29 +03:00
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h	2021-01-01 14:11:01 +01:00
vp9dsp_init_aarch64.c	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h	2021-01-01 14:11:01 +01:00
vp9itxfm_16bpp_neon.S	aarch64: Use ret x<n> instead of br x<n> where possible	2021-11-16 13:43:56 +02:00
vp9itxfm_neon.S	aarch64: Implement stack spilling in a consistent way.	2022-10-11 09:12:02 +02:00
vp9lpf_16bpp_neon.S	aarch64: Implement stack spilling in a consistent way.	2022-10-11 09:12:02 +02:00
vp9lpf_neon.S	aarch64: Implement stack spilling in a consistent way.	2022-10-11 09:12:02 +02:00
vp9mc_16bpp_neon.S	lavc/aarch64: Move non-neon vp9 copy functions out of neon source file.	2020-03-11 14:16:40 +01:00
vp9mc_aarch64.S	lavc/aarch64: Fix suffix of new file vp9mc_aarch64.	2020-03-11 14:29:22 +01:00
vp9mc_neon.S	aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter	2025-01-03 17:53:46 -05:00