ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-13 19:05:37 +00:00


Martin Storsjö fd3bd5c492 aarch64: h264qpel: Do vertical filtering without transposing	2021-10-18 14:27:58 +03:00

This gives rather big speedups on these functions:

Before:
put_h264_qpel_8_mc01_8_neon:      241.0    131.5    138.7
put_h264_qpel_8_mc02_8_neon:      214.7    121.2    127.5
put_h264_qpel_8_mc03_8_neon:      242.5    131.2    135.7
put_h264_qpel_8_mc11_8_neon:      421.2    218.7    251.0
put_h264_qpel_8_mc12_8_neon:      878.0    509.5    537.5
put_h264_qpel_8_mc13_8_neon:      423.7    217.0    252.0
put_h264_qpel_8_mc21_8_neon:      858.2    479.5    514.0
put_h264_qpel_8_mc22_8_neon:      649.7    385.2    403.0
put_h264_qpel_8_mc23_8_neon:      860.2    476.5    517.7
put_h264_qpel_8_mc31_8_neon:      437.2    219.5    252.5
put_h264_qpel_8_mc32_8_neon:      892.5    510.5    546.0
put_h264_qpel_8_mc33_8_neon:      438.2    218.5    257.0
put_h264_qpel_16_mc01_8_neon:     944.2    509.7    546.7
put_h264_qpel_16_mc02_8_neon:     878.7    469.5    509.7
put_h264_qpel_16_mc03_8_neon:     945.7    510.7    557.0
put_h264_qpel_16_mc11_8_neon:    1663.2    858.5    979.5
put_h264_qpel_16_mc12_8_neon:    3510.2   2027.7   2112.7
put_h264_qpel_16_mc13_8_neon:    1664.7    857.5    980.5
put_h264_qpel_16_mc21_8_neon:    3366.2   1928.5   2030.5
put_h264_qpel_16_mc22_8_neon:    2584.7   1514.7   1590.2
put_h264_qpel_16_mc23_8_neon:    3367.7   1927.7   2035.0
put_h264_qpel_16_mc31_8_neon:    1716.7    849.7    997.0
put_h264_qpel_16_mc32_8_neon:    3564.0   2044.2   3835.2
put_h264_qpel_16_mc33_8_neon:    1717.7    863.0    989.5

After:
put_h264_qpel_8_mc01_8_neon:      136.0     73.7     76.0
put_h264_qpel_8_mc02_8_neon:      108.7     65.0     64.0
put_h264_qpel_8_mc03_8_neon:      137.5     72.7     73.0
put_h264_qpel_8_mc11_8_neon:      316.2    159.0    188.5
put_h264_qpel_8_mc12_8_neon:      653.0    375.5    384.7
put_h264_qpel_8_mc13_8_neon:      318.7    165.5    189.5
put_h264_qpel_8_mc21_8_neon:      739.2    385.7    432.5
put_h264_qpel_8_mc22_8_neon:      530.7    295.5    309.5
put_h264_qpel_8_mc23_8_neon:      741.2    393.7    421.0
put_h264_qpel_8_mc31_8_neon:      332.2    162.5    190.0
put_h264_qpel_8_mc32_8_neon:      667.5    378.2    390.5
put_h264_qpel_8_mc33_8_neon:      332.7    166.5    195.5
put_h264_qpel_16_mc01_8_neon:     524.2    285.2    294.0
put_h264_qpel_16_mc02_8_neon:     454.7    252.2    250.2
put_h264_qpel_16_mc03_8_neon:     525.7    286.0    283.0
put_h264_qpel_16_mc11_8_neon:    1243.2    630.7    726.7
put_h264_qpel_16_mc12_8_neon:    2610.2   1479.7   1481.2
put_h264_qpel_16_mc13_8_neon:    1250.5    631.7    727.7
put_h264_qpel_16_mc21_8_neon:    2890.2   1571.2   1679.7
put_h264_qpel_16_mc22_8_neon:    2108.7   1177.5   1223.5
put_h264_qpel_16_mc23_8_neon:    2891.7   1578.7   1667.7
put_h264_qpel_16_mc31_8_neon:    1296.7    630.5    752.5
put_h264_qpel_16_mc32_8_neon:    2664.0   1483.2   1503.5
put_h264_qpel_16_mc33_8_neon:    1297.7    632.5    747.2

I.e. overall a 20%-60% reduction in runtime of these functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
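
For readers unfamiliar with what "vertical filtering" computes here, the following is a minimal scalar sketch in C of the H.264 half-pel vertical luma filter (taps 1, -5, 20, 20, -5, 1, rounded and shifted by 5). It is not the NEON code from the commit. The names `qpel_v_lowpass_ref` and `round_shift_clip` are hypothetical, and 8-bit pixels are assumed. The sketch only illustrates that each output row is an elementwise combination of six input rows, which is presumably why whole rows can be filtered without transposing the block first.

```c
#include <stddef.h>
#include <stdint.h>

/* Round, shift right by 5 and clamp to the 8-bit pixel range. */
static uint8_t round_shift_clip(int sum)
{
    int v = sum + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/*
 * Hypothetical scalar reference for the vertical half-pel filter.
 * `src` points at the top-left pixel of the block; rows -2 .. h+2
 * relative to `src` must be readable.
 */
static void qpel_v_lowpass_ref(uint8_t *dst, ptrdiff_t dst_stride,
                               const uint8_t *src, ptrdiff_t src_stride,
                               int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const uint8_t *p = src + y * src_stride + x;
            /* Six vertically adjacent pixels in the same column. */
            int sum =      p[-2 * src_stride]
                    -  5 * p[-1 * src_stride]
                    + 20 * p[0]
                    + 20 * p[ 1 * src_stride]
                    -  5 * p[ 2 * src_stride]
                    +      p[ 3 * src_stride];
            dst[y * dst_stride + x] = round_shift_clip(sum);
        }
    }
}
```

The inner loop over `x` never mixes columns, so a SIMD version can load each row into a vector register and apply the same multiply-accumulate steps across all lanes at once.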
aacpsdsp_init_aarch64.c	Include attributes.h directly	2021-04-19 14:34:10 +02:00
aacpsdsp_neon.S	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	2017-06-28 12:22:39 +02:00
asm-offsets.h	aarch64/asm-offsets: remove old CELT offsets	2019-05-14 23:41:24 +01:00
cabac.h
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c	Merge commit '`e4a94d8b36`'	2017-03-21 15:20:45 -03:00
h264cmc_neon.S	Merge commit '`e4a94d8b36`'	2017-03-21 15:20:45 -03:00
h264dsp_init_aarch64.c	lavc/aarch64: h264, add chroma loop filters for 10bit	2021-08-21 00:06:26 +03:00
h264dsp_neon.S	lavc/aarch64: h264, add chroma loop filters for 10bit	2021-08-21 00:06:26 +03:00
h264idct_neon.S	libavcodec: Remove dynamic relocs from aarch64/h264idct_neon.S	2019-01-03 20:12:07 +01:00
h264pred_init.c	lavc/aarch64: add pred functions for 10-bit	2021-08-21 00:06:26 +03:00
h264pred_neon.S	lavc/aarch64: add pred functions for 10-bit	2021-08-21 00:06:26 +03:00
h264qpel_init_aarch64.c
h264qpel_neon.S	aarch64: h264qpel: Do vertical filtering without transposing	2021-10-18 14:27:58 +03:00
hevcdsp_idct_neon.S	aarch64: hevc_idct: Fix overflows in idct_dc	2021-05-22 00:08:03 +03:00
hevcdsp_init_aarch64.c	lavc/aarch64: add HEVC sao_band NEON	2021-02-18 14:12:01 +01:00
hevcdsp_sao_neon.S	lavc/aarch64: add HEVC sao_band NEON	2021-02-18 14:12:01 +01:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S
idct.h	Merge commit '`2ec9fa5ec6`'	2017-03-21 14:29:52 -03:00
idctdsp_init_aarch64.c	lavc/aarch64: Fix compilation with --disable-neon	2020-03-11 14:16:48 +01:00
Makefile	lavc/aarch64: add HEVC sao_band NEON	2021-02-18 14:12:01 +01:00
mdct_neon.S
mpegaudiodsp_init.c	Merge commit '`72a19f4013`'	2017-03-31 14:43:37 -03:00
mpegaudiodsp_neon.S	Merge commit '`732510636e`'	2017-11-11 17:47:10 -03:00
neon.S	lavc/aarch64: move transpose_4x8H to neon.S	2021-08-21 00:06:26 +03:00
neontest.c	avcodec: Remove deprecated old encode/decode APIs	2021-04-27 10:43:12 -03:00
opusdsp_init.c	Include attributes.h directly	2021-04-19 14:34:10 +02:00
opusdsp_neon.S	aarch64/opusdsp: do not clobber register v8	2019-08-15 13:29:22 +01:00
pixblockdsp_init_aarch64.c	libavcodec: aarch64: Add a NEON implementation of pixblockdsp	2020-05-15 23:37:55 +03:00
pixblockdsp_neon.S	libavcodec: aarch64: Add a NEON implementation of pixblockdsp	2020-05-15 23:37:55 +03:00
rv40dsp_init_aarch64.c	Merge commit '`e4a94d8b36`'	2017-03-21 15:20:45 -03:00
sbrdsp_init_aarch64.c	lavc/aarch64: add sbrdsp neon implementation	2017-07-03 14:29:22 +02:00
sbrdsp_neon.S	lavc/aarch64/sbrdsp_neon: fix build on old binutils	2018-01-26 02:42:01 -06:00
simple_idct_neon.S	lavc/aarch64/simple_idct: fix build with Xcode 7.2	2017-06-14 23:20:58 +02:00
synth_filter_init.c
synth_filter_neon.S	Merge commit '`2425d7329f`'	2017-04-26 16:28:57 +02:00
vc1dsp_init_aarch64.c	Merge commit '`e4a94d8b36`'	2017-03-21 15:20:45 -03:00
videodsp.S	lavc/aarch64: fix relocation out of range error	2021-09-25 21:55:29 +03:00
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h	Merge commit '`e39a9212ab`'	2019-03-14 16:18:42 -03:00
vp8dsp_init_aarch64.c	Merge commit '`e39a9212ab`'	2019-03-14 16:18:42 -03:00
vp8dsp_neon.S	Merge commit '`7e42d5f0ab`'	2019-03-14 16:22:29 -03:00
vp9dsp_init.h	vp9: re-split the decoder/format/dsp interface header files.	2017-03-28 18:04:26 -04:00
vp9dsp_init_10bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_12bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_16bpp_aarch64_template.c	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h	2021-01-01 14:11:01 +01:00
vp9dsp_init_aarch64.c	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h	2021-01-01 14:11:01 +01:00
vp9itxfm_16bpp_neon.S	aarch64: vp9 16bpp: Fix assembling with Xcode 6.2 and older	2017-06-21 09:08:14 +03:00
vp9itxfm_neon.S	aarch64: vp9: Fix assembling with Xcode 6.2 and older	2017-06-21 09:08:13 +03:00
vp9lpf_16bpp_neon.S	lavc/aarch64: move transpose_4x8H to neon.S	2021-08-21 00:06:26 +03:00
vp9lpf_neon.S	aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1	2017-03-11 13:14:50 +02:00
vp9mc_16bpp_neon.S	lavc/aarch64: Move non-neon vp9 copy functions out of neon source file.	2020-03-11 14:16:40 +01:00
vp9mc_aarch64.S	lavc/aarch64: Fix suffix of new file vp9mc_aarch64.	2020-03-11 14:29:22 +01:00
vp9mc_neon.S	lavc/aarch64: Move non-neon vp9 copy functions out of neon source file.	2020-03-11 14:16:40 +01:00