ffmpeg/libavcodec/x86
Andreas Rheinhardt 79080a547a avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions
Doubling the register size makes it possible to avoid two pmaddubsw instructions.
The new version is also ABI compliant (the old version lacked an emms)
and the average versions no longer rely on padding (the old versions
used pavgb with a memory operand reading eight bytes,
although only four are needed).
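
For readers unfamiliar with the operation being optimized, here is a minimal scalar sketch of 4-pixel-wide bilinear chroma motion compensation with H.264-style rounding. It is not the FFmpeg implementation: the function names (`chroma_pixel`, `put_chroma_mc4_ref`, `avg_chroma_mc4_ref`) are hypothetical, the sketch assumes 8-bit samples, and it assumes the caller provides h+1 readable source rows of at least five bytes each. The RV40 variants mentioned in the benchmarks use their own rounding, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample: weighted sum of a 2x2 source neighbourhood.
 * x and y are the fractional offsets in eighths (0..7); the four
 * weights always sum to 64, hence the "+ 32 >> 6" rounding. */
static int chroma_pixel(const uint8_t *src, ptrdiff_t stride, int x, int y)
{
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y;

    return (A * src[0] + B * src[1] +
            C * src[stride] + D * src[stride + 1] + 32) >> 6;
}

/* "put": store the interpolated block, four bytes per row. */
static void put_chroma_mc4_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint8_t)chroma_pixel(src + j, stride, x, y);
        dst += stride;
        src += stride;
    }
}

/* "avg": rounded average with what is already in dst. Only four
 * destination bytes per row are read and written here, which is the
 * property the commit message says the old pavgb memory operand violated. */
static void avg_chroma_mc4_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint8_t)((dst[j] + chroma_pixel(src + j, stride, x, y) + 1) >> 1);
        dst += stride;
        src += stride;
    }
}
```

A scalar reference like this is mainly useful for checking a SIMD version against it: the destination bytes beyond the fourth column of each row should be left untouched by both variants.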

Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c:                               145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:                            32.3 ( 4.51x)
put_h264_chroma_mc4_8_c:                               136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:                            29.0 ( 4.70x)
avg_chroma_mc4_c:                                      162.1 ( 1.00x)
avg_chroma_mc4_ssse3:                                   31.1 ( 5.22x)
put_chroma_mc4_c:                                      137.5 ( 1.00x)
put_chroma_mc4_ssse3:                                   28.6 ( 4.81x)

New benchmarks:
avg_h264_chroma_mc4_8_c:                               146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:                            26.5 ( 5.53x)
put_h264_chroma_mc4_8_c:                               136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:                            22.5 ( 6.09x)
avg_chroma_mc4_c:                                      165.5 ( 1.00x)
avg_chroma_mc4_ssse3:                                   27.2 ( 6.08x)
put_chroma_mc4_c:                                      138.1 ( 1.00x)
put_chroma_mc4_ssse3:                                   23.2 ( 5.96x)
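
The parenthesized column in both tables is the C timing divided by the SIMD timing, and the same ratio can be used to compare the old and new SSSE3 versions directly. The sketch below only restates that arithmetic on the figures listed above; the helper name `speedup` and the table `ssse3_runs` are made up for illustration, and the timing unit is whatever the benchmark tool reported (the source does not state it).

```c
#include <assert.h>

/* Ratio of two timings: how many times faster `after` is than `before`. */
static double speedup(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return before / after;
}

/* SSSE3 timings copied from the old and new benchmark tables above. */
static const struct {
    const char *name;
    double old_time, new_time;
} ssse3_runs[] = {
    { "avg_h264_chroma_mc4_8", 32.3, 26.5 },
    { "put_h264_chroma_mc4_8", 29.0, 22.5 },
    { "avg_chroma_mc4",        31.1, 27.2 },
    { "put_chroma_mc4",        28.6, 23.2 },
};
```

Applied to the four rows, `speedup(old_time, new_time)` comes out between roughly 1.14x and 1.29x, while the C baselines differ by about 2% or less between the two runs.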

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-06 02:16:28 +01:00
..
h26x x86/vvcdec: sao, add avx2 support 2025-05-14 20:55:39 +08:00
hevc avcodec/x86/hevc/add_res: Avoid unnecessary modification 2025-11-02 09:46:15 +01:00
vvc avcodec/x86/vvc/sao_10bit: Remove unused functions 2025-09-26 06:21:26 +02:00
aacencdsp.asm x86: aacencdsp: Fix negating signed values in aac_quantize_bands 2025-02-10 14:03:24 +02:00
aacencdsp_init.c x86/aacencdsp: add AVX version of quantize_bands 2024-06-09 12:29:49 -03:00
aacpsdsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
aacpsdsp_init.c avcodec/aacpsdsp: add restrict to function pointers to match declarations 2025-10-25 01:01:14 +02:00
ac3dsp.asm x86/ac3dsp: clear the upper 32 bits for input arguments where needed 2024-04-08 13:45:58 -03:00
ac3dsp_downmix.asm
ac3dsp_init.c x86/ac3dsp: add ff_float_to_fixed24_avx() 2023-11-25 21:50:56 -03:00
alacdsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
alacdsp_init.c
apv_dsp.asm avcodec/x86/apv_dsp: Don't export arrays unnecessarily 2025-09-24 01:21:32 +00:00
apv_dsp_init.c lavc/apv: AVX2 transquant for x86-64 2025-04-27 15:52:30 +01:00
audiodsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
audiodsp_init.c avcodec/x86/audiodsp: add scalarproduct avx2 2022-09-13 17:43:16 +02:00
blockdsp.asm x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-08 21:13:23 -03:00
blockdsp_init.c x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-08 21:13:23 -03:00
bswapdsp.asm
bswapdsp_init.c
cabac.h get_cabac_inline_x86: Don't inline the assembly function on 32 bit 2023-04-02 00:34:10 +03:00
cavs_qpel.asm avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
cavsdsp.c avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
cavsidct.asm avcodec/x86/cavsdsp: Remove obsolete MMX(EXT), 3dnow functions 2022-06-22 13:31:40 +02:00
celt_pvq_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
celt_pvq_search.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
cfhddsp.asm
cfhddsp_init.c
cfhdencdsp.asm
cfhdencdsp_init.c avcodec/cfhdencdsp: Constify input pointers 2022-07-31 03:18:19 +02:00
constants.c avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
constants.h avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
dcadsp.asm avcodec/x86/dcadsp: Remove obsolete SSE function 2022-06-22 13:39:44 +02:00
dcadsp_init.c avcodec/dcadsp: constify lfe_samples parameter 2025-10-04 14:18:30 -03:00
dct32.asm avcodec/x86/dct32: Remove obsolete SSE function 2022-06-22 13:39:06 +02:00
dirac_dwt.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
dirac_dwt_init.c avcodec/dirac_dwt: Avoid conversions between function pointers and void* 2022-09-28 23:37:12 +02:00
diracdsp.asm avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
diracdsp_init.c avcodec/x86/diracdsp_init: remove unused macro 2024-11-15 13:46:05 -05:00
dnxhdenc.asm
dnxhdenc_init.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
exrdsp.asm
exrdsp_init.c
fdct.c avcodec/x86/fdct: Remove obsolete comment 2025-11-04 11:41:32 +01:00
fdct.h avcodec/x86/fdct: Remove obsolete MMX(EXT) functions 2022-06-22 13:30:59 +02:00
fdctdsp_init.c avcodec/x86/fdct: guard usage of undefined functions with preprocessor 2025-07-25 21:10:16 +02:00
flac_dsp_gpl.asm
flacdsp.asm x86/flacdsp: remove unused parameters to pmacsdql macro 2024-05-13 12:18:38 -03:00
flacdsp_init.c x86/flacdsp: add an SSE4 version of wasted33 2024-05-13 12:18:10 -03:00
flacencdsp_init.c avcodec/flacdsp: Split encoder-only parts into a ctx of its own 2022-08-05 03:28:45 +02:00
fmtconvert.asm all: fix whitespace/new-line issues 2025-08-03 13:48:47 +02:00
fmtconvert_init.c avcodec/fmtconvert: Remove unused AVCodecContext parameter 2022-09-21 20:26:40 +02:00
fpel.asm avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
fpel.h avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
g722dsp.asm
g722dsp_init.c
h263_loopfilter.asm avcodec/x86/h263_loopfilter: Port loop filter to SSE2 2025-10-03 17:05:46 +00:00
h263dsp_init.c avcodec/x86/h263_loopfilter: Port loop filter to SSE2 2025-10-03 17:05:46 +00:00
h264_cabac.c
h264_chromamc.asm avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions 2025-11-06 02:16:28 +01:00
h264_chromamc_10bit.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
h264_deblock.asm avcodec/x86/h264dsp_init: Remove obsolete MMX(EXT) functions 2022-06-22 13:32:47 +02:00
h264_deblock_10bit.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
h264_idct.asm avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64 2024-03-25 21:17:47 +01:00
h264_idct_10bit.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
h264_intrapred.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
h264_intrapred_10bit.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
h264_intrapred_init.c x86/h264_pred: Convert ff_pred8x8_vertical_8_mmx to ff_pred8x8_vertical_8_sse2 2024-02-13 21:17:06 +00:00
h264_qpel.c avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
h264_qpel_8bit.asm avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
h264_qpel_10bit.asm avcodec/x86/h264_qpel_10bit: Remove SSE2 "cache64" duplicates 2025-10-04 07:06:33 +02:00
h264_weight.asm x86/h264_weight: don't do arithmetic right shift of a 32bit values in 64bit registers 2024-09-01 15:43:18 -03:00
h264_weight_10bit.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
h264chroma_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
h264dsp_init.c avcodec/x86/h264dsp_init: Remove obsolete MMX(EXT) functions 2022-06-22 13:32:47 +02:00
hpeldsp.asm avcodec/x86/hpeldsp: Don't use PAVGB macro 2025-11-02 12:05:52 +01:00
hpeldsp.h avcodec/x86/rv40dsp_init: Remove MMX(EXT) funcs overridden by SSSE3 2025-09-26 06:21:23 +02:00
hpeldsp_init.c avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
huffyuvdsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
huffyuvdsp_init.c avcodec/x86/huffyuvdsp: Remove obsolete MMX functions 2022-06-22 13:40:10 +02:00
huffyuvdsp_template.asm
huffyuvencdsp.asm avcodec/x86/huffyuvencdsp: Remove obsolete MMX function 2022-06-22 13:40:36 +02:00
huffyuvencdsp_init.c avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dsp 2022-10-09 09:15:39 +02:00
idctdsp.asm avcodec/x86/idctdsp: Remove obsolete MMX(EXT) functions 2022-06-22 13:33:27 +02:00
idctdsp.h avcodec/x86/idctdsp: add restrict to match function pointer types 2025-10-25 01:01:15 +02:00
idctdsp_init.c avcodec/x86/mpegvideoenc_template: Disable dead code 2023-09-15 13:08:55 +02:00
imdct36.asm avcodec/x86/mpegaudiodsp: Remove obsolete SSE function 2022-06-22 13:38:52 +02:00
jpeg2000dsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
jpeg2000dsp_init.c
lossless_audiodsp.asm x86/: clear the high bits for order in scalarproduct_and_madd functions 2023-11-22 14:18:42 -03:00
lossless_audiodsp_init.c avcodec/x86/lossless_audiodsp: Remove obsolete MMXEXT function 2022-06-22 13:34:06 +02:00
lossless_videodsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
lossless_videodsp_init.c avcodec/x86/lossless_videodsp: Remove obsolete MMX(EXT) functions 2022-06-22 13:41:02 +02:00
lossless_videoencdsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
lossless_videoencdsp_init.c avcodec/lossless_videoencdsp: Constify src sub_left_predict 2022-07-31 03:16:35 +02:00
lpc.asm x86/lpc: remove HAVE_AVX2_EXTERNAL checks 2024-10-06 01:32:49 +02:00
lpc_init.c avcodec/lpc: use ptrdiff_t for length parameters 2022-09-22 18:17:26 -03:00
Makefile avcodec/x86/Makefile: Don't use MMX-OBJS for fdct.o 2025-11-04 11:41:29 +01:00
mathops.h avcodec/x86/mathops: clip constants used with shift instructions within inline assembly 2023-07-20 16:51:53 -03:00
me_cmp.asm avcodec/x86/me_cmp: Remove MMX sse functions 2025-11-04 11:41:11 +01:00
me_cmp_init.c avcodec/x86/me_cmp: Remove MMX sse functions 2025-11-04 11:41:11 +01:00
mlpdsp.asm
mlpdsp_init.c
mpeg4videodsp.c avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 only 2022-10-20 07:56:17 +02:00
mpegaudiodsp.c avcodec/mpegaudiodsp: Init dct32 directly 2023-10-01 01:53:32 +02:00
mpegvideo.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoenc.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoenc_template.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoencdsp.asm all: fix whitespace/new-line issues 2025-08-03 13:48:47 +02:00
mpegvideoencdsp_init.c avcodec/x86/mpegvideoencdsp_init: Fix left shift of negative number 2025-10-21 12:11:55 +02:00
opusdsp.asm opusdsp: add ability to modify deemphasis constant 2024-04-27 11:12:07 +02:00
opusdsp_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
pixblockdsp.asm avcodec/x86/pixblockdsp: Remove obsolete MMX functions 2022-06-22 13:33:54 +02:00
pixblockdsp_init.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
pngdsp.asm avcodec/x86/pngdsp: add missing emms at the end of add_png_paeth_prediction 2025-09-15 22:18:52 -03:00
pngdsp_init.c avcodec/x86/pngdsp: Remove obsolete ff_add_bytes_l2_mmx() 2022-07-25 16:00:57 +02:00
proresdsp.asm
proresdsp_init.c avcodec/proresdsp: Pass necessary parameter directly 2023-09-11 00:26:34 +02:00
qpel.asm avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
qpel.h avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
qpeldsp.asm avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
qpeldsp_init.c avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
rv34dsp.asm x86: Avoid using 'd' as an argument name 2024-03-24 14:53:57 +01:00
rv34dsp_init.c avcodec/x86/rv34dsp: Remove obsolete MMX function 2022-06-22 13:39:31 +02:00
rv40dsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
rv40dsp_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
sbcdsp.asm
sbcdsp_init.c
sbrdsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
sbrdsp_init.c avcodec/x86/sbrdsp: Remove obsolete SSE function 2022-06-22 13:33:01 +02:00
simple_idct.asm avcodec/x86/rv40dsp, simple_idct: Remove remnants of MMX 2024-03-02 02:54:12 +01:00
simple_idct.h
simple_idct10.asm
simple_idct10_template.asm
snowdsp.c avcodec/snow: Move dsp helper functions to snow_dwt.h 2023-10-02 12:23:16 +02:00
svq1enc.asm avcodec/x86/svq1enc: Remove obsolete MMXEXT function 2022-06-22 13:34:19 +02:00
svq1enc_init.c avcodec/svq1enc: Add SVQ1EncDSPContext, make codec context private 2022-10-14 16:14:24 +02:00
synth_filter.asm avcodec/x86/synth_filter: Remove obsolete SSE function 2022-06-22 13:39:18 +02:00
synth_filter_init.c dca_core: convert to lavu/tx 2022-11-06 14:39:36 +01:00
takdsp.asm x86/takdsp: add missing wrappers to AVX2 functions 2023-12-25 22:31:15 -03:00
takdsp_init.c x86/takdsp: add avx2 versions of all functions 2023-12-23 08:39:22 -03:00
ttadsp.asm
ttadsp_init.c
ttaencdsp.asm
ttaencdsp_init.c
utvideodsp.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
utvideodsp_init.c
v210-init.c avcodec/x86: add avx512icl function for v210dec 2022-12-20 15:02:45 +01:00
v210.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
v210enc.asm avcodec/x86/v210enc: change '0b' binary constant prefix to 'b' suffix 2022-12-03 16:44:24 +01:00
v210enc_init.c avcodec/v210enc: add new 10-bit function for avx512 avx512icl 2022-12-01 18:19:03 +01:00
vc1dsp.h avcodec/x86/vc1dsp: add missing header for HAVE_6REGS 2025-08-14 00:08:10 +00:00
vc1dsp_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
vc1dsp_loopfilter.asm avcodec/x86/vc1dsp_init: Remove obsolete 3dnow, MMX(EXT) functions 2022-06-22 13:28:57 +02:00
vc1dsp_mc.asm x86: replace explicit REP_RETs with RETs 2023-02-01 04:23:55 +01:00
vc1dsp_mmx.c
videodsp.asm lavc/x86/videodsp: Drop MMX usage 2024-12-01 13:26:34 +08:00
videodsp_init.c lavc/x86/videodsp: Drop MMX usage 2024-12-01 13:26:34 +08:00
vorbisdsp.asm avcodec/x86/vorbisdsp: Remove obsolete 3dnow functions 2022-06-22 13:37:10 +02:00
vorbisdsp_init.c lavc/vorbisdsp: use ptrdiff_t rather than intptr_t 2022-09-19 13:51:00 -03:00
vp3dsp.asm avcodec/x86/vp3dsp: Remove remnants of MMX 2025-11-02 12:01:52 +01:00
vp3dsp_init.c avcodec/vp3dsp: Remove unused flags parameter from ff_vp3dsp_init() 2025-10-13 18:59:24 +02:00
vp6dsp.asm vp6dsp: Remove MMX code 2024-02-13 20:47:16 +00:00
vp6dsp_init.c avcodec/x86/vp6dsp: Remove obsolete MMX ff_vp6_filter_diag4_mmx 2022-06-22 13:38:40 +02:00
vp8dsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
vp8dsp_init.c avcodec/vp8dsp: Constify src in vp8_mc_func 2022-09-11 20:57:51 +02:00
vp8dsp_loopfilter.asm avcodec/x86/vp8dsp: Remove obsolete MMX(EXT) functions 2022-06-22 13:39:57 +02:00
vp9dsp_init.c vp9: Remove 8bpc AVX asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9dsp_init.h vp9: Add AVX-512ICL asm for 8bpc subpel mc 2025-08-28 12:45:52 +00:00
vp9dsp_init_10bpp.c
vp9dsp_init_12bpp.c
vp9dsp_init_16bpp.c avcodec/vp9: ipred_hd_16x16_16 avx2 implementation 2022-05-31 08:07:57 -04:00
vp9dsp_init_16bpp_template.c avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms 2025-05-26 15:26:11 +02:00
vp9intrapred.asm vp9: Add 8bpc intra prediction AVX2 asm 2025-09-01 13:54:52 +00:00
vp9intrapred_16bpp.asm avcodec/vp9: ipred_hd_16x16_16 avx2 implementation 2022-05-31 08:07:57 -04:00
vp9itxfm.asm vp9: Remove 8bpc AVX asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9itxfm_16bpp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
vp9itxfm_16bpp_avx512.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
vp9itxfm_avx2.asm vp9: Add 8bpc AVX2 asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9itxfm_avx512.asm avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms 2025-05-19 15:56:27 +02:00
vp9itxfm_template.asm
vp9lpf.asm
vp9lpf_16bpp.asm
vp9mc.asm vp9: Add AVX-512ICL asm for 8bpc subpel mc 2025-08-28 12:45:52 +00:00
vp9mc_16bpp.asm avcodec/x86/constants: add pd_64 2025-04-25 23:20:58 -03:00
vpx_arith.h avcodec/vp56: Move VP5-9 range coder functions to a header of their own 2022-07-28 03:49:54 +02:00
w64xmmtest.c
xvididct.asm avcodec/x86/xvididct: Remove obsolete MMX(EXT) functions 2022-06-22 13:33:14 +02:00
xvididct.h avcodec/x86/xvididct: Remove obsolete MMX(EXT) functions 2022-06-22 13:33:14 +02:00
xvididct_init.c avcodec/{x86,mips}/xvididct_init: Remove redundant checks 2025-05-16 01:37:35 +02:00