ffmpeg/libavcodec/x86
Andreas Rheinhardt 79080a547a avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions
Doubling the register size makes it possible to avoid two pmaddubsw instructions.
It is also ABI compliant (the old version lacked an emms)
and the average versions no longer rely on padding (the old versions
used pavgb with a memory operand reading eight bytes,
although only four are needed).
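
For context, the kernel being optimized can be sketched in scalar C. This is only an illustration of 4-pixel-wide H.264 chroma motion compensation (bilinear weights from the eighth-pel offsets, rounding by 32, shift by 6), not FFmpeg's actual code. The names `chroma_pred`, `put_chroma_mc4_ref` and `avg_chroma_mc4_ref` are hypothetical, and the RV40 variants, which round differently, are not modeled here. The "avg" version shows why only four destination bytes per row need to be read: the rounding average (the same `(a + b + 1) >> 1` that pavgb computes per byte) is applied to exactly the four pixels that are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bilinear prediction of one chroma pixel; x and y are eighth-pel
 * offsets in 0..7. Reads the 2x2 neighbourhood starting at s, so callers
 * must provide one extra column and one extra row of source pixels. */
static int chroma_pred(const uint8_t *s, ptrdiff_t stride, int x, int y)
{
    const int A = (8 - x) * (8 - y), B = x * (8 - y);
    const int C = (8 - x) * y,       D = x * y;

    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    /* The four weights sum to 64, hence the +32 rounding and >> 6. */
    return (A * s[0] + B * s[1] + C * s[stride] + D * s[stride + 1] + 32) >> 6;
}

/* "put": store the prediction for a block 4 pixels wide and h rows high. */
static void put_chroma_mc4_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint8_t)chroma_pred(src + j, stride, x, y);
        dst += stride;
        src += stride;
    }
}

/* "avg": rounding average of the prediction with the existing destination.
 * Only dst[0..3] of each row is read and written. */
static void avg_chroma_mc4_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint8_t)((dst[j] + chroma_pred(src + j, stride, x, y) + 1) >> 1);
        dst += stride;
        src += stride;
    }
}
```

Note that this sketch always reads five source columns and h + 1 source rows, even when the corresponding weights are zero.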

Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c:                               145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:                            32.3 ( 4.51x)
put_h264_chroma_mc4_8_c:                               136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:                            29.0 ( 4.70x)
avg_chroma_mc4_c:                                      162.1 ( 1.00x)
avg_chroma_mc4_ssse3:                                   31.1 ( 5.22x)
put_chroma_mc4_c:                                      137.5 ( 1.00x)
put_chroma_mc4_ssse3:                                   28.6 ( 4.81x)

New benchmarks:
avg_h264_chroma_mc4_8_c:                               146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:                            26.5 ( 5.53x)
put_h264_chroma_mc4_8_c:                               136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:                            22.5 ( 6.09x)
avg_chroma_mc4_c:                                      165.5 ( 1.00x)
avg_chroma_mc4_ssse3:                                   27.2 ( 6.08x)
put_chroma_mc4_c:                                      138.1 ( 1.00x)
put_chroma_mc4_ssse3:                                   23.2 ( 5.96x)

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-06 02:16:28 +01:00
h26x x86/vvcdec: sao, add avx2 support 2025-05-14 20:55:39 +08:00
hevc avcodec/x86/hevc/add_res: Avoid unnecessary modification 2025-11-02 09:46:15 +01:00
vvc avcodec/x86/vvc/sao_10bit: Remove unused functions 2025-09-26 06:21:26 +02:00
aacencdsp.asm x86: aacencdsp: Fix negating signed values in aac_quantize_bands 2025-02-10 14:03:24 +02:00
aacencdsp_init.c x86/aacencdsp: add AVX version of quantize_bands 2024-06-09 12:29:49 -03:00
aacpsdsp.asm
aacpsdsp_init.c avcodec/aacpsdsp: add restrict to function pointers to match declarations 2025-10-25 01:01:14 +02:00
ac3dsp.asm x86/ac3dsp: clear the upper 32 bits for input arguments where needed 2024-04-08 13:45:58 -03:00
ac3dsp_downmix.asm
ac3dsp_init.c x86/ac3dsp: add ff_float_to_fixed24_avx() 2023-11-25 21:50:56 -03:00
alacdsp.asm
alacdsp_init.c
apv_dsp.asm avcodec/x86/apv_dsp: Don't export arrays unnecessarily 2025-09-24 01:21:32 +00:00
apv_dsp_init.c lavc/apv: AVX2 transquant for x86-64 2025-04-27 15:52:30 +01:00
audiodsp.asm
audiodsp_init.c
blockdsp.asm x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-08 21:13:23 -03:00
blockdsp_init.c x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-08 21:13:23 -03:00
bswapdsp.asm
bswapdsp_init.c
cabac.h
cavs_qpel.asm avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
cavsdsp.c avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
cavsidct.asm
celt_pvq_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
celt_pvq_search.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
cfhddsp.asm
cfhddsp_init.c
cfhdencdsp.asm
cfhdencdsp_init.c
constants.c avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
constants.h avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
dcadsp.asm
dcadsp_init.c avcodec/dcadsp: constify lfe_samples parameter 2025-10-04 14:18:30 -03:00
dct32.asm
dirac_dwt.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
dirac_dwt_init.c
diracdsp.asm avcodec/x86/cavs_qpel: Add SSE2 vertical motion compensation 2025-10-08 20:40:08 +02:00
diracdsp_init.c avcodec/x86/diracdsp_init: remove unused macro 2024-11-15 13:46:05 -05:00
dnxhdenc.asm
dnxhdenc_init.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
exrdsp.asm
exrdsp_init.c
fdct.c avcodec/x86/fdct: Remove obsolete comment 2025-11-04 11:41:32 +01:00
fdct.h
fdctdsp_init.c avcodec/x86/fdct: guard usage of undefined functions with preprocessor 2025-07-25 21:10:16 +02:00
flac_dsp_gpl.asm
flacdsp.asm x86/flacdsp: remove unused parameters to pmacsdql macro 2024-05-13 12:18:38 -03:00
flacdsp_init.c x86/flacdsp: add an SSE4 version of wasted33 2024-05-13 12:18:10 -03:00
flacencdsp_init.c
fmtconvert.asm all: fix whitespace/new-line issues 2025-08-03 13:48:47 +02:00
fmtconvert_init.c
fpel.asm avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
fpel.h avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
g722dsp.asm
g722dsp_init.c
h263_loopfilter.asm avcodec/x86/h263_loopfilter: Port loop filter to SSE2 2025-10-03 17:05:46 +00:00
h263dsp_init.c avcodec/x86/h263_loopfilter: Port loop filter to SSE2 2025-10-03 17:05:46 +00:00
h264_cabac.c
h264_chromamc.asm avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions 2025-11-06 02:16:28 +01:00
h264_chromamc_10bit.asm
h264_deblock.asm
h264_deblock_10bit.asm
h264_idct.asm avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64 2024-03-25 21:17:47 +01:00
h264_idct_10bit.asm
h264_intrapred.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
h264_intrapred_10bit.asm
h264_intrapred_init.c x86/h264_pred: Convert ff_pred8x8_vertical_8_mmx to ff_pred8x8_vertical_8_sse2 2024-02-13 21:17:06 +00:00
h264_qpel.c avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
h264_qpel_8bit.asm avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
h264_qpel_10bit.asm avcodec/x86/h264_qpel_10bit: Remove SSE2 "cache64" duplicates 2025-10-04 07:06:33 +02:00
h264_weight.asm x86/h264_weight: don't do arithmetic right shift of a 32bit values in 64bit registers 2024-09-01 15:43:18 -03:00
h264_weight_10bit.asm
h264chroma_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
h264dsp_init.c
hpeldsp.asm avcodec/x86/hpeldsp: Don't use PAVGB macro 2025-11-02 12:05:52 +01:00
hpeldsp.h avcodec/x86/rv40dsp_init: Remove MMX(EXT) funcs overridden by SSSE3 2025-09-26 06:21:23 +02:00
hpeldsp_init.c avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 2025-10-17 13:27:56 +02:00
huffyuvdsp.asm
huffyuvdsp_init.c
huffyuvdsp_template.asm
huffyuvencdsp.asm
huffyuvencdsp_init.c
idctdsp.asm
idctdsp.h avcodec/x86/idctdsp: add restrict to match function pointer types 2025-10-25 01:01:15 +02:00
idctdsp_init.c
imdct36.asm
jpeg2000dsp.asm
jpeg2000dsp_init.c
lossless_audiodsp.asm
lossless_audiodsp_init.c
lossless_videodsp.asm
lossless_videodsp_init.c
lossless_videoencdsp.asm
lossless_videoencdsp_init.c
lpc.asm x86/lpc: remove HAVE_AVX2_EXTERNAL checks 2024-10-06 01:32:49 +02:00
lpc_init.c
Makefile avcodec/x86/Makefile: Don't use MMX-OBJS for fdct.o 2025-11-04 11:41:29 +01:00
mathops.h
me_cmp.asm avcodec/x86/me_cmp: Remove MMX sse functions 2025-11-04 11:41:11 +01:00
me_cmp_init.c avcodec/x86/me_cmp: Remove MMX sse functions 2025-11-04 11:41:11 +01:00
mlpdsp.asm
mlpdsp_init.c
mpeg4videodsp.c
mpegaudiodsp.c
mpegvideo.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoenc.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoenc_template.c avcodec/mpegvideo: Add missing headers 2025-07-03 20:35:31 +02:00
mpegvideoencdsp.asm all: fix whitespace/new-line issues 2025-08-03 13:48:47 +02:00
mpegvideoencdsp_init.c avcodec/x86/mpegvideoencdsp_init: Fix left shift of negative number 2025-10-21 12:11:55 +02:00
opusdsp.asm opusdsp: add ability to modify deemphasis constant 2024-04-27 11:12:07 +02:00
opusdsp_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
pixblockdsp.asm
pixblockdsp_init.c avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels 2025-10-25 01:01:15 +02:00
pngdsp.asm avcodec/x86/pngdsp: add missing emms at the end of add_png_paeth_prediction 2025-09-15 22:18:52 -03:00
pngdsp_init.c
proresdsp.asm
proresdsp_init.c
qpel.asm avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
qpel.h avcodec/x86/h264_qpel: Add and use ff_{avg,put}_pixels16x16_l2_sse2() 2025-11-01 15:17:05 +01:00
qpeldsp.asm avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
qpeldsp_init.c avcodec/x86/qpel: Add specializations for put_l2 functions 2025-11-01 15:17:05 +01:00
rv34dsp.asm x86: Avoid using 'd' as an argument name 2024-03-24 14:53:57 +01:00
rv34dsp_init.c
rv40dsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
rv40dsp_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
sbcdsp.asm
sbcdsp_init.c
sbrdsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
sbrdsp_init.c
simple_idct.asm avcodec/x86/rv40dsp, simple_idct: Remove remnants of MMX 2024-03-02 02:54:12 +01:00
simple_idct.h
simple_idct10.asm
simple_idct10_template.asm
snowdsp.c
svq1enc.asm
svq1enc_init.c
synth_filter.asm
synth_filter_init.c
takdsp.asm x86/takdsp: add missing wrappers to AVX2 functions 2023-12-25 22:31:15 -03:00
takdsp_init.c x86/takdsp: add avx2 versions of all functions 2023-12-23 08:39:22 -03:00
ttadsp.asm
ttadsp_init.c
ttaencdsp.asm
ttaencdsp_init.c
utvideodsp.asm
utvideodsp_init.c
v210-init.c
v210.asm
v210enc.asm
v210enc_init.c
vc1dsp.h avcodec/x86/vc1dsp: add missing header for HAVE_6REGS 2025-08-14 00:08:10 +00:00
vc1dsp_init.c avcodec/x86/h264_chromamc: Remove MMX(EXT) funcs overridden by SSSE3 2025-11-01 13:34:23 +01:00
vc1dsp_loopfilter.asm
vc1dsp_mc.asm
vc1dsp_mmx.c
videodsp.asm lavc/x86/videodsp: Drop MMX usage 2024-12-01 13:26:34 +08:00
videodsp_init.c lavc/x86/videodsp: Drop MMX usage 2024-12-01 13:26:34 +08:00
vorbisdsp.asm
vorbisdsp_init.c
vp3dsp.asm avcodec/x86/vp3dsp: Remove remnants of MMX 2025-11-02 12:01:52 +01:00
vp3dsp_init.c avcodec/vp3dsp: Remove unused flags parameter from ff_vp3dsp_init() 2025-10-13 18:59:24 +02:00
vp6dsp.asm vp6dsp: Remove MMX code 2024-02-13 20:47:16 +00:00
vp6dsp_init.c
vp8dsp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
vp8dsp_init.c
vp8dsp_loopfilter.asm
vp9dsp_init.c vp9: Remove 8bpc AVX asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9dsp_init.h vp9: Add AVX-512ICL asm for 8bpc subpel mc 2025-08-28 12:45:52 +00:00
vp9dsp_init_10bpp.c
vp9dsp_init_12bpp.c
vp9dsp_init_16bpp.c
vp9dsp_init_16bpp_template.c avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms 2025-05-26 15:26:11 +02:00
vp9intrapred.asm vp9: Add 8bpc intra prediction AVX2 asm 2025-09-01 13:54:52 +00:00
vp9intrapred_16bpp.asm
vp9itxfm.asm vp9: Remove 8bpc AVX asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9itxfm_16bpp.asm x86: Update x86inc.asm 2024-03-24 14:53:57 +01:00
vp9itxfm_16bpp_avx512.asm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
vp9itxfm_avx2.asm vp9: Add 8bpc AVX2 asm for inverse transforms 2025-09-19 23:12:59 +00:00
vp9itxfm_avx512.asm avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms 2025-05-19 15:56:27 +02:00
vp9itxfm_template.asm
vp9lpf.asm
vp9lpf_16bpp.asm
vp9mc.asm vp9: Add AVX-512ICL asm for 8bpc subpel mc 2025-08-28 12:45:52 +00:00
vp9mc_16bpp.asm avcodec/x86/constants: add pd_64 2025-04-25 23:20:58 -03:00
vpx_arith.h
w64xmmtest.c
xvididct.asm
xvididct.h
xvididct_init.c avcodec/{x86,mips}/xvididct_init: Remove redundant checks 2025-05-16 01:37:35 +02:00