Commit graph

54081 commits

Author SHA1 Message Date
Gyan Doshi
4a2b643646 avcodec/mediacodecdec: declare correct class for audio decoders
Until now, audio decoders had been assigned the class for video decoders.
2026-05-03 05:58:13 +00:00
Michael Niedermayer
23227a444d avcodec/wmaenc: Fix missing padding in extradata
Reported-by: Kenan Alghythee <kalghy2@uic.edu>
2026-05-03 02:36:54 +00:00
Michael Niedermayer
242ff799c7 avcodec/tdsc: remove double stride adjustment
Fixes: out of array access

Found-by: Seung Min Shin
Patch based on suggested fix by Seung Min Shin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 23:11:24 +00:00
Michael Niedermayer
05817dc7dd avcodec/notchlc: Check 255 loops
Fixes: integer overflow

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:39:02 +00:00
Michael Niedermayer
bf4eb194cf avcodec/tdsc: Better input size check
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
bb69a090a7 avcodec/tdsc: Check jpeg size
Fixes: out of array read
Fixes: tdsc_tile_dim_mismatch.avi

Found-by: Ante Silovic <asilovic155@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
af87d77514 avcodec/tdsc: Prettier uncompress() check
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
e9e6fb8798 avcodec/tdsc: Check tile_size
Fixes: out of array read
Fixes: tdsc_war_groom_far4096.avi

Found by: Ante Silovic <asilovic155@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
9572ab7f45 avcodec/decode: Better documentation for ff_set_dimensions()
Clarify what is checked and that it avoids explicit generic overflow checks

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:11:47 +00:00
Kacper Michajłow
dba0b078c8 avcodec/vaapi_av1: reorder functions to avoid fwd decl
2026-05-01 23:59:06 +00:00
Kacper Michajłow
688f68bffa avcodec/vaapi_av1: fix leak of ref frames on init failure
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-05-01 23:59:06 +00:00
Leo Izen
739fc9249c avcodec/libjxlenc: fix frame->linesize raw pointer read
These should say frame->linesize[0] as it does everywhere else this
variable is referenced. Fixes a typo bug.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
05b5add006 avcodec/libjxlenc: check orientation tag metadata before reading
We need to check that entry->count is nonzero and that entry->type is
AV_TIFF_SHORT before reading from the buffer, in case a maliciously
constructed IFD uses a zero-count or an unusual type (e.g. IFD) for it.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
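The check described in 05b5add006 can be sketched in plain C roughly as follows. This is a minimal illustration, not the libjxlenc code: `SketchExifEntry`, `SKETCH_TIFF_SHORT`, `SKETCH_TIFF_IFD` and `sketch_read_orientation` are hypothetical stand-ins, the numeric type codes are assumed to follow the TIFF convention (SHORT = 3, IFD = 13), and the 1..8 range check is an extra assumption that the commit message does not mention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EXIF entry type; not FFmpeg's definitions. */
enum { SKETCH_TIFF_SHORT = 3, SKETCH_TIFF_IFD = 13 };

typedef struct SketchExifEntry {
    int             type;    /* TIFF value type of the entry */
    uint32_t        count;   /* number of values stored in the entry */
    const uint16_t *shorts;  /* only meaningful when type is SHORT */
} SketchExifEntry;

/* Returns the orientation (1..8), or 0 if the entry must not be read. */
int sketch_read_orientation(const SketchExifEntry *entry)
{
    /* Validate count and type before touching the value buffer. */
    if (!entry || entry->count == 0 || entry->type != SKETCH_TIFF_SHORT)
        return 0;
    assert(entry->shorts != NULL);

    uint16_t v = entry->shorts[0];
    /* Assumption: values outside 1..8 are treated as unusable. */
    return (v >= 1 && v <= 8) ? v : 0;
}
```

A zero-count entry or one typed as IFD yields 0 here, so the caller never reads from a buffer that may be empty or laid out differently.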
Leo Izen
f1cab2d018 avcodec/exif_internal.h: improve return docs for ff_exif_get_buffer
This commit improves the documentation for the return value of the
function ff_exif_get_buffer.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
087ec68451 avcodec/exif.c: synthesize EXIF data from frame metadata and matrix
If the displaymatrix is present, we should synthesize EXIF data from
the values there even if there is no EXIF attached to the frame.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
1d36c4d8ae avcodec/exif.c: reset ifd->size when freeing ifd->entries
If we free ifd->entries then we need to set ifd->size to 0 so another
call to av_fast_realloc doesn't get confused.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
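Why a stale size confuses a fast-realloc style helper (1d36c4d8ae) may be easier to see in a small model. The sketch below does not use av_fast_realloc itself: `sketch_fast_realloc` is a simplified stand-in that only captures the "trust the recorded size" behaviour, and `SketchIFD`, `sketch_ifd_reserve` and `sketch_ifd_free` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchIFD {
    int      *entries;  /* stand-in for the entry array */
    unsigned  size;     /* allocated bytes, as tracked for the helper */
} SketchIFD;

/* Simplified model of a fast-realloc helper: it trusts *size and only
 * reallocates when the request exceeds the recorded size. */
static void *sketch_fast_realloc(void *ptr, unsigned *size, size_t min_size)
{
    if (min_size <= *size)
        return ptr;                 /* buffer assumed to be large enough */
    void *p = realloc(ptr, min_size);
    if (p)
        *size = (unsigned)min_size;
    return p;
}

/* Make room for nb entries; returns 0 on success, -1 on failure. */
int sketch_ifd_reserve(SketchIFD *ifd, unsigned nb)
{
    assert(nb > 0);
    void *p = sketch_fast_realloc(ifd->entries, &ifd->size,
                                  nb * sizeof(*ifd->entries));
    if (!p)
        return -1;
    ifd->entries = p;
    return 0;
}

void sketch_ifd_free(SketchIFD *ifd)
{
    free(ifd->entries);
    ifd->entries = NULL;
    /* Without this reset, a later smaller request would be answered
     * with the stale NULL pointer instead of a fresh allocation. */
    ifd->size = 0;
}
```

In this model, dropping the `ifd->size = 0` line makes the next `sketch_ifd_reserve` call with a smaller request fail, because the helper believes the old buffer still exists.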
Leo Izen
326808ad2f avcodec/exif.c: add check for singular displaymatrix data
If av_exif_matrix_to_orientation returns 0, then the display matrix
is singular. In this case we should treat it as 1 and print a warning.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Leo Izen
317d660281 avcodec/exif.c: account for header_mode difference on rewrite
When determining if we need to rewrite the exif buffer or can pass
through as-is, account for a difference in header_mode requested from
the one that is used internally.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Leo Izen
4f5dfce5a8 avcodec/exif.c: use less than or equal for max width and height
The max width and height for PIXEL_X_TAG and PIXEL_Y_TAG is 0xFFFFu
because these are unsigned shorts, but we used < instead of <=
erroneously. Fix that.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
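The off-by-one in 4f5dfce5a8 comes down to whether 0xFFFF itself is accepted. A minimal sketch, with hypothetical function names that are not taken from exif.c:

```c
#include <assert.h>
#include <stdint.h>

/* Old comparison: wrongly rejects the largest value an unsigned short holds. */
int sketch_fits_in_short_old(uint32_t dim)
{
    return dim < 0xFFFFu;
}

/* Fixed comparison: 0xFFFF is still representable as an unsigned short. */
int sketch_fits_in_short(uint32_t dim)
{
    return dim <= 0xFFFFu;
}

/* Narrowing helper that relies on the fixed check. */
uint16_t sketch_to_short(uint32_t dim)
{
    assert(sketch_fits_in_short(dim));
    return (uint16_t)dim;
}
```

With the old comparison a dimension of exactly 65535 would be treated as too large, even though it fits.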
Leo Izen
2cddfe7d0c avcodec/exif.c: pop entry off IFD if allocation fails
In av_exif_set_entry, if cloning the entry fails because an allocation
failed, we remove the entry from the IFD. If that entry sits in the
middle of ifd->entries, everything after it has to be shifted to the
left, which this commit implements.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
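Removing an element from the middle of a contiguous array, as 2cddfe7d0c describes, is commonly done with memmove. The following is a sketch under assumptions: `SketchEntry`, `SketchEntryList` and `sketch_remove_entry` are hypothetical, and a fixed-size array stands in for the dynamically allocated ifd->entries.

```c
#include <assert.h>
#include <string.h>

typedef struct SketchEntry {
    int id;
} SketchEntry;

typedef struct SketchEntryList {
    SketchEntry entries[8];
    unsigned    count;      /* number of valid entries */
} SketchEntryList;

/* Remove entries[idx] and shift everything after it one slot to the left. */
void sketch_remove_entry(SketchEntryList *list, unsigned idx)
{
    assert(idx < list->count);
    /* memmove handles the overlapping source and destination ranges;
     * for the last element the byte count is zero and nothing moves. */
    memmove(&list->entries[idx], &list->entries[idx + 1],
            (list->count - idx - 1) * sizeof(list->entries[0]));
    list->count--;
}
```

Only decrementing the count would be enough for the last element; for any earlier index the tail has to move, which is the case the commit message singles out.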
Leo Izen
0c39b1bccd avcodec/exif.h: fix documentation on av_exif_get_entry and similar
Document behavior of av_exif_get_entry and av_exif_set_entry that was
already part of the existing ABI but was insufficiently documented
before this commit. Also clarify that av_exif_set_entry uses
av_fast_realloc rather than av_realloc.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Andreas Rheinhardt
cc3ca17127 avcodec/x86/qpeldsp{,_init}: Use proper prefix
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202 avcodec/x86/qpeldsp_init: Mark functions as hidden
This allows 32-bit PIC code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32-bit PIC code).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
23d3116af9 avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.

This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive code-size wise
(and also performance-wise) as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: An increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection,
2256B when using stack protection.

Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                       106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3:                       105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3:                       226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3:                       231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3:                      214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3:                      236.1 ( 8.02x)

New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                        96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3:                        96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3:                       225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3:                       228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3:                      217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3:                      220.5 ( 8.72x)

Note: The l2 functions are also used for vertical lowpass
functions, yet given that they are much bigger, duplicating
them would lead to massive code size increase.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
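The two-stage scheme and its fused replacement from 23d3116af9 can be modelled in scalar C. This is only a sketch, not the SSSE3 code: it assumes the usual 8-tap MPEG-4 half-pel filter (-1, 3, -6, 20, 20, -6, 3, -1)/32 with rounding, assumes rounded averaging for the second stage, and skips the edge mirroring a real implementation needs, so every row must have 3 valid samples to the left and 4 to the right. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-pel sample between src[x] and src[x + 1]; reads src[x-3 .. x+4]. */
static uint8_t sketch_halfpel(const uint8_t *src, int x)
{
    int sum = 20 * (src[x]     + src[x + 1])
            -  6 * (src[x - 1] + src[x + 2])
            +  3 * (src[x - 2] + src[x + 3])
            -      (src[x - 3] + src[x + 4]);
    sum += 16;                       /* rounding before the divide by 32 */
    if (sum < 0)
        return 0;
    sum >>= 5;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Two stages: lowpass into a temporary buffer, then average with the
 * left (right = 0, 1/4 position) or right (right = 1, 3/4) source pixel. */
void sketch_qpel_two_stage(uint8_t *dst, const uint8_t *src, int w, int right)
{
    uint8_t half[16];
    assert(w > 0 && w <= 16);
    for (int x = 0; x < w; x++)
        half[x] = sketch_halfpel(src, x);
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((half[x] + src[x + right] + 1) >> 1);
}

/* Fused: same result in one pass, without the intermediate buffer. */
void sketch_qpel_fused(uint8_t *dst, const uint8_t *src, int w, int right)
{
    assert(w > 0 && w <= 16);
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((sketch_halfpel(src, x) + src[x + right] + 1) >> 1);
}
```

Both functions produce identical output for the same input; the fused form simply avoids the intermediate stack buffer and the second call that the commit message mentions.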
Andreas Rheinhardt
f946cac2d9 avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
Beats the mmxext version by a lot (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[1][1]_c:                           223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext:                       66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3:                        36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c:                           251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext:                       58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3:                        25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c:                           226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext:                       66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3:                        35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c:                           473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2:                        110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3:                        76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c:                           440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3:                        67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c:                           473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3:                        74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c:                           492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2:                        102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3:                        67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c:                          465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2:                        94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3:                       57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c:                          492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2:                       102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3:                       68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c:                          476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3:                       74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c:                          434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3:                       66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c:                          474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2:                       107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3:                       74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c:                    222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext:                66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3:                 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c:                    212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext:                56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3:                 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c:                    224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext:                65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3:                 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3:                 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3:                 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3:                 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3:                 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3:                57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3:                69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3:                83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3:                67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3:                80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c:                           229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext:                       65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3:                        33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c:                           212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext:                       56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3:                        23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c:                           227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext:                       64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3:                        33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c:                           466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2:                        106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3:                        71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c:                           438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2:                        102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3:                        65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c:                           466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2:                        106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3:                        70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c:                           456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2:                        100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3:                        64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c:                          425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2:                        92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3:                       55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c:                          452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2:                        99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3:                       63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c:                          471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2:                       106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3:                       71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c:                          439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2:                       101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3:                       64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c:                          467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2:                       106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3:                       72.6 ( 6.44x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
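The factor in parentheses in these benchmark rows appears to be the time of the `_c` reference divided by the time of the row's own implementation. That reading is inferred from the figures themselves (for example 223.9 / 36.8 is roughly 6.08), not stated in the commit message. A small helper with the hypothetical name `sketch_speedup` makes the arithmetic explicit:

```c
#include <assert.h>

/* Speedup of an optimized routine relative to the reference timing. */
double sketch_speedup(double ref_time, double simd_time)
{
    assert(simd_time > 0.0);
    return ref_time / simd_time;
}
```

For `avg_qpel_pixels_tab[1][1]`, `sketch_speedup(223.9, 66.2)` and `sketch_speedup(223.9, 36.8)` land close to the 3.38x and 6.08x listed for mmxext and ssse3; small deviations in other rows are to be expected because the printed timings are rounded.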
Andreas Rheinhardt
c0e1c1d6b3 avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
Beats the mmxext version by a lot (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[0][1]_c:                           945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext:                      262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3:                       110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c:                          1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext:                      245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3:                        91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c:                           941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext:                      260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3:                       110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c:                          1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2:                        394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3:                       247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c:                          1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2:                        380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3:                       221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c:                          1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2:                        393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3:                       238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c:                          1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3:                       223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c:                         1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2:                       366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3:                      207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c:                         2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3:                      227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c:                         1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2:                       389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3:                      244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c:                         1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2:                       379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3:                      223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c:                         1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2:                       398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3:                      238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c:                    922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext:               275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3:                108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c:                    889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext:               236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3:                 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c:                    915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext:               274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3:                108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3:                246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3:                226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3:                248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3:                228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3:               206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3:               227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3:               244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3:               226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3:               246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c:                           919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext:                      255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3:                       108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c:                           883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext:                      238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3:                        86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c:                           921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext:                      258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3:                       108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c:                          1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2:                        384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3:                       234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c:                          1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2:                        382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3:                       217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c:                          1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2:                        384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3:                       231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c:                          1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2:                        374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3:                       219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c:                         1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2:                       384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3:                      202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c:                         1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2:                       369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3:                      216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c:                         1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2:                       384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3:                      238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c:                         1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2:                       373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3:                      218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c:                         1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2:                       385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3:                      236.8 ( 8.10x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344 avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
put and avg versions have been added and used in H264
in b91081274f. This commit
adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. it uses
them in the ones that have a vertical component); it also
removes the 16x17 MMXEXT versions (which are no longer used).

This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c:                          1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old):                  405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2:                        392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c:                          1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old):                  385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2:                        374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c:                          1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old):                  403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2:                        391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c:                          1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old):                  384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c:                         2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old):                 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2:                       380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c:                         1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old):                 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2:                       390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c:                         1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old):                 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2:                       377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c:                         1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old):                 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2:                       390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old):           425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old):           388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old):           427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old):           393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old):          392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old):          427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old):          392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old):          425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c:                          1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old):                  396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2:                        382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c:                          1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old):                  377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2:                        372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c:                          1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old):                  396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2:                        383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c:                          1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old):                  377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2:                        372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c:                         1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old):                 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2:                       380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c:                         1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old):                 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2:                       384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c:                         1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old):                 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2:                       374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c:                         1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old):                 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2:                       387.8 ( 4.96x)

The purely vertical size 16 mc functions now no longer use any MMX.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076 avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
Superseded by SSE2.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670 avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):

avg_qpel_pixels_tab[0][4]_c:                           844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext:                      225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2:                        146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c:                          1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext:                      499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2:                        405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c:                          1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext:                      484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2:                        385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c:                          1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext:                      501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2:                        403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c:                           976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext:                      216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2:                        113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c:                          1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext:                      494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2:                        388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c:                         1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext:                     476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2:                       362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c:                         2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext:                     496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c:                          841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext:                     226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2:                       143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c:                         1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext:                     499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2:                       412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c:                         1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext:                     484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2:                       385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c:                         1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext:                     501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2:                       405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c:                           203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext:                       64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2:                         40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c:                           488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext:                      134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2:                        108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c:                           448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext:                      128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c:                           489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext:                      134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c:                           223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext:                       57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2:                         36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c:                           496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext:                      129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2:                        105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c:                          466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext:                     123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2:                        99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c:                          497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext:                     129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2:                       105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c:                          203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext:                      63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2:                        38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c:                          487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext:                     134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c:                          447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext:                     128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c:                          487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext:                     134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2:                       109.9 ( 4.44x)

put_no_rnd_qpel_pixels_tab[0][4]_c:                    825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext:               242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2:                 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext:               542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext:               493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext:               541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c:                    785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext:               206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2:                 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext:               489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext:              461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext:              490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c:                   824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext:              242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2:                135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext:              545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext:              497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext:              545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c:                    198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext:                64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2:                  39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext:               137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext:               126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext:               137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c:                    193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext:                52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2:                  27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext:               126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext:              118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext:              128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c:                   201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext:               64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2:                 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext:              137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext:              127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext:              139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.3 ( 4.09x)

put_qpel_pixels_tab[0][4]_c:                           824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext:                      220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2:                        137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c:                          1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext:                      508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2:                        408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c:                          1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext:                      476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2:                        381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c:                          1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext:                      495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2:                        417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c:                           772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext:                      197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2:                        118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c:                          1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext:                      476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2:                        379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c:                         1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext:                     460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2:                       386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c:                         1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext:                     474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2:                       404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c:                          829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext:                     221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2:                       138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c:                         1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext:                     494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2:                       413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c:                         1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext:                     473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2:                       377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c:                         1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext:                     492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2:                       399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c:                           198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext:                       60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2:                         40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c:                           471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext:                      131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2:                        107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c:                           440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext:                      126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2:                        100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c:                           469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext:                      131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2:                        106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c:                           194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext:                       52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2:                         28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c:                           464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext:                      125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2:                        100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c:                          433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext:                     118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2:                        94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c:                          463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext:                     125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2:                       102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c:                          199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext:                      63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2:                        36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c:                          475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext:                     139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2:                       107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c:                          441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext:                     126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2:                       101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c:                          475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext:                     131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2:                       107.0 ( 4.45x)

The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
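Reader's note on the benchmark in 9beecb2670: the parenthesised factor on each row appears to be the C timing divided by that row's timing. The sketch below recomputes it for the avg_qpel_pixels_tab[0][8] rows and also compares the two code sizes quoted in the message. The helper names are hypothetical and not part of FFmpeg or checkasm; the numbers are copied from the commit message.

```python
def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster `optimized` is than `baseline`."""
    return baseline / optimized


def size_growth_percent(old_bytes: int, new_bytes: int) -> float:
    """Return the relative code-size increase in percent."""
    return (new_bytes - old_bytes) / old_bytes * 100.0


# Timings copied from the avg_qpel_pixels_tab[0][8] rows.
c_time, mmxext_time, sse2_time = 976.7, 216.9, 113.1

print(f"mmxext vs C:      {speedup(c_time, mmxext_time):.2f}x")
print(f"sse2 vs C:        {speedup(c_time, sse2_time):.2f}x")
# Not listed in the table: SSE2 relative to the MMXEXT version it replaces.
print(f"sse2 vs mmxext:   {speedup(mmxext_time, sse2_time):.2f}x")

# Code sizes quoted in the message: 6720B (MMXEXT) and 8244B (SSE2).
print(f"code size growth: {size_growth_percent(6720, 8244):.1f}%")
```

Comparing SSE2 against MMXEXT directly, rather than against C, shows what the replacement itself gains for the roughly 1.5 KB of extra code.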
Andreas Rheinhardt
405465700c avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c avcodec/x86/qpeldsp: Don't use too much stack
We only need (SIZE+1)*SIZE words.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
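Reader's note on 188df9549c: a small arithmetic sketch of the (SIZE+1)*SIZE figure. It assumes that SIZE is the block dimension (8 or 16, matching the 8x8 and 16x16 variants mentioned in c2685234a6) and that "word" means a 16-bit x86 word; neither is stated in the message itself. The function name is hypothetical.

```python
WORD_BYTES = 2  # assumption: "word" in the x86 assembly sense (16 bits)


def temp_buffer_words(size: int) -> int:
    """Words needed per the commit message: (SIZE+1) rows of SIZE words."""
    return (size + 1) * size


for size in (8, 16):  # assumption: the 8x8 and 16x16 block variants
    words = temp_buffer_words(size)
    print(f"SIZE={size:2d}: {words:3d} words = {words * WORD_BYTES:3d} bytes")
```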
Andreas Rheinhardt
bcf7293a21 avcodec/x86/qpeldsp: Remove unused declaration
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5 avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
Only used there.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6 avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d avcodec/x86/qpeldsp_init: Specify alignment properly
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5 avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3 avcodec/x86/qpeldsp: Don't zero unnecessarily
This value is write-only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b avcodec/x86/qpeldsp: Simplify resetting output pointer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Martin Storsjö
963ea707e3 arm/rv40dsp: Add * on comment continuation lines in prototypes
This prevents the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82 arm/vc1dsp: Fix a few cases of inconsistent indentation
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc arm/jrevdct: Indent previously unindented assembly
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation 2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2 arm: Reindent asm that used consistent but differing styles
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904 arm/synth_filter_vfp: Fix indentation
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d arm/simple_idct_arm: Reindent previously unindented code 2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020 arm/simple_idct_armv5te: Reindent previously consistent code to common style
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f arm/rv34dsp: Adjust macro argument indentation slightly
The previous form did neatly align with the lines above, but doesn't
match general indentation rules from our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c arm/hevcdsp: Fix misindented instructions in some macros 2026-04-29 13:49:27 +03:00