Gyan Doshi
4a2b643646
avcodec/mediacodecdec: declare correct class for audio decoders
...
Until now, audio decoders had been assigned the class for video decoders.
2026-05-03 05:58:13 +00:00
Michael Niedermayer
23227a444d
avcodec/wmaenc: Fix missing padding in extradata
...
Reported-by: Kenan Alghythee <kalghy2@uic.edu>
2026-05-03 02:36:54 +00:00
Michael Niedermayer
242ff799c7
avcodec/tdsc: remove double stride adjustment
...
Fixes: out of array access
Found-by: Seung Min Shin
Patch based on suggested fix by Seung Min Shin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 23:11:24 +00:00
Michael Niedermayer
05817dc7dd
avcodec/notchlc: Check 255 loops
...
Fixes: integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:39:02 +00:00
Michael Niedermayer
bf4eb194cf
avcodec/tdsc: Better input size check
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
bb69a090a7
avcodec/tdsc: Check jpeg size
...
Fixes: out of array read
Fixes: tdsc_tile_dim_mismatch.avi
Found-by: Ante Silovic <asilovic155@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
af87d77514
avcodec/tdsc: Prettier uncompress() check
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
e9e6fb8798
avcodec/tdsc: Check tile_size
...
Fixes: out of array read
Fixes: tdsc_war_groom_far4096.avi
Found-by: Ante Silovic <asilovic155@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:13:01 +00:00
Michael Niedermayer
9572ab7f45
avcodec/decode: Better documentation for ff_set_dimensions()
...
Clarify what is checked and that it avoids explicit generic overflow checks
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-02 21:11:47 +00:00
Kacper Michajłow
dba0b078c8
avcodec/vaapi_av1: reorder functions to avoid fwd decl
2026-05-01 23:59:06 +00:00
Kacper Michajłow
688f68bffa
avcodec/vaapi_av1: fix leak of ref frames on init failure
...
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-05-01 23:59:06 +00:00
Leo Izen
739fc9249c
avcodec/libjxlenc: fix frame->linesize raw pointer read
...
These should say frame->linesize[0] as it does everywhere else this
variable is referenced. Fixes a typo bug.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
05b5add006
avcodec/libjxlenc: check orientation tag metadata before reading
...
We need to check that entry->count is nonzero and that entry->type is
AV_TIFF_SHORT before reading from the buffer, in case a maliciously
constructed IFD uses a zero-count or an unusual type (e.g. IFD) for it.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
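As a rough illustration of the check described in 05b5add006, the sketch below validates an orientation entry before reading from it. The struct, enum, and function names are hypothetical stand-ins, not the libavcodec definitions, and falling back to orientation 1 is an assumption made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the libavcodec types. */
enum sketch_tiff_type { SKETCH_TIFF_SHORT = 3, SKETCH_TIFF_IFD = 13 };

typedef struct SketchExifEntry {
    enum sketch_tiff_type type;
    uint32_t count;          /* number of values stored in the entry */
    const uint16_t *values;  /* only meaningful when type is SHORT */
} SketchExifEntry;

/* Returns the stored orientation (1..8). Falls back to 1 (assumed default)
 * when the entry is missing, empty, of an unexpected type, or out of range,
 * so the buffer is never read unless count and type have been checked. */
static int sketch_read_orientation(const SketchExifEntry *entry)
{
    if (!entry || entry->count == 0 ||
        entry->type != SKETCH_TIFF_SHORT || !entry->values)
        return 1;
    uint16_t v = entry->values[0];
    return (v >= 1 && v <= 8) ? v : 1;
}
```

The point of the ordering is that `values[0]` is only touched after both `count` and `type` have been validated.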
Leo Izen
f1cab2d018
avcodec/exif_internal.h: improve return docs for ff_exif_get_buffer
...
This commit improves the documentation for the return value of the
function ff_exif_get_buffer.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
087ec68451
avcodec/exif.c: synthesize EXIF data from frame metadata and matrix
...
If the displaymatrix is present, we should synthesize EXIF data from
the values there even if there is no EXIF attached to the frame.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:25 -04:00
Leo Izen
1d36c4d8ae
avcodec/exif.c: reset ifd->size when freeing ifd->entries
...
If we free ifd->entries then we need to set ifd->size to 0 so another
call to av_fast_realloc doesn't get confused.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
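The hazard addressed by 1d36c4d8ae can be sketched with a simplified grow-only reallocator. `sketch_fast_realloc` below only imitates the general idea of av_fast_realloc (return the pointer unchanged when the recorded size already suffices); it is not the libavutil implementation, and all names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical IFD-like container, for illustration only. */
typedef struct SketchIfd {
    int *entries;
    unsigned int size;   /* allocated bytes, as tracked for the reallocator */
    unsigned int count;  /* number of entries in use */
} SketchIfd;

/* Grow-only reallocation: if the recorded size is already large enough,
 * the pointer is returned as-is without touching the allocator. */
static void *sketch_fast_realloc(void *ptr, unsigned int *size, size_t min_size)
{
    if (min_size <= *size)
        return ptr;
    void *p = realloc(ptr, min_size);
    if (!p)
        return NULL;
    *size = (unsigned int)min_size;
    return p;
}

static void sketch_ifd_free(SketchIfd *ifd)
{
    free(ifd->entries);
    ifd->entries = NULL;
    ifd->count   = 0;
    /* Without this reset, a later sketch_fast_realloc() call could see a
     * stale nonzero size and hand back the NULL pointer as if it were a
     * sufficiently large buffer. */
    ifd->size    = 0;
}
```

Keeping the pointer and its recorded size in sync is what makes the early-return path of the reallocator safe.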
Leo Izen
326808ad2f
avcodec/exif.c: add check for singular displaymatrix data
...
If av_exif_matrix_to_orientation returns 0, then the display matrix
is singular. In this case we should treat it as 1 and print a warning.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Leo Izen
317d660281
avcodec/exif.c: account for header_mode difference on rewrite
...
When determining if we need to rewrite the exif buffer or can pass
through as-is, account for a difference in header_mode requested from
the one that is used internally.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Leo Izen
4f5dfce5a8
avcodec/exif.c: use less than or equal for max width and height
...
The max width and height for PIXEL_X_TAG and PIXEL_Y_TAG is 0xFFFFu
because these are unsigned shorts, but we used < instead of <=
erroneously. Fix that.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
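The off-by-one fixed in 4f5dfce5a8 can be illustrated with a small range check. The helper name is hypothetical, and accepting 0 as a lower bound is an assumption of this sketch; the commit message only discusses the upper bound.

```c
#include <stdint.h>

/* Returns 1 if dim can be stored in an unsigned 16-bit field.
 * The comparison must be <= so that 0xFFFF itself is accepted;
 * using < would wrongly reject the largest representable value. */
static int sketch_fits_u16(int dim)
{
    return dim >= 0 && dim <= UINT16_MAX;
}
```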
Leo Izen
2cddfe7d0c
avcodec/exif.c: pop entry off IFD if allocation fails
...
In av_exif_set_entry, if cloning the entry fails because an allocation
failed, we remove the entry from the IFD. If that entry sits in the
middle of ifd->entries, everything after it must be shifted to the left,
which this commit implements.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
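The left shift described in 2cddfe7d0c might look roughly like the following. This is a generic array-removal sketch with hypothetical types, not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type, for illustration only. */
typedef struct SketchEntry { int id; } SketchEntry;

/* Remove entries[idx] from an array holding *count elements by moving
 * every later element one slot to the left, then shrinking the count.
 * memmove is used because source and destination overlap. */
static void sketch_remove_entry(SketchEntry *entries, size_t *count, size_t idx)
{
    assert(idx < *count);
    memmove(&entries[idx], &entries[idx + 1],
            (*count - idx - 1) * sizeof(*entries));
    (*count)--;
}
```

When the removed entry is the last one, the move length is zero and only the count changes.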
Leo Izen
0c39b1bccd
avcodec/exif.h: fix documentation on av_exif_get_entry and similar
...
Add additional documentation to av_exif_get_entry and also to
av_exif_set_entry that was already part of the existing ABI but was
insufficiently documented before this commit. Also clarify that
av_exif_set_entry uses av_fast_realloc rather than av_realloc.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2026-05-01 07:40:24 -04:00
Andreas Rheinhardt
cc3ca17127
avcodec/x86/qpeldsp{,_init}: Use proper prefix
...
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202
avcodec/x86/qpeldsp_init: Mark functions as hidden
...
This allows 32-bit PIC code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32-bit PIC code).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
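For readers unfamiliar with the mechanism behind ca43bc6202, the sketch below shows one way to give a function hidden visibility with GCC or Clang. The macro and function names are hypothetical and the body is a placeholder; FFmpeg's own macros and declarations are not reproduced here.

```c
/* Hidden visibility tells the compiler the symbol is resolved inside the
 * same shared object, so position-independent callers need not go through
 * the dynamic symbol machinery. The attribute is a GCC/Clang extension;
 * other compilers fall back to an empty macro in this sketch. */
#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define SKETCH_HIDDEN
#endif

/* The declaration carries the attribute; in the real use case the
 * definition would live in an assembly file. */
SKETCH_HIDDEN int sketch_filter_row(int x);

/* Placeholder definition so the example links on its own. */
int sketch_filter_row(int x)
{
    return 2 * x;
}
```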
Andreas Rheinhardt
23d3116af9
avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
...
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.
This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive code-size wise
(and also performance-wise) as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: An increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection,
2256B when using stack protection.
Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3: 105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3: 226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3: 231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3: 214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3: 236.1 ( 8.02x)
New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3: 96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3: 225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3: 228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3: 217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3: 220.5 ( 8.72x)
Note: The l2 functions are also used for vertical lowpass
functions, yet given that they are much bigger, duplicating
them would lead to massive code size increase.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
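The idea in 23d3116af9 of merging the horizontal lowpass with the subsequent averaging can be sketched in scalar C. The code below is a simplified illustration, not the SSSE3 assembly: it assumes the usual MPEG-4 8-tap half-pel filter (-1, 3, -6, 20, 20, -6, 3, -1)/32, covers only the rounding (put) variant, handles interior pixels only (no edge mirroring), and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Half-pel sample between src[x] and src[x + 1].
 * Requires src[x - 3] .. src[x + 4] to be readable. */
static uint8_t sketch_halfpel(const uint8_t *src, int x)
{
    int v = (src[x]     + src[x + 1]) * 20
          - (src[x - 1] + src[x + 2]) * 6
          + (src[x - 2] + src[x + 3]) * 3
          - (src[x - 3] + src[x + 4]) + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Two stages: filter into a temporary buffer, then average with the left
 * (use_right = 0, 1/4 position) or right (use_right = 1, 3/4 position)
 * source pixel. */
static void sketch_two_stage(uint8_t *dst, const uint8_t *src,
                             int x0, int n, int use_right)
{
    uint8_t tmp[64];
    assert(n <= 64);
    for (int i = 0; i < n; i++)
        tmp[i] = sketch_halfpel(src, x0 + i);
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((tmp[i] + src[x0 + i + use_right] + 1) >> 1);
}

/* Fused: same result in one pass, without the intermediate buffer. */
static void sketch_fused(uint8_t *dst, const uint8_t *src,
                         int x0, int n, int use_right)
{
    for (int i = 0; i < n; i++) {
        int h = sketch_halfpel(src, x0 + i);
        dst[i] = (uint8_t)((h + src[x0 + i + use_right] + 1) >> 1);
    }
}
```

Both functions produce identical output; the fused form simply avoids the stack buffer and the second call, which is the trade-off the commit message describes.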
Andreas Rheinhardt
f946cac2d9
avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d
avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
...
Beats the mmxext version by a lot (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
avg_qpel_pixels_tab[1][1]_c: 223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext: 66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3: 36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c: 251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext: 58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3: 25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c: 226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext: 66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c: 473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2: 110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3: 76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c: 440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2: 102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3: 67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c: 473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2: 108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3: 74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c: 492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2: 102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3: 67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c: 465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2: 94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3: 57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c: 492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2: 102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3: 68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c: 476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2: 108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3: 74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c: 434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3: 66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c: 474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2: 107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3: 74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c: 222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext: 66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3: 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c: 212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext: 56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3: 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c: 224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext: 65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3: 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3: 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3: 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3: 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3: 57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3: 69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3: 83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3: 67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3: 80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c: 229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext: 65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3: 33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c: 212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext: 56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3: 23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c: 227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext: 64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3: 33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c: 466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2: 106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3: 71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c: 438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2: 102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3: 65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c: 466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2: 106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3: 70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c: 456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2: 100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3: 64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c: 425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2: 92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3: 55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c: 452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2: 99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3: 63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c: 471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2: 106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3: 71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c: 439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2: 101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3: 64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c: 467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2: 106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3: 72.6 ( 6.44x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
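When reading benchmark listings like the one in 1d040c527d, the factor in parentheses appears to be the C reference timing divided by the timing of the optimized version. The helper below is a hypothetical illustration of that arithmetic; since the printed timings are rounded, a recomputed factor may differ slightly from the printed one.

```c
/* Speedup of an optimized routine relative to the C reference,
 * given both timings in the same (arbitrary) unit. */
static double sketch_speedup(double c_time, double simd_time)
{
    return c_time / simd_time;
}
```

For example, the first avg_qpel_pixels_tab[1][1] rows give 223.9 / 36.8, which is approximately 6.08 and matches the factor printed for the _ssse3 version.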
Andreas Rheinhardt
c0e1c1d6b3
avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
...
Beats the mmxext version by a lot (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
avg_qpel_pixels_tab[0][1]_c: 945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext: 262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3: 110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c: 1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext: 245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3: 91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c: 941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext: 260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3: 110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c: 1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2: 394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3: 247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c: 1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2: 380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3: 221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c: 1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2: 393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3: 238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c: 1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2: 380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3: 223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c: 1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2: 366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3: 207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c: 2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2: 385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3: 227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c: 1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2: 389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c: 1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2: 379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3: 223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c: 1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2: 398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3: 238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c: 922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext: 275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3: 108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c: 889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext: 236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3: 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c: 915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext: 274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3: 108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3: 246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3: 226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3: 248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3: 228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3: 206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3: 227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3: 226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3: 246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c: 919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext: 255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3: 108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c: 883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext: 238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3: 86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c: 921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext: 258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3: 108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c: 1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2: 384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3: 234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c: 1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2: 382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3: 217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c: 1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2: 384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3: 231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c: 1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2: 374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3: 219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c: 1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2: 384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3: 202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c: 1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2: 369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3: 216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c: 1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2: 384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3: 238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c: 1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2: 373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3: 218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c: 1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2: 385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3: 236.8 ( 8.10x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344
avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
...
put and avg versions have been added and used in H264
in b91081274f. This commit
adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. it uses
them in the ones that have a vertical component); it also
removes the 16x17 MMXEXT versions (which are no longer used).
This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c: 1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old): 405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2: 392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c: 1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old): 385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2: 374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c: 1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old): 403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2: 391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c: 1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old): 384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2: 380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c: 2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old): 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2: 380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c: 1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old): 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2: 390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c: 1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old): 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2: 377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c: 1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old): 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2: 390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old): 425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old): 388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old): 427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old): 393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old): 392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old): 427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old): 392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old): 425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c: 1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old): 396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2: 382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c: 1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old): 377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2: 372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c: 1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old): 396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2: 383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c: 1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old): 377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2: 372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c: 1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old): 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2: 380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c: 1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old): 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2: 384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c: 1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old): 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2: 374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c: 1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old): 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2: 387.8 ( 4.96x)
The purely vertical size 16 mc functions now no longer use any MMX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076
avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
...
Superseded by SSE2.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670
avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
...
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):
avg_qpel_pixels_tab[0][4]_c: 844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext: 225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2: 146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c: 1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext: 499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2: 405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c: 1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext: 484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2: 385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c: 1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext: 501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2: 403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c: 976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext: 216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2: 113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c: 1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext: 494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2: 388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c: 1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext: 476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2: 362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c: 2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext: 496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2: 385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c: 841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext: 226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2: 143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c: 1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext: 499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2: 412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c: 1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext: 484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2: 385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c: 1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext: 501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2: 405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c: 203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext: 64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2: 40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c: 488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext: 134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2: 108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c: 448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext: 128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2: 102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c: 489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext: 134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2: 108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c: 223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext: 57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2: 36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c: 496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext: 129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2: 105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c: 466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext: 123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2: 99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c: 497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext: 129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2: 105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c: 203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext: 63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2: 38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c: 487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext: 134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2: 108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c: 447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext: 128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2: 102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c: 487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext: 134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2: 109.9 ( 4.44x)
put_no_rnd_qpel_pixels_tab[0][4]_c: 825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext: 242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2: 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext: 542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext: 493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext: 541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c: 785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext: 206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2: 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext: 489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext: 461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext: 490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c: 824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext: 242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2: 135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext: 545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext: 497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext: 545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c: 198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext: 64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2: 39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext: 137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext: 126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext: 137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c: 193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext: 52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2: 27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext: 126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext: 118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext: 128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c: 201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext: 64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2: 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext: 137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext: 127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext: 139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.3 ( 4.09x)
put_qpel_pixels_tab[0][4]_c: 824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext: 220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2: 137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c: 1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext: 508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2: 408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c: 1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext: 476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2: 381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c: 1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext: 495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2: 417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c: 772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext: 197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2: 118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c: 1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext: 476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2: 379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c: 1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext: 460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2: 386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c: 1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext: 474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2: 404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c: 829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext: 221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2: 138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c: 1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext: 494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2: 413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c: 1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext: 473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2: 377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c: 1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext: 492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2: 399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c: 198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext: 60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2: 40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c: 471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext: 131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2: 107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c: 440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext: 126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2: 100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c: 469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext: 131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2: 106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c: 194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext: 52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2: 28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c: 464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext: 125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2: 100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c: 433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext: 118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2: 94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c: 463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext: 125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2: 102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c: 199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext: 63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2: 36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c: 475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext: 139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2: 107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c: 441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext: 126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2: 101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c: 475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext: 131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2: 107.0 ( 4.45x)
The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
405465700c
avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c
avcodec/x86/qpeldsp: Don't use too much stack
...
We only need (SIZE+1)*SIZE words.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
bcf7293a21
avcodec/x86/qpeldsp: Remove unused declaration
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5
avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
...
Only used there.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6
avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
...
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d
avcodec/x86/qpeldsp_init: Specify alignment properly
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5
avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3
avcodec/x86/qpeldsp: Don't zero unnecessarily
...
This value is write-only.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b
avcodec/x86/qpeldsp: Simplify resetting output pointer
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Martin Storsjö
963ea707e3
arm/rv40dsp: Add * on comment continuation lines in prototypes
...
This keeps the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82
arm/vc1dsp: Fix a few cases of inconsistent indentation
...
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc
arm/jrevdct: Indent previously unindented assembly
...
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda
arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation
2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2
arm: Reindent asm that used consistent but differing styles
...
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904
arm/synth_filter_vfp: Fix indentation
...
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d
arm/simple_idct_arm: Reindent previously unindented code
2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd
arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
...
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020
arm/simple_idct_armv5te: Reindent previously consistent code to common style
...
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f
arm/rv34dsp: Adjust macro argument indentation slightly
...
The previous form aligned neatly with the lines above, but doesn't
match the general indentation rules of our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c
arm/hevcdsp: Fix misindented instructions in some macros
2026-04-29 13:49:27 +03:00