Andreas Rheinhardt
cc3ca17127
avcodec/x86/qpeldsp{,_init}: Use proper prefix
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202
avcodec/x86/qpeldsp_init: Mark functions as hidden
This allows 32-bit PIC code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32-bit PIC code).
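The following is a minimal sketch of what hidden visibility means at the C level. It assumes GCC or Clang on an ELF target. `hidden_sum` and `wrapper_sum` are hypothetical names. In FFmpeg, the callee would be an assembly function declared in the init file, and the project may use its own macro rather than the raw attribute.

```c
/* Hypothetical example; GCC/Clang attribute syntax. A hidden function is
 * not exported from the shared object, so calls to it can be bound
 * directly instead of going through the PLT/GOT. This is what makes such
 * calls cheap in 32-bit PIC code. */
__attribute__((visibility("hidden")))
int hidden_sum(const unsigned char *buf, int len);

/* In FFmpeg the callee would live in an assembly file; it is defined in
 * C here only so that the sketch is self-contained. */
__attribute__((visibility("hidden")))
int hidden_sum(const unsigned char *buf, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* A thin wrapper in the style of the init-file functions: it forwards. */
int wrapper_sum(const unsigned char *buf, int len)
{
    return hidden_sum(buf, len);
}
```

Built into a shared object with `-m32 -fPIC -O2`, the call inside `wrapper_sum` can then typically be emitted as a direct `call` without first setting up the GOT pointer. With default visibility it would usually go through the PLT.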
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
23d3116af9
avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed by two different functions,
with a stack buffer holding the intermediate result.
This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive, both code-size-wise
and performance-wise, as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: the increase of 1136B
in the asm files is more than offset by the size reductions in
the wrappers (1968B here when not using stack protection,
2256B when using stack protection).
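To make the structure concrete, here is a scalar C model of the 8-wide horizontal case. It makes three assumptions:

* the usual MPEG-4 qpel taps (-1, 3, -6, 20, 20, -6, 3, -1);
* mirroring at the block edge;
* only the rounding (put) variant.

All function names are hypothetical, since the real code is SSSE3 assembly plus the C wrappers. The two-stage version mimics the old split: it filters into a buffer, then performs a separate l2 average. The fused version averages each filtered pixel immediately, so neither a second call nor an intermediate buffer is needed. Both produce identical output.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflect a horizontal index into 0..8: an 8-wide block reads 9 pixels. */
static int mirror9(int i)
{
    if (i < 0)
        return -1 - i;
    if (i > 8)
        return 17 - i;
    return i;
}

/* Half-pel MPEG-4 lowpass for output column x (rounding variant). */
static int h_lowpass_px(const uint8_t *row, int x)
{
    static const int taps[8] = { -1, 3, -6, 20, 20, -6, 3, -1 };
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += taps[k] * row[mirror9(x + k - 3)];
    sum = (sum + 16) >> 5;
    return sum < 0 ? 0 : sum > 255 ? 255 : sum;
}

/* Two-stage form: filter into a temporary, then "l2"-average with the
 * left (mx == 1) or right (mx == 3) full-pel source pixel. */
static void qpel8_h_mc_two_stage(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, int h, int mx)
{
    uint8_t half[8 * 16]; /* stands in for the stack buffer; h <= 16 */
    const int off = mx == 3;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            half[y * 8 + x] = (uint8_t)h_lowpass_px(src + y * stride, x);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = (uint8_t)((half[y * 8 + x] +
                                             src[y * stride + x + off] + 1) >> 1);
}

/* Fused h_lowpass + l2: average each filtered pixel right away. */
static void qpel8_h_mc_fused(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t stride, int h, int mx)
{
    const int off = mx == 3;
    for (int y = 0; y < h; y++) {
        const uint8_t *row = src + y * stride;
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] =
                (uint8_t)((h_lowpass_px(row, x) + row[x + off] + 1) >> 1);
    }
}
```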
Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3: 105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3: 226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3: 231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3: 214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3: 236.1 ( 8.02x)
New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3: 96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3: 225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3: 228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3: 217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3: 220.5 ( 8.72x)
Note: The l2 functions are also used together with the vertical lowpass
functions, yet given that those are much bigger, duplicating
them into fused combinations would lead to a massive code size increase.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
f946cac2d9
avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d
avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
Beats the mmxext version by a lot (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
avg_qpel_pixels_tab[1][1]_c: 223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext: 66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3: 36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c: 251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext: 58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3: 25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c: 226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext: 66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c: 473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2: 110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3: 76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c: 440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2: 102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3: 67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c: 473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2: 108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3: 74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c: 492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2: 102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3: 67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c: 465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2: 94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3: 57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c: 492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2: 102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3: 68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c: 476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2: 108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3: 74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c: 434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3: 66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c: 474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2: 107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3: 74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c: 222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext: 66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3: 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c: 212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext: 56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3: 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c: 224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext: 65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3: 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3: 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3: 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3: 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3: 57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3: 69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3: 83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3: 67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3: 80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c: 229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext: 65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3: 33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c: 212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext: 56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3: 23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c: 227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext: 64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3: 33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c: 466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2: 106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3: 71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c: 438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2: 102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3: 65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c: 466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2: 106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3: 70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c: 456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2: 100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3: 64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c: 425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2: 92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3: 55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c: 452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2: 99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3: 63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c: 471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2: 106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3: 71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c: 439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2: 101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3: 64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c: 467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2: 106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3: 72.6 ( 6.44x)
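The benchmark names can be read as follows:

* The first index selects the block size: [0] is 16x16 and [1] is 8x8, matching the "size 16" and "size 8" commits.
* The second index combines the quarter-pel fractions of the motion vector.

The helper below is a hypothetical sketch of how that second index decodes, assuming the usual encoding dxy = mx | (my << 2). It shows why [1]..[3] are horizontal-only and [4], [8] and [12] are vertical-only. It also shows why the odd indices are the ones with a 1/4 or 3/4 horizontal part.

```c
/* Hypothetical helper for qpel_pixels_tab[size][dxy], assuming
 * dxy = mx | (my << 2), with mx and my the quarter-pel fractions (0..3)
 * of the horizontal and vertical motion vector components. */
enum QpelKind { QPEL_COPY, QPEL_H_ONLY, QPEL_V_ONLY, QPEL_HV };

static int qpel_mx(int dxy) { return dxy & 3; }
static int qpel_my(int dxy) { return (dxy >> 2) & 3; }

static enum QpelKind qpel_kind(int dxy)
{
    int mx = qpel_mx(dxy), my = qpel_my(dxy);
    if (!mx && !my)
        return QPEL_COPY;   /* [0]: full-pel */
    if (!my)
        return QPEL_H_ONLY; /* [1]..[3] */
    if (!mx)
        return QPEL_V_ONLY; /* [4], [8], [12] */
    return QPEL_HV;         /* all remaining entries */
}
```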
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
c0e1c1d6b3
avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
Beats the mmxext version by a lot (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
avg_qpel_pixels_tab[0][1]_c: 945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext: 262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3: 110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c: 1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext: 245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3: 91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c: 941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext: 260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3: 110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c: 1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2: 394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3: 247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c: 1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2: 380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3: 221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c: 1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2: 393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3: 238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c: 1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2: 380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3: 223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c: 1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2: 366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3: 207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c: 2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2: 385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3: 227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c: 1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2: 389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c: 1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2: 379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3: 223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c: 1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2: 398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3: 238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c: 922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext: 275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3: 108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c: 889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext: 236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3: 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c: 915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext: 274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3: 108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3: 246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3: 226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3: 248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3: 228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3: 206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3: 227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3: 226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3: 246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c: 919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext: 255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3: 108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c: 883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext: 238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3: 86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c: 921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext: 258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3: 108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c: 1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2: 384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3: 234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c: 1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2: 382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3: 217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c: 1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2: 384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3: 231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c: 1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2: 374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3: 219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c: 1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2: 384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3: 202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c: 1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2: 369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3: 216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c: 1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2: 384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3: 238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c: 1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2: 373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3: 218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c: 1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2: 385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3: 236.8 ( 8.10x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344
avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
The put and avg versions have been added and used in H264
in b91081274f. This commit
adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. in
the ones that have a vertical component); it also
removes the 16x17 MMXEXT versions (which are no longer used).
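The following is a scalar model of what the three l2 variants compute. The names and argument order are hypothetical, not FFmpeg's prototypes.

SSE2's `pavgb` yields the rounding average directly. put_no_rnd needs the truncating average instead. The sketch derives it from the rounding one by subtracting `(a ^ b) & 1`. That is one common technique, used here as an assumption; the actual assembly may use a different trick. avg additionally averages the combined prediction into what is already in `dst`.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding average of two bytes, as SSE2 pavgb computes it per lane. */
static uint8_t avg_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Truncating (no_rnd) average derived from the rounding one: the +1 only
 * changes the result when a + b is odd, i.e. when the low bits differ. */
static uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)(avg_rnd(a, b) - ((a ^ b) & 1));
}

enum L2Op { L2_PUT, L2_PUT_NO_RND, L2_AVG };

/* Scalar model of a 16x16 "l2" step combining two predictions. */
static void pixels16x16_l2_model(uint8_t *dst, ptrdiff_t dst_stride,
                                 const uint8_t *src1, ptrdiff_t src1_stride,
                                 const uint8_t *src2, ptrdiff_t src2_stride,
                                 enum L2Op op)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            uint8_t a  = src1[y * src1_stride + x];
            uint8_t b  = src2[y * src2_stride + x];
            uint8_t *d = &dst[y * dst_stride + x];
            switch (op) {
            case L2_PUT:        *d = avg_rnd(a, b);              break;
            case L2_PUT_NO_RND: *d = avg_no_rnd(a, b);           break;
            case L2_AVG:        *d = avg_rnd(*d, avg_rnd(a, b)); break;
            }
        }
    }
}
```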
This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c: 1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old): 405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2: 392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c: 1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old): 385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2: 374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c: 1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old): 403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2: 391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c: 1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old): 384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2: 380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c: 2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old): 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2: 380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c: 1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old): 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2: 390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c: 1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old): 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2: 377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c: 1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old): 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2: 390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old): 425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old): 388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old): 427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old): 393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old): 392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old): 427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old): 392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old): 425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c: 1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old): 396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2: 382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c: 1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old): 377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2: 372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c: 1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old): 396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2: 383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c: 1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old): 377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2: 372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c: 1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old): 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2: 380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c: 1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old): 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2: 384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c: 1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old): 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2: 374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c: 1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old): 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2: 387.8 ( 4.96x)
The purely vertical size 16 mc functions now no longer use any MMX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076
avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
Superseded by SSE2.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670
avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):
avg_qpel_pixels_tab[0][4]_c: 844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext: 225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2: 146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c: 1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext: 499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2: 405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c: 1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext: 484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2: 385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c: 1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext: 501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2: 403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c: 976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext: 216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2: 113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c: 1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext: 494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2: 388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c: 1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext: 476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2: 362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c: 2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext: 496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2: 385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c: 841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext: 226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2: 143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c: 1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext: 499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2: 412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c: 1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext: 484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2: 385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c: 1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext: 501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2: 405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c: 203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext: 64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2: 40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c: 488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext: 134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2: 108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c: 448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext: 128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2: 102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c: 489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext: 134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2: 108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c: 223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext: 57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2: 36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c: 496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext: 129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2: 105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c: 466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext: 123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2: 99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c: 497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext: 129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2: 105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c: 203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext: 63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2: 38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c: 487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext: 134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2: 108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c: 447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext: 128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2: 102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c: 487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext: 134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2: 109.9 ( 4.44x)
put_no_rnd_qpel_pixels_tab[0][4]_c: 825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext: 242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2: 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext: 542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext: 493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext: 541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c: 785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext: 206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2: 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext: 489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext: 461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext: 490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c: 824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext: 242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2: 135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext: 545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext: 497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext: 545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c: 198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext: 64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2: 39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext: 137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext: 126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext: 137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c: 193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext: 52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2: 27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext: 126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext: 118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext: 128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c: 201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext: 64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2: 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext: 137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext: 127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext: 139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.3 ( 4.09x)
put_qpel_pixels_tab[0][4]_c: 824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext: 220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2: 137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c: 1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext: 508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2: 408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c: 1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext: 476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2: 381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c: 1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext: 495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2: 417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c: 772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext: 197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2: 118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c: 1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext: 476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2: 379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c: 1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext: 460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2: 386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c: 1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext: 474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2: 404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c: 829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext: 221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2: 138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c: 1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext: 494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2: 413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c: 1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext: 473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2: 377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c: 1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext: 492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2: 399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c: 198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext: 60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2: 40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c: 471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext: 131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2: 107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c: 440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext: 126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2: 100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c: 469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext: 131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2: 106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c: 194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext: 52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2: 28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c: 464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext: 125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2: 100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c: 433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext: 118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2: 94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c: 463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext: 125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2: 102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c: 199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext: 63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2: 36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c: 475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext: 139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2: 107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c: 441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext: 126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2: 101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c: 475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext: 131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2: 107.0 ( 4.45x)
The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
405465700c
avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c
avcodec/x86/qpeldsp: Don't use too much stack
We only need (SIZE+1)*SIZE words.
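The following is a scalar sketch of why SIZE+1 rows suffice. It rests on two assumptions: the buffer holds source pixels widened to 16-bit words (as x86 terminology suggests), and the vertical lowpass mirrors at the block edge the same way the horizontal one does. With mirroring, the 8-tap filter producing SIZE output rows only ever reads input rows 0..SIZE. The buffer therefore needs (SIZE+1)*SIZE words:

* 17*16 = 272 words (544 bytes) for size 16;
* 9*8 = 72 words (144 bytes) for size 8.

Names and buffer layout here are hypothetical; the real code is assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define QPEL_V_MAX 16 /* largest block size; 8 works the same way */

/* Reflect a row index into 0..size, i.e. size + 1 input rows. */
static int mirror_row(int i, int size)
{
    if (i < 0)
        return -1 - i;
    if (i > size)
        return 2 * size + 1 - i;
    return i;
}

/* Half-pel MPEG-4 vertical lowpass reading (size + 1) rows of words. */
static void v_lowpass_words(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint16_t *tmp, int size)
{
    static const int taps[8] = { -1, 3, -6, 20, 20, -6, 3, -1 };
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += taps[k] * tmp[mirror_row(y + k - 3, size) * size + x];
            sum = (sum + 16) >> 5;
            dst[y * dst_stride + x] = (uint8_t)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
        }
}

/* Widen size + 1 source rows into a (SIZE+1)*SIZE word buffer, then filter. */
static void qpel_v_lowpass_model(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, int size)
{
    uint16_t tmp[(QPEL_V_MAX + 1) * QPEL_V_MAX];
    for (int y = 0; y <= size; y++)
        for (int x = 0; x < size; x++)
            tmp[y * size + x] = src[y * stride + x];
    v_lowpass_words(dst, stride, tmp, size);
}
```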
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
bcf7293a21
avcodec/x86/qpeldsp: Remove unused declaration
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5
avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
Only used there.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6
avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d
avcodec/x86/qpeldsp_init: Specify alignment properly
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5
avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3
avcodec/x86/qpeldsp: Don't zero unnecessarily
This value is write-only.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b
avcodec/x86/qpeldsp: Simplify resetting output pointer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Martin Storsjö
963ea707e3
arm/rv40dsp: Add * on comment continuation lines in prototypes
This prevents the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82
arm/vc1dsp: Fix a few cases of inconsistent indentation
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc
arm/jrevdct: Indent previously unindented assembly
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda
arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation
2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2
arm: Reindent asm that used consistent but differing styles
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904
arm/synth_filter_vfp: Fix indentation
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d
arm/simple_idct_arm: Reindent previously unindented code
2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd
arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020
arm/simple_idct_armv5te: Reindent previously consistent code to common style
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f
arm/rv34dsp: Adjust macro argument indentation slightly
The previous form aligned neatly with the lines above, but didn't match
the general indentation rules of our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c
arm/hevcdsp: Fix misindented instructions in some macros
2026-04-29 13:49:27 +03:00
Martin Storsjö
7eaeb5ab4a
arm: Fix indentation of stray individual misaligned instructions
2026-04-29 13:49:27 +03:00
Martin Storsjö
17765fe831
arm: Reindent assembly where it was off by one char
2026-04-29 13:49:27 +03:00
Marvin Scholz
f044c5e627
doc: remove unclear description
There is no caller if one assumes that the user will use lavc for
decoding.
2026-04-28 14:31:19 +02:00
Marvin Scholz
c9937ff139
doc: mark functions related to AVCodecParameters
This makes these functions appear in the AVCodecParameters
documentation page, so they are easier to find.
2026-04-28 14:31:19 +02:00
Marvin Scholz
ab1a970bc0
doc: style changes for the AVCodecParameters
Mostly adds references and keeps the video-only/audio-only
annotations from being used as the brief description.
2026-04-28 14:31:19 +02:00
Marvin Scholz
e4f6aa8611
avcodec/wmadec: add fall-through annotations
2026-04-28 12:29:37 +00:00
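As background for this run of commits: with -Wimplicit-fallthrough, GCC and Clang warn when a case runs into the next one without a marker, so intentional fall-throughs get annotated. The sketch below uses a hypothetical FALLTHROUGH macro built on the GCC/Clang fallthrough statement attribute; it is not necessarily the spelling FFmpeg uses, and copy_tail is likewise a made-up example function.

```c
#include <stdint.h>

/* Hypothetical macro: expands to the fallthrough statement attribute
 * where the compiler supports it, else to an empty statement. */
#if defined(__has_attribute)
#  if __has_attribute(fallthrough)
#    define FALLTHROUGH __attribute__((fallthrough))
#  endif
#endif
#ifndef FALLTHROUGH
#  define FALLTHROUGH do { } while (0)
#endif

/* Hypothetical example: copy the last n (0..3) bytes; each case
 * deliberately continues into the next one. */
static void copy_tail(uint8_t *dst, const uint8_t *src, int n)
{
    switch (n) {
    case 3:
        dst[2] = src[2];
        FALLTHROUGH;
    case 2:
        dst[1] = src[1];
        FALLTHROUGH;
    case 1:
        dst[0] = src[0];
        break;
    default:
        break;
    }
}
```

The annotation documents intent and silences the warning; an unmarked fall-through still warns, which is how accidental ones (fixed by adding a break instead) get caught.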
Marvin Scholz
dc7692b831
avcodec/aac: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
97ff804e21
avcodec/ac3dec: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
a384a4ff3a
avcodec/ansi: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
5cee00b85f
avcodec/argo: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
0f3fe9e2bf
avcodec/avs: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
e5e12328bf
avcodec/bethsoftvideo: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
0f81f78829
avcodec/bink: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
49c62c3337
avcodec/bintext: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
d578926366
avcodec/c39: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
7b94360e0e
avcodec/cavs: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
c772decdd0
avcodec/dca: add break
2026-04-28 12:29:37 +00:00
Marvin Scholz
5cdbd0337f
avcodec/dds: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
9a765c453a
avcodec/dpxenc: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
b70d6b4f58
avcodec/dv: add break
2026-04-28 12:29:37 +00:00
Marvin Scholz
5a5742498b
avcodec/dxa: add fall-through annotations
2026-04-28 12:29:37 +00:00