Commit graph

54060 commits

Author SHA1 Message Date
Andreas Rheinhardt
cc3ca17127 avcodec/x86/qpeldsp{,_init}: Use proper prefix
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202 avcodec/x86/qpeldsp_init: Mark functions as hidden
Hiding the symbols allows 32-bit PIC code to call the
underlying assembly functions directly, without first
loading the GOT address; this saves 1245B of .text here
(for 32-bit PIC builds).
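
A minimal sketch of the idea, assuming a GCC/Clang-style visibility attribute. The names EXAMPLE_HIDDEN, ff_example_lowpass and example_wrapper are hypothetical and not taken from the FFmpeg sources; the C definition stands in for what would be an assembly function.

```c
/* Sketch only: all names here are hypothetical, not FFmpeg symbols. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
#define EXAMPLE_HIDDEN /* no-op on compilers without this attribute */
#endif

/* Declaration as a C wrapper would see it. Marking it hidden tells the
 * compiler that the callee lives in the same shared object. */
EXAMPLE_HIDDEN int ff_example_lowpass(int a, int b);

/* Stand-in definition so that the sketch links on its own; in the
 * scenario described above this would be implemented in assembly. */
int ff_example_lowpass(int a, int b)
{
    return (a + b + 1) >> 1;
}

int example_wrapper(int a, int b)
{
    /* Because the callee is hidden, PIC code may call it directly
     * instead of going through the PLT, which on 32-bit x86 needs the
     * GOT address loaded into a register first. */
    return ff_example_lowpass(a, b);
}
```

Whether the direct call is actually emitted depends on the compiler and flags; comparing the disassembly of the wrapper with and without the attribute is the simplest check.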

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
23d3116af9 avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.
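
The two stages described above can be sketched in scalar C for one row of a size 8 block. This is an illustration, not the actual assembly or wrapper code: it assumes the usual MPEG-4 half-pel taps (-1, 3, -6, 20, 20, -6, 3, -1)/32 with mirrored block edges and the rounding "put" variant of the average; all function names are hypothetical.

```c
#include <stdint.h>

/* Clip an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Mirror an index into 0..8 (assumed edge handling: a row of 8 outputs
 * reads 9 source pixels, src[0..8]). */
static int mirror(int i)
{
    return i < 0 ? -1 - i : i > 8 ? 17 - i : i;
}

/* Half-pel sample between src[x] and src[x + 1]. */
static uint8_t halfpel(const uint8_t *src, int x)
{
    static const int taps[4] = { 20, -6, 3, -1 };
    int sum = 16; /* rounding term for the shift by 5 */
    for (int k = 0; k < 4; k++)
        sum += taps[k] * (src[mirror(x - k)] + src[mirror(x + 1 + k)]);
    return sum < 0 ? 0 : clip_u8(sum >> 5);
}

/* Two-stage variant: lowpass into a stack buffer, then average ("l2")
 * with the left (right = 0, 1/4 position) or the right (right = 1,
 * 3/4 position) source pixel. */
void two_stage_row(uint8_t *dst, const uint8_t *src, int right)
{
    uint8_t half[8]; /* intermediate buffer */
    for (int x = 0; x < 8; x++)
        half[x] = halfpel(src, x);
    for (int x = 0; x < 8; x++)
        dst[x] = (uint8_t)((half[x] + src[x + right] + 1) >> 1);
}

/* Combined variant: same result in one pass, no intermediate buffer. */
void fused_row(uint8_t *dst, const uint8_t *src, int right)
{
    for (int x = 0; x < 8; x++)
        dst[x] = (uint8_t)((halfpel(src, x) + src[x + right] + 1) >> 1);
}
```

Both functions produce identical output; the combined form only removes the intermediate buffer and the second pass.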

This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive in code size
(and also in performance), as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: an increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection (832B net),
2256B when using stack protection (1120B net).

Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                       106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3:                       105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3:                       226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3:                       231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3:                      214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3:                      236.1 ( 8.02x)

New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                        96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3:                        96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3:                       225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3:                       228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3:                      217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3:                      220.5 ( 8.72x)

Note: The l2 functions are also used with the vertical lowpass
functions, but since the latter are much bigger, duplicating
them would lead to a massive increase in code size.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
f946cac2d9 avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
Roughly 1.8x to 2.4x faster than the mmxext version for
horizontal-only mc (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[1][1]_c:                           223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext:                       66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3:                        36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c:                           251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext:                       58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3:                        25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c:                           226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext:                       66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3:                        35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c:                           473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2:                        110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3:                        76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c:                           440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3:                        67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c:                           473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3:                        74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c:                           492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2:                        102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3:                        67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c:                          465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2:                        94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3:                       57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c:                          492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2:                       102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3:                       68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c:                          476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3:                       74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c:                          434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3:                       66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c:                          474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2:                       107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3:                       74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c:                    222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext:                66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3:                 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c:                    212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext:                56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3:                 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c:                    224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext:                65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3:                 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3:                 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3:                 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3:                 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3:                 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3:                57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3:                69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3:                83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3:                67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3:                80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c:                           229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext:                       65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3:                        33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c:                           212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext:                       56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3:                        23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c:                           227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext:                       64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3:                        33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c:                           466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2:                        106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3:                        71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c:                           438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2:                        102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3:                        65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c:                           466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2:                        106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3:                        70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c:                           456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2:                        100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3:                        64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c:                          425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2:                        92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3:                       55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c:                          452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2:                        99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3:                       63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c:                          471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2:                       106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3:                       71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c:                          439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2:                       101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3:                       64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c:                          467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2:                       106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3:                       72.6 ( 6.44x)
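
The factor in parentheses in the table above is the C reference timing divided by the timing of the given version (smaller timings are faster). A small sketch of that arithmetic, with a hypothetical helper name:

```c
/* Speedup of one implementation relative to another:
 * reference timing divided by the candidate's timing. */
double speedup(double reference_time, double candidate_time)
{
    return reference_time / candidate_time;
}
```

For example, for avg_qpel_pixels_tab[1][1], speedup(223.9, 36.8) gives about 6.08, matching the _ssse3 row, and speedup(66.2, 36.8) gives about 1.80 for ssse3 relative to mmxext.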

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
c0e1c1d6b3 avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
Roughly 2.4x to 2.7x faster than the mmxext version for
horizontal-only mc (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[0][1]_c:                           945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext:                      262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3:                       110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c:                          1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext:                      245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3:                        91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c:                           941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext:                      260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3:                       110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c:                          1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2:                        394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3:                       247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c:                          1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2:                        380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3:                       221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c:                          1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2:                        393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3:                       238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c:                          1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3:                       223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c:                         1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2:                       366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3:                      207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c:                         2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3:                      227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c:                         1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2:                       389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3:                      244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c:                         1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2:                       379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3:                      223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c:                         1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2:                       398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3:                      238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c:                    922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext:               275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3:                108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c:                    889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext:               236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3:                 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c:                    915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext:               274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3:                108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3:                246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3:                226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3:                248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3:                228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3:               206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3:               227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3:               244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3:               226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3:               246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c:                           919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext:                      255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3:                       108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c:                           883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext:                      238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3:                        86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c:                           921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext:                      258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3:                       108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c:                          1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2:                        384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3:                       234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c:                          1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2:                        382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3:                       217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c:                          1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2:                        384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3:                       231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c:                          1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2:                        374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3:                       219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c:                         1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2:                       384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3:                      202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c:                         1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2:                       369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3:                      216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c:                         1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2:                       384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3:                      238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c:                         1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2:                       373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3:                      218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c:                         1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2:                       385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3:                      236.8 ( 8.10x)
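
For reading the table indices: the first index selects the block size (0 for size 16, 1 for size 8, as stated in these messages). The sketch below assumes that the second index packs the quarter-pel offsets as x + 4*y, which is consistent with [1-3] being horizontal-only here and [4], [8] and [12] being purely vertical in the SSE2 vertical lowpass commit. The type and function names are hypothetical.

```c
/* Quarter-pel offsets (0..3 each) decoded from the second table index,
 * assuming the packing idx = x + 4 * y. */
typedef struct QpelOffset {
    int x; /* horizontal offset in quarter pixels */
    int y; /* vertical offset in quarter pixels */
} QpelOffset;

QpelOffset qpel_index_to_offset(int idx)
{
    QpelOffset o = { idx & 3, idx >> 2 };
    return o;
}

/* Nonzero if only the horizontal component is fractional (idx 1..3). */
int is_horizontal_only(int idx)
{
    return (idx >> 2) == 0 && (idx & 3) != 0;
}

/* Nonzero if only the vertical component is fractional (idx 4, 8, 12). */
int is_vertical_only(int idx)
{
    return (idx & 3) == 0 && (idx >> 2) != 0;
}
```

Under this assumption, index 7 decodes to x = 3, y = 1, i.e. a 3/4 horizontal and 1/4 vertical offset, which needs both a horizontal and a vertical pass.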

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344 avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
put and avg versions have been added and used in H264
in b91081274f. This commit
adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. it uses
them in the ones that have a vertical component); it also
removes the 16x17 MMXEXT versions (which are no longer used).

This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c:                          1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old):                  405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2:                        392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c:                          1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old):                  385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2:                        374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c:                          1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old):                  403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2:                        391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c:                          1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old):                  384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c:                         2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old):                 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2:                       380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c:                         1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old):                 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2:                       390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c:                         1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old):                 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2:                       377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c:                         1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old):                 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2:                       390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old):           425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old):           388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old):           427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old):           393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old):          392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old):          427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old):          392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old):          425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c:                          1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old):                  396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2:                        382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c:                          1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old):                  377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2:                        372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c:                          1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old):                  396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2:                        383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c:                          1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old):                  377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2:                        372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c:                         1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old):                 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2:                       380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c:                         1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old):                 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2:                       384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c:                         1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old):                 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2:                       374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c:                         1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old):                 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2:                       387.8 ( 4.96x)

The purely vertical size 16 mc functions no longer use any MMX.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076 avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
Superseded by SSE2.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670 avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):

avg_qpel_pixels_tab[0][4]_c:                           844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext:                      225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2:                        146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c:                          1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext:                      499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2:                        405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c:                          1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext:                      484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2:                        385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c:                          1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext:                      501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2:                        403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c:                           976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext:                      216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2:                        113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c:                          1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext:                      494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2:                        388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c:                         1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext:                     476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2:                       362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c:                         2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext:                     496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c:                          841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext:                     226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2:                       143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c:                         1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext:                     499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2:                       412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c:                         1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext:                     484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2:                       385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c:                         1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext:                     501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2:                       405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c:                           203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext:                       64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2:                         40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c:                           488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext:                      134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2:                        108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c:                           448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext:                      128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c:                           489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext:                      134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c:                           223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext:                       57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2:                         36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c:                           496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext:                      129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2:                        105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c:                          466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext:                     123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2:                        99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c:                          497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext:                     129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2:                       105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c:                          203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext:                      63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2:                        38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c:                          487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext:                     134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c:                          447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext:                     128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c:                          487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext:                     134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2:                       109.9 ( 4.44x)

put_no_rnd_qpel_pixels_tab[0][4]_c:                    825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext:               242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2:                 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext:               542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext:               493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext:               541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c:                    785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext:               206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2:                 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext:               489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext:              461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext:              490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c:                   824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext:              242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2:                135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext:              545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext:              497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext:              545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c:                    198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext:                64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2:                  39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext:               137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext:               126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext:               137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c:                    193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext:                52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2:                  27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext:               126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext:              118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext:              128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c:                   201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext:               64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2:                 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext:              137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext:              127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext:              139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.3 ( 4.09x)

put_qpel_pixels_tab[0][4]_c:                           824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext:                      220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2:                        137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c:                          1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext:                      508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2:                        408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c:                          1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext:                      476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2:                        381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c:                          1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext:                      495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2:                        417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c:                           772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext:                      197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2:                        118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c:                          1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext:                      476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2:                        379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c:                         1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext:                     460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2:                       386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c:                         1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext:                     474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2:                       404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c:                          829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext:                     221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2:                       138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c:                         1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext:                     494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2:                       413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c:                         1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext:                     473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2:                       377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c:                         1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext:                     492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2:                       399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c:                           198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext:                       60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2:                         40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c:                           471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext:                      131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2:                        107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c:                           440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext:                      126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2:                        100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c:                           469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext:                      131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2:                        106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c:                           194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext:                       52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2:                         28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c:                           464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext:                      125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2:                        100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c:                          433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext:                     118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2:                        94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c:                          463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext:                     125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2:                       102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c:                          199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext:                      63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2:                        36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c:                          475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext:                     139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2:                       107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c:                          441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext:                     126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2:                       101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c:                          475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext:                     131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2:                       107.0 ( 4.45x)

The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
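
The parenthesised factor in the benchmark lines above appears to be the
time of the C reference divided by the time of the given implementation.
A minimal sketch that recomputes it from such lines, assuming the
"name_impl: time ( ratio x)" layout shown above; the names LINE_RE and
speedups are hypothetical helpers, not part of checkasm:

```python
import re

# Assumed line layout, e.g.
# "put_qpel_pixels_tab[0][4]_sse2:      137.8 ( 5.98x)"
LINE_RE = re.compile(
    r"^(?P<name>.+)_(?P<impl>[a-z0-9]+):\s+(?P<time>[\d.]+)\s+\(\s*(?P<ratio>[\d.]+)x\)$"
)

def speedups(lines):
    """Recompute each implementation's speedup relative to the _c baseline."""
    timings = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            timings.setdefault(m["name"], {})[m["impl"]] = float(m["time"])
    result = {}
    for name, impls in timings.items():
        base = impls.get("c")
        if base is None:
            continue  # no C reference timing, nothing to compare against
        for impl, t in impls.items():
            result[(name, impl)] = round(base / t, 2)
    return result

# Three lines copied from the benchmark output above.
sample = [
    "put_qpel_pixels_tab[0][4]_c:                           824.6 ( 1.00x)",
    "put_qpel_pixels_tab[0][4]_mmxext:                      220.1 ( 3.75x)",
    "put_qpel_pixels_tab[0][4]_sse2:                        137.8 ( 5.98x)",
]

for (name, impl), ratio in speedups(sample).items():
    print(f"{name}_{impl}: {ratio:.2f}x")
```

For these three lines the recomputed factors agree with the printed
ones (1.00x, 3.75x, 5.98x).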

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
405465700c avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c avcodec/x86/qpeldsp: Don't use too much stack
We only need (SIZE+1)*SIZE words.
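
As a worked example of that bound, the sketch below evaluates
(SIZE+1)*SIZE for block sizes 8 and 16. Both the block sizes and the
2-byte word width (the usual x86 meaning of "word") are assumptions
made for illustration; scratch_words is a hypothetical helper:

```python
WORD_BYTES = 2  # assumption: "word" means a 16-bit x86 word

def scratch_words(size):
    """Words needed according to the commit message: (SIZE+1)*SIZE."""
    return (size + 1) * size

for size in (8, 16):  # assumed block sizes (8x8 and 16x16)
    words = scratch_words(size)
    print(f"SIZE={size}: {words} words, {words * WORD_BYTES} bytes")
```

This gives 72 words (144 bytes) for SIZE=8 and 272 words (544 bytes)
for SIZE=16 under the stated assumptions.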

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
bcf7293a21 avcodec/x86/qpeldsp: Remove unused declaration
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5 avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
Only used there.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6 avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d avcodec/x86/qpeldsp_init: Specify alignment properly
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5 avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3 avcodec/x86/qpeldsp: Don't zero unnecessarily
This value is write-only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b avcodec/x86/qpeldsp: Simplify resetting output pointer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Martin Storsjö
963ea707e3 arm/rv40dsp: Add * on comment continuation lines in prototypes
This prevents the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82 arm/vc1dsp: Fix a few cases of inconsistent indentation
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc arm/jrevdct: Indent previously unindented assembly
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation 2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2 arm: Reindent asm that used consistent but differing styles
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904 arm/synth_filter_vfp: Fix indentation
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d arm/simple_idct_arm: Reindent previously unindented code 2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020 arm/simple_idct_armv5te: Reindent previously consistent code to common style
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f arm/rv34dsp: Adjust macro argument indentation slightly
The previous form did neatly align with the lines above, but doesn't
match general indentation rules from our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c arm/hevcdsp: Fix misindented instructions in some macros 2026-04-29 13:49:27 +03:00
Martin Storsjö
7eaeb5ab4a arm: Fix indentation of stray individual misaligned instructions 2026-04-29 13:49:27 +03:00
Martin Storsjö
17765fe831 arm: Reindent assembly where it was off by one char 2026-04-29 13:49:27 +03:00
Marvin Scholz
f044c5e627 doc: remove unclear description
There is no caller when presuming that the user will use lavc for
decoding.
2026-04-28 14:31:19 +02:00
Marvin Scholz
c9937ff139 doc: mark functions related to AVCodecParameters
This makes these functions appear in the AVCodecParameters
documentation page, so they are easier to find.
2026-04-28 14:31:19 +02:00
Marvin Scholz
ab1a970bc0 doc: style changes for the AVCodecParameters
Mostly adding references and making the video/audio only
annotations not be the brief description.
2026-04-28 14:31:19 +02:00
Marvin Scholz
e4f6aa8611 avcodec/wmadec: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
dc7692b831 avcodec/aac: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
97ff804e21 avcodec/ac3dec: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
a384a4ff3a avcodec/ansi: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
5cee00b85f avcodec/argo: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
0f3fe9e2bf avcodec/avs: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
e5e12328bf avcodec/bethsoftvideo: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
0f81f78829 avcodec/bink: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
49c62c3337 avcodec/bintext: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
d578926366 avcodec/c39: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
7b94360e0e avcodec/cavs: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
c772decdd0 avcodec/dca: add break 2026-04-28 12:29:37 +00:00
Marvin Scholz
5cdbd0337f avcodec/dds: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
9a765c453a avcodec/dpxenc: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
b70d6b4f58 avcodec/dv: add break 2026-04-28 12:29:37 +00:00
Marvin Scholz
5a5742498b avcodec/dxa: add fall-through annotations 2026-04-28 12:29:37 +00:00