124283 commits

Author SHA1 Message Date
Dale Curtis
a7d42bfba8 avformat/mov: Limit maximum box size for mov_read_lhvc()
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-30 22:50:51 +00:00
Nil Fons Miret
e294b390a0 avfilter/vf_unsharp: fix amount scaling in the high-bit-depth path
The 16-bit kernel is dispatched for every non-8-bit pixel format
(9/10/12/16-bit content, all stored in uint16_t). It's supposed to
undo the Q16 scaling that set_filter_param() applies to `amount`:

    fp->amount = amount * 65536.0;

but the shift written in the kernel is `>> (8+nbits)`, which for the
nbits=16 instantiation of the macro comes out to `>> 24` instead of
`>> 16`. Because of this, on any non-8-bit input, unsharp applies ~1/256
of the user's requested strength and is effectively a no-op. The
8-bit kernel (nbits=8) happens to be correct because 8+8 == 16.
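
To make the 1/256 factor concrete, here is a minimal C sketch of the scaling step alone. The helper names (`old_shift`, `apply_amount`) are hypothetical and do not appear in vf_unsharp.c; the sketch only assumes that `amount` is stored as a Q16 value, as quoted above.

```c
#include <stdint.h>

/* Shift as written before the fix: it depends on the macro's nbits. */
static int old_shift(int nbits)
{
    return 8 + nbits;
}

/* Hypothetical helper: scale a pixel/blur difference by a Q16 amount and
 * drop `shift` fractional bits. Operands are 64-bit so the product cannot
 * overflow for the sample ranges discussed here. */
static int64_t apply_amount(int64_t diff, int64_t amount_q16, int shift)
{
    return (diff * amount_q16) >> shift;
}
```

With `amount = 1.0` (65536 in Q16) and a difference of 256, a shift of 16 returns the difference unchanged, while `old_shift(16)` (that is, 24) returns 1.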

This commit also widens the intermediate product to int64 before the
shift, to avoid a potential overflow. Take a 16-bit pixel at the
edge of a sharp white/black region, with the user-facing `amount`
set to its declared maximum of 5.0.

    *srx       = 65535
    blur       = 32768
    diff       = *srx - blur                  = 32767
    amount_q16 = 5.0 * 65536                  = 327680

Then the kernel computes:

    product    = diff * amount_q16
               = 32767 * 327680               = 10,737,090,560     (~1.07e10)

which overflows INT32_MAX. Widening to int64 keeps the
multiplication in range; the subsequent `>> 16` brings it back to
sample range and the final cast to int32 is then safe. The widening
is a semantic no-op for 8/9/10/12-bit content where the product
always fits in int32 (worst case at 12-bit: 4095 * 327680 ~ 1.34e9).
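
The arithmetic above can be checked with a small C sketch. The function names are hypothetical and this is not the kernel code itself; it only reproduces the multiplication and the `>> 16` on the worked values.

```c
#include <stdint.h>

/* Returns 1 if diff * amount_q16 fits in int32_t, 0 otherwise.
 * The product is formed in 64 bits so the check itself cannot overflow. */
static int product_fits_int32(int32_t diff, int32_t amount_q16)
{
    int64_t product = (int64_t)diff * amount_q16;
    return product >= INT32_MIN && product <= INT32_MAX;
}

/* Widened computation: 64-bit multiply, then drop the Q16 fraction.
 * After the shift the value is back in a range that fits int32_t. */
static int32_t scaled_diff(int32_t diff, int32_t amount_q16)
{
    return (int32_t)(((int64_t)diff * amount_q16) >> 16);
}
```

For the 16-bit case, `product_fits_int32(32767, 327680)` is 0 and `scaled_diff(32767, 327680)` is 163835 (5 * 32767); for the 12-bit worst case, `product_fits_int32(4095, 327680)` is 1.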

Introduced by ee792ebe08 (2019-11-08, "avfilter/vf_unsharp: add 10bit
support"). The fate-filter-unsharp-yuv420p10 reference added in the
same series was generated from the broken kernel and is regenerated
here. fate-filter-unsharp (8-bit) is unaffected.

Repro:

    python3 -c "import numpy as np; y=np.tile(np.where(np.arange(128)//8 & 1, 512, 256).astype('<u2'), (128,1)); c=np.full((64,64), 512, '<u2'); open('in.yuv','wb').write(y.tobytes()+c.tobytes()*2)"

    ffmpeg -f rawvideo -pix_fmt yuv420p10le -s 128x128 -i in.yuv \
        -lavfi "split=2[a][b];[b]unsharp=la=1[bs];[a][bs]psnr" \
        -f null - 2>&1 | grep PSNR

Before: `PSNR y:66.50 ...` -- the filter is effectively a no-op,
        so the sharpened output matches the input almost exactly.
After:  `PSNR y:28.27 ...` -- the filter actually sharpens, so
        output and input differ as expected.

Signed-off-by: Nil Fons Miret <nilf@netflix.com>
Made-with: Cursor
2026-04-30 21:15:58 +00:00
depthfirst-dev[bot]
68ea660d83 avformat/mov: reject dimg references with zero entries
Reject dimg entries with a zero reference count in mov_read_iref_dimg().
This is the earliest point where the parser learns how many input images
a derived HEIF item references, so it is the right place to enforce the
invariant.

If entries == 0 is accepted here, the value is stored in HEIFGrid.nb_tiles,
later propagated by read_image_iovl() into AVStreamGroupTileGrid.nb_tiles,
and finally consumed in istg_parse_tile_grid(), which assumes at least one
tile and reads tg->offsets[tg->nb_tiles - 1]. With zero tiles, that
assumption breaks and leads to the out-of-bounds access seen in ASan.

Fixing the problem at the parser boundary is preferable to adding a later
workaround because it prevents creation of an invalid derived-image state
and stops that malformed state from reaching downstream consumers.

This is also consistent with the HEIF specification. Both iovl and grid
derived images are formed from one or more input images, and for grid the
dimg reference count must equal rows * columns; since rows and columns are
encoded as *_minus_one + 1, that count cannot be zero. A zero dimg entry
count is therefore invalid input and should be rejected when parsed.
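
The failure mode and the parser-side check can be sketched in C as follows. The `TileGrid` struct and both function names are hypothetical stand-ins rather than the libavformat definitions, and `-EINVAL` is used here only as a placeholder error code.

```c
#include <errno.h>

/* Hypothetical stand-in for the tile grid; only the fields used here. */
typedef struct TileGrid {
    unsigned   nb_tiles;
    const int *offsets;   /* nb_tiles elements */
} TileGrid;

/* Parser-side check: refuse a derived image that references no inputs. */
static int check_dimg_entries(unsigned entries)
{
    return entries == 0 ? -EINVAL : 0;
}

/* Consumer-side access that is only valid when nb_tiles >= 1.
 * With nb_tiles == 0 the unsigned index wraps around and the read
 * lands far outside the array. */
static int last_offset(const TileGrid *tg)
{
    return tg->offsets[tg->nb_tiles - 1];
}
```

Rejecting the zero count in the first function means the second one is never reached with an empty grid.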
2026-04-30 19:19:07 +00:00
Romain Beauxis
0f6ba39122 avfilter/vf_frei0r: guard against NULL string fields.
2026-04-30 08:33:31 -05:00
Andreas Rheinhardt
cc3ca17127 avcodec/x86/qpeldsp{,_init}: Use proper prefix
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202 avcodec/x86/qpeldsp_init: Mark functions as hidden
This allows 32-bit PIC code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32-bit PIC code).
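
As a rough illustration of what marking a function as hidden looks like in C, the sketch below uses the GCC/Clang visibility attribute directly. The `HIDDEN` macro and the function names are hypothetical, and the C body stands in for what would be an assembly routine in the real code.

```c
/* GCC/Clang spelling; other compilers may need a different one. */
#define HIDDEN __attribute__((visibility("hidden")))

/* Declaration of the (stand-in) assembly routine. A hidden symbol is not
 * exported from the shared object, so calls to it can be resolved at link
 * time instead of going through the dynamic symbol machinery. */
HIDDEN int lowpass_kernel(int x);

/* Plain C body so the sketch links; hypothetical, not FFmpeg code. */
int lowpass_kernel(int x)
{
    return 2 * x;
}

/* Wrapper that calls the hidden function directly. */
int call_kernel(int x)
{
    return lowpass_kernel(x);
}
```

The attribute does not change what the function computes; only how the call is emitted in position-independent code differs.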

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
23d3116af9 avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.
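
The two stages, and the fused form that avoids the intermediate buffer, can be sketched in scalar C. This is an illustration under stated assumptions, not the SSSE3 code: it assumes the 8-tap (-1, 3, -6, 20, 20, -6, 3, -1) / 32 half-pel filter, a rounding average, an arithmetic right shift for negative values, and it leaves out the edge mirroring that the real functions perform. All names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Half-pel sample between src[x] and src[x + 1].
 * Requires src[x - 3] .. src[x + 4] to be readable. */
static uint8_t half_pel(const uint8_t *src, int x)
{
    int v = 20 * (src[x]     + src[x + 1])
          -  6 * (src[x - 1] + src[x + 2])
          +  3 * (src[x - 2] + src[x + 3])
          -      (src[x - 3] + src[x + 4]);
    return clip_u8((v + 16) >> 5);
}

/* Two stages: half-pel row into a temporary, then average with the
 * left source pixel (the 1/4 case). n must be at most 16. */
static void quarter_two_pass(uint8_t *dst, const uint8_t *src, int x0, int n)
{
    uint8_t tmp[16];
    for (int i = 0; i < n; i++)
        tmp[i] = half_pel(src, x0 + i);
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((tmp[i] + src[x0 + i] + 1) >> 1);
}

/* Fused: same result, one loop, no intermediate buffer. */
static void quarter_fused(uint8_t *dst, const uint8_t *src, int x0, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((half_pel(src, x0 + i) + src[x0 + i] + 1) >> 1);
}
```

Both functions produce identical output for the same input; the fused one simply never materializes the half-pel row.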

This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive code-size wise
(and also performance-wise) as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: An increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection,
2256B when using stack protection.

Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                       106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3:                       105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3:                       226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3:                       231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3:                      214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3:                      236.1 ( 8.02x)

New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3:                        96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3:                        96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3:                       225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3:                       228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3:                       217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3:                      217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3:                      227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3:                      220.5 ( 8.72x)

Note: The l2 functions are also used for vertical lowpass
functions, yet given that they are much bigger, duplicating
them would lead to a massive code size increase.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
f946cac2d9 avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
Beats the mmxext version by a lot (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[1][1]_c:                           223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext:                       66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3:                        36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c:                           251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext:                       58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3:                        25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c:                           226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext:                       66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3:                        35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c:                           473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2:                        110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3:                        76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c:                           440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3:                        67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c:                           473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3:                        74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c:                           492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2:                        102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3:                        67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c:                          465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2:                        94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3:                       57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c:                          492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2:                       102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3:                       68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c:                          476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3:                       74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c:                          434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3:                       66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c:                          474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2:                       107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3:                       74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c:                    222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext:                66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3:                 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c:                    212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext:                56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3:                 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c:                    224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext:                65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3:                 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3:                 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3:                 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3:                 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3:                 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3:                57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3:                69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3:                83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3:                67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3:                80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c:                           229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext:                       65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3:                        33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c:                           212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext:                       56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3:                        23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c:                           227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext:                       64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3:                        33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c:                           466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2:                        106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3:                        71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c:                           438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2:                        102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3:                        65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c:                           466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2:                        106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3:                        70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c:                           456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2:                        100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3:                        64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c:                          425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2:                        92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3:                       55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c:                          452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2:                        99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3:                       63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c:                          471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2:                       106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3:                       71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c:                          439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2:                       101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3:                       64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c:                          467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2:                       106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3:                       72.6 ( 6.44x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
c0e1c1d6b3 avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
Beats the mmxext version by a lot (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):

avg_qpel_pixels_tab[0][1]_c:                           945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext:                      262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3:                       110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c:                          1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext:                      245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3:                        91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c:                           941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext:                      260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3:                       110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c:                          1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2:                        394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3:                       247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c:                          1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2:                        380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3:                       221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c:                          1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2:                        393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3:                       238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c:                          1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3:                       223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c:                         1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2:                       366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3:                      207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c:                         2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3:                      227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c:                         1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2:                       389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3:                      244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c:                         1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2:                       379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3:                      223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c:                         1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2:                       398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3:                      238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c:                    922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext:               275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3:                108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c:                    889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext:               236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3:                 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c:                    915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext:               274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3:                108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3:                246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3:                226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3:                248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3:                228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3:               206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3:               227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3:               244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3:               226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3:               246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c:                           919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext:                      255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3:                       108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c:                           883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext:                      238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3:                        86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c:                           921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext:                      258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3:                       108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c:                          1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2:                        384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3:                       234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c:                          1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2:                        382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3:                       217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c:                          1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2:                        384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3:                       231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c:                          1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2:                        374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3:                       219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c:                         1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2:                       384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3:                      202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c:                         1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2:                       369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3:                      216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c:                         1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2:                       384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3:                      238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c:                         1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2:                       373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3:                      218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c:                         1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2:                       385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3:                      236.8 ( 8.10x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344 avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
put and avg versions were added and used in H264 in b91081274f.
This commit adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. it uses them in the
ones that have a vertical component); it also removes the 16x17 MMXEXT
versions (which are no longer used).

This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c:                          1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old):                  405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2:                        392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c:                          1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old):                  385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2:                        374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c:                          1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old):                  403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2:                        391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c:                          1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old):                  384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2:                        380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c:                         2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old):                 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2:                       380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c:                         1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old):                 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2:                       390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c:                         1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old):                 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2:                       377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c:                         1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old):                 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2:                       390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old):           425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old):           388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old):           427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old):           393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old):          392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old):          427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old):          392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old):          425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c:                          1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old):                  396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2:                        382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c:                          1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old):                  377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2:                        372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c:                          1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old):                  396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2:                        383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c:                          1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old):                  377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2:                        372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c:                         1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old):                 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2:                       380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c:                         1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old):                 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2:                       384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c:                         1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old):                 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2:                       374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c:                         1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old):                 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2:                       387.8 ( 4.96x)

The purely vertical size 16 mc functions now no longer use any MMX.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076 avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
Superseded by SSE2.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670 avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):

avg_qpel_pixels_tab[0][4]_c:                           844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext:                      225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2:                        146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c:                          1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext:                      499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2:                        405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c:                          1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext:                      484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2:                        385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c:                          1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext:                      501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2:                        403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c:                           976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext:                      216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2:                        113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c:                          1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext:                      494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2:                        388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c:                         1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext:                     476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2:                       362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c:                         2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext:                     496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c:                          841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext:                     226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2:                       143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c:                         1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext:                     499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2:                       412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c:                         1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext:                     484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2:                       385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c:                         1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext:                     501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2:                       405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c:                           203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext:                       64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2:                         40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c:                           488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext:                      134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2:                        108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c:                           448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext:                      128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c:                           489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext:                      134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c:                           223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext:                       57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2:                         36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c:                           496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext:                      129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2:                        105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c:                          466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext:                     123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2:                        99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c:                          497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext:                     129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2:                       105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c:                          203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext:                      63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2:                        38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c:                          487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext:                     134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c:                          447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext:                     128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c:                          487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext:                     134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2:                       109.9 ( 4.44x)

put_no_rnd_qpel_pixels_tab[0][4]_c:                    825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext:               242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2:                 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext:               542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext:               493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext:               541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c:                    785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext:               206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2:                 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext:               489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext:              461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext:              490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c:                   824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext:              242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2:                135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext:              545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext:              497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext:              545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c:                    198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext:                64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2:                  39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext:               137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext:               126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext:               137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c:                    193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext:                52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2:                  27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext:               126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext:              118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext:              128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c:                   201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext:               64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2:                 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext:              137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext:              127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext:              139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.3 ( 4.09x)

put_qpel_pixels_tab[0][4]_c:                           824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext:                      220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2:                        137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c:                          1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext:                      508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2:                        408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c:                          1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext:                      476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2:                        381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c:                          1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext:                      495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2:                        417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c:                           772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext:                      197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2:                        118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c:                          1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext:                      476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2:                        379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c:                         1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext:                     460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2:                       386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c:                         1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext:                     474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2:                       404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c:                          829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext:                     221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2:                       138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c:                         1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext:                     494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2:                       413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c:                         1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext:                     473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2:                       377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c:                         1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext:                     492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2:                       399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c:                           198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext:                       60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2:                         40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c:                           471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext:                      131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2:                        107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c:                           440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext:                      126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2:                        100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c:                           469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext:                      131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2:                        106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c:                           194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext:                       52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2:                         28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c:                           464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext:                      125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2:                        100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c:                          433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext:                     118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2:                        94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c:                          463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext:                     125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2:                       102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c:                          199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext:                      63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2:                        36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c:                          475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext:                     139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2:                       107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c:                          441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext:                     126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2:                       101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c:                          475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext:                     131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2:                       107.0 ( 4.45x)

The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
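
As a reading aid for the figures above: each ratio in parentheses appears to be the time of the C version divided by the time of the optimized version, and the size cost of the new functions is the difference of the two byte counts. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of FFmpeg:

```c
#include <assert.h>

/* Hypothetical helper: speedup of an optimized routine relative to the
 * C reference, i.e. the ratio shown in parentheses in the listing. */
static double speedup(double c_time, double simd_time)
{
    assert(simd_time > 0.0);
    return c_time / simd_time;
}

/* Hypothetical helper: extra code size in bytes of the new functions
 * compared with the functions they replace. */
static int size_delta(int new_bytes, int old_bytes)
{
    return new_bytes - old_bytes;
}
```

For avg_qpel_pixels_tab[0][8], speedup(976.7, 113.1) is about 8.64, which matches the sse2 line, and size_delta(8244, 6720) is 1524 bytes.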

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
405465700c avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c avcodec/x86/qpeldsp: Don't use too much stack
We only need (SIZE+1)*SIZE words.
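
To put a number on this bound, here is a small sketch. It assumes that a "word" means 16 bits, as is usual in x86 assembly, and that SIZE is the block dimension (8 or 16). The function name is hypothetical and the code is not taken from qpeldsp.asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scratch space in bytes for a SIZE x SIZE block, assuming one word is
 * 16 bits (an assumption, not stated in the commit message). */
static size_t qpel_scratch_bytes(size_t size)
{
    assert(size > 0);
    return (size + 1) * size * sizeof(uint16_t);
}
```

Under these assumptions an 8x8 block needs 72 words (144 bytes) and a 16x16 block needs 272 words (544 bytes).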

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
bcf7293a21 avcodec/x86/qpeldsp: Remove unused declaration
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5 avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
Only used there.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6 avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d avcodec/x86/qpeldsp_init: Specify alignment properly
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5 avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3 avcodec/x86/qpeldsp: Don't zero unnecessarily
This value is write-only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b avcodec/x86/qpeldsp: Simplify resetting output pointer
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Stefan Breunig
9172ab1245 fate/filter-video: add frei0r_src test
An installation of frei0r-plugins is required to run the tests,
which is usually separate from the build headers. Some systems
have it packaged (e.g. apt install frei0r-plugins). An upstream
release extracted to FREI0R_PATH also works.

Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
2026-04-30 03:46:18 +00:00
Nicolas Dato
3aa5d957d1 avformat/dashdec: fix previous commit where I inadvertently removed the case when calc_next_seg_no_from_timelines returned -1 and move_timelines wasn't called
Signed-off-by: Nicolas Dato <nicolas.dato@gmail.com>
2026-04-29 23:54:37 +00:00
Nicolas Dato
8a8bde6a54 avformat/dashdec: fix calculation and usage of cur_seq_no, fixing issue 22335
Functions like calc_cur_seg_no, calc_min_seg_no, and calc_max_seg_no calculated
the segment number taking into account the first_seq_no.
However, functions like get_segment_start_time_based_on_timeline and
calc_cur_seg_no didn't take first_seq_no into account.
This made dashdec believe that the cur_seq_no was always less than min_seq_no,
logging 'old fragment' and calling calc_cur_seg_no.

In live dash streams with some startNumber, that call to calc_cur_seg_no after
the 'old fragment' log made ffmpeg reposition itself 60 seconds before the
current time whenever the manifest reloaded.
This made ffmpeg skip segments, especially when the manifest reloaded slower
than the segment duration, resulting in a new manifest with more than one new
segment.
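
The mismatch described above can be illustrated with a simplified model. This is only a sketch: the function names below are hypothetical and the code is not the dashdec implementation. It assumes that the minimum segment number is an absolute number that starts at first_seq_no (the manifest's startNumber).

```c
#include <assert.h>
#include <stdint.h>

/* Inconsistent variant: returns the index within the timeline and
 * ignores first_seq_no, so the result is relative to 0. */
static int64_t seg_no_from_index_relative(int64_t first_seq_no, int64_t index)
{
    (void)first_seq_no;
    assert(index >= 0);
    return index;
}

/* Consistent variant: offsets the index by first_seq_no, so the result
 * is an absolute segment number. */
static int64_t seg_no_from_index_absolute(int64_t first_seq_no, int64_t index)
{
    assert(index >= 0);
    return first_seq_no + index;
}

/* A segment counts as an "old fragment" when it precedes the minimum. */
static int is_old_fragment(int64_t cur_seq_no, int64_t min_seq_no)
{
    return cur_seq_no < min_seq_no;
}
```

With a startNumber of 100 and the fourth listed segment (index 3), the relative variant yields 3, which is below the minimum of 100 and is flagged as an old fragment. The absolute variant yields 103 and is not.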

Signed-off-by: Nicolas Dato <nicolas.dato@gmail.com>
2026-04-29 23:54:37 +00:00
Michael Niedermayer
c25673fe70 avformat/mpegts: Fix memleak of pes_filter.opaque
Fixes: 490257166/clusterfuzz-testcase-minimized-ffmpeg_dem_MPEGTS_fuzzer-4815675538604032

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-04-29 20:50:21 +00:00
James Almer
2e6af10481 avformat/dashdec: copy stream groups from input representations
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-29 14:00:03 +00:00
James Almer
8fad6dcfd9 avformat/dashdec: support more than one underlying stream per Representation
Some Dash manifests contain Representations within an Adaptation Set that
reference an underlying mp4 context that contains more than the stream it
describes, as is the case of LCEVC enhancements.

Despite the fact open_demux_for_component() loops through all streams in the
underlying context, the rest of the demuxer is written assuming only the
stream described by the corresponding representation will be present, which
results in completely wrong stream index assignments.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-29 14:00:03 +00:00
Martin Storsjö
397c7c7524 tools/check_arm_indent: Run formatting on arm, in addition to aarch64
Add exceptions for files that aren't handled well (or that would
require more manual cleanups to make the output look good).
2026-04-29 13:53:07 +03:00
Martin Storsjö
f6b21eca5e tools/check_arm_indent: Add missing ;; in switch case, fix indentation 2026-04-29 13:53:07 +03:00
Martin Storsjö
963ea707e3 arm/rv40dsp: Add * on comment continuation lines in prototypes
This keeps the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82 arm/vc1dsp: Fix a few cases of inconsistent indentation
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc arm/jrevdct: Indent previously unindented assembly
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation 2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2 arm: Reindent asm that used consistent but differing styles
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904 arm/synth_filter_vfp: Fix indentation
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d arm/simple_idct_arm: Reindent previously unindented code 2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020 arm/simple_idct_armv5te: Reindent previously consistent code to common style
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f arm/rv34dsp: Adjust macro argument indentation slightly
The previous form did neatly align with the lines above, but doesn't
match general indentation rules from our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
9653588441 libswscale/arm: Switch consistent indentation to common style
Some of these files aligned instructions to 4/24 columns, while
we commonly indent arm/aarch64 assembly to 8/24 columns.
Some of these files also used a different alignment for the
operands.
2026-04-29 13:49:27 +03:00
Martin Storsjö
c5a3cb00b7 libswresample/arm: Change to the common indentation size
These files consistently aligned instructions to 4/24 columns,
while we commonly indent arm/aarch64 assembly to 8/24 columns.
2026-04-29 13:49:27 +03:00
Martin Storsjö
25d703dd2a libavutil/arm: Fix indentation in asm.S 2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c arm/hevcdsp: Fix misindented instructions in some macros 2026-04-29 13:49:27 +03:00
Martin Storsjö
7eaeb5ab4a arm: Fix indentation of stray individual misaligned instructions 2026-04-29 13:49:27 +03:00
Martin Storsjö
17765fe831 arm: Reindent assembly where it was off by one char 2026-04-29 13:49:27 +03:00
Martin Storsjö
946e80fde7 libswscale/arm: Lowercase the "LSL" keyword 2026-04-29 13:49:27 +03:00
Martin Storsjö
ea7079074c tools/indent_arm_assembly: Don't indent "foo .req bar" lines like an instruction
These are used a bit in our arm assembly, while they're used much
less in our aarch64 assembly.
2026-04-29 13:49:27 +03:00
Martin Storsjö
cd7a3cd799 tools/indent_arm_assembly: Recognize more comment forms, for skipping lowercasing
When we try to lowercase register names (e.g. Q0 -> q0) we avoid
doing that for parts of the code that are comments, as comments
occasionally contain pseudocode with such mentions that
aren't register names, but pseudocode/reference code variables.
See 7ebb6c54eb for more details about that.

In addition to recognizing comments starting with //, also
recognize /* and @ (which is a comment char in arm assembly, but
not in aarch64).
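
The comment detection described here can be sketched as follows. This is written in C for illustration only and is not the code of the indentation script. The function name and the is_arm flag are hypothetical, and the scan is deliberately naive (it does not handle string literals, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first comment marker in line, or
 * strlen(line) if there is none. The markers are "//" and the C-style
 * block comment opener; '@' only counts when is_arm is nonzero, since
 * it is a comment character in arm assembly but not in aarch64. */
static size_t comment_start(const char *line, int is_arm)
{
    assert(line != NULL);
    size_t len = strlen(line);
    for (size_t i = 0; i < len; i++) {
        /* line[i + 1] is at worst the terminating NUL, so this is safe. */
        if (line[i] == '/' && (line[i + 1] == '/' || line[i + 1] == '*'))
            return i;
        if (is_arm && line[i] == '@')
            return i;
    }
    return len;
}
```

A lowercasing pass would then only touch the bytes before the returned offset, so a mention such as Q0 inside the comment is left alone.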
2026-04-29 13:49:27 +03:00
Michael Niedermayer
7c67748537 avformat/mov: check extradata in mov_read_dops()
We want to limit an attacker's ability to change already-parsed structures.
So once extradata (or another array) is finished and has possibly been used, we do not
want to allow an attacker to change it.

This reduces the attack surface.
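
The general idea of a set-once check can be sketched as below. The struct and function are hypothetical stand-ins, not the mov.c code, and whether the actual fix rejects or ignores a repeated box is not stated in this message. The sketch rejects it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the codec parameters that hold extradata. */
typedef struct Params {
    unsigned char *extradata;
    size_t         extradata_size;
} Params;

/* Stores data as extradata only if none has been set before.
 * Returns 0 on success and -1 if extradata already exists, so a later
 * box cannot replace data that may already have been used.
 * The caller keeps ownership of data in this sketch. */
static int set_extradata_once(Params *p, unsigned char *data, size_t size)
{
    assert(p != NULL);
    if (p->extradata || p->extradata_size)
        return -1;
    p->extradata      = data;
    p->extradata_size = size;
    return 0;
}
```

A second call on the same Params returns -1 and leaves the first buffer and its size untouched.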

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-04-29 00:46:47 +00:00