Dale Curtis
a7d42bfba8
avformat/mov: Limit maximum box size for mov_read_lhvc()
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-30 22:50:51 +00:00
Nil Fons Miret
e294b390a0
avfilter/vf_unsharp: fix amount scaling in the high-bit-depth path
The 16-bit kernel is dispatched for every non-8-bit pixel format
(9/10/12/16-bit content, all stored in uint16_t). It's supposed to
undo the Q16 scaling that set_filter_param() applies to `amount`:
    fp->amount = amount * 65536.0;
but the shift written in the kernel is `>> (8+nbits)`, which for the
nbits=16 instantiation of the macro comes out to `>> 24` instead of
`>> 16`. Because of this, on any non-8-bit input, unsharp applies ~1/256
of the user's requested strength and is effectively a no-op. The
8-bit kernel (nbits=8) happens to be correct because 8+8 == 16.
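A minimal C sketch of the fixed-point step, using a simplified per-sample helper rather than the filter's real kernel (the names `amount_to_q16`, `sharpen_delta`, `buggy_shift` and `fixed_shift` are hypothetical). Because the strength is stored in Q16, only a right shift by 16 restores sample units. The kernel's `8 + nbits` equals 16 only when nbits is 8.

```c
#include <stdint.h>

/* Q16 encoding of the user-facing strength, mirroring
 * fp->amount = amount * 65536.0; */
static int32_t amount_to_q16(double amount)
{
    return (int32_t)(amount * 65536.0);
}

/* Per-sample correction: diff is (pixel - blurred pixel), amount_q16 the
 * Q16 strength, shift the number of fractional bits to drop. The product
 * is formed in 64 bits. (Right-shifting a negative int64_t is
 * implementation-defined in C; common compilers shift arithmetically.) */
static int32_t sharpen_delta(int32_t diff, int32_t amount_q16, int shift)
{
    return (int32_t)(((int64_t)diff * amount_q16) >> shift);
}

/* Shift as written in the kernel macro: correct only when nbits == 8. */
static int buggy_shift(int nbits)
{
    return 8 + nbits;
}

/* Shift that undoes the Q16 scaling for every bit depth. */
static int fixed_shift(int nbits)
{
    (void)nbits;
    return 16;
}
```

With a diff of 256 and amount 1.0, the `>> 16` path returns 256, while the `>> 24` path used by the nbits=16 instantiation returns 1. This matches the roughly 1/256 strength described above.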
This commit also widens the intermediate product to int64 before the
shift, to avoid a potential overflow. Take a 16-bit pixel at the
edge of a sharp white/black region, with the user-facing `amount`
set to its declared maximum of 5.0.
    *srx = 65535
    blur = 32768
    diff = *srx - blur = 32767
    amount_q16 = 5.0 * 65536 = 327680
Then the kernel computes:
    product = diff * amount_q16
            = 32767 * 327680 = 10,737,090,560 (~1.07e10)
which overflows INT32_MAX. Widening to int64 keeps the
multiplication in range; the subsequent `>> 16` brings it back to
sample range and the final cast to int32 is then safe. The widening
is a semantic no-op for 8/9/10/12-bit content where the product
always fits in int32 (worst case at 12-bit: 4095 * 327680 ~ 1.34e9).
Introduced by ee792ebe08 (2019-11-08, "avfilter/vf_unsharp: add 10bit
support"). The fate-filter-unsharp-yuv420p10 reference added in the
same series was generated from the broken kernel and is regenerated
here. fate-filter-unsharp (8-bit) is unaffected.
Repro:
python3 -c "import numpy as np; y=np.tile(np.where(np.arange(128)//8 & 1, 512, 256).astype('<u2'), (128,1)); c=np.full((64,64), 512, '<u2'); open('in.yuv','wb').write(y.tobytes()+c.tobytes()*2)"
ffmpeg -f rawvideo -pix_fmt yuv420p10le -s 128x128 -i in.yuv \
-lavfi "split=2[a][b];[b]unsharp=la=1[bs];[a][bs]psnr" \
-f null - 2>&1 | grep PSNR
Before: `PSNR y:66.50 ...` -- the filter is effectively a no-op,
so the sharpened output matches the input almost exactly.
After: `PSNR y:28.27 ...` -- the filter actually sharpens, so
output and input differ as expected.
Signed-off-by: Nil Fons Miret <nilf@netflix.com>
Made-with: Cursor
2026-04-30 21:15:58 +00:00
depthfirst-dev[bot]
68ea660d83
avformat/mov: reject dimg references with zero entries
Reject dimg entries with a zero reference count in mov_read_iref_dimg().
This is the earliest point where the parser learns how many input images
a derived HEIF item references, so it is the right place to enforce the
invariant.
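A sketch of rejecting the count at the point where it is read. It assumes the box header has already been consumed and that item IDs are 16-bit (from_item_ID, reference_count, then reference_count to_item_IDs). `parse_dimg_count` and `ERR_INVALID_DATA` are hypothetical stand-ins, not the demuxer's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the demuxer's "invalid data" error. */
#define ERR_INVALID_DATA (-1)

static unsigned rb16(const uint8_t *p)
{
    return (unsigned)p[0] << 8 | p[1];
}

/* Reads reference_count and refuses zero before any grid state exists. */
static int parse_dimg_count(const uint8_t *buf, size_t size, unsigned *nb_tiles)
{
    if (size < 4)
        return ERR_INVALID_DATA;
    unsigned entries = rb16(buf + 2);
    if (entries == 0)
        return ERR_INVALID_DATA;          /* the invariant enforced here */
    if (size < 4 + 2 * (size_t)entries)
        return ERR_INVALID_DATA;          /* truncated to_item_ID list */
    *nb_tiles = entries;                  /* only ever set to >= 1 */
    return 0;
}
```

Because `*nb_tiles` is written only on success, later code can rely on it being at least 1.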
If entries == 0 is accepted here, the value is stored in HEIFGrid.nb_tiles,
later propagated by read_image_iovl() into AVStreamGroupTileGrid.nb_tiles,
and finally consumed in istg_parse_tile_grid(), which assumes at least one
tile and reads tg->offsets[tg->nb_tiles - 1]. With zero tiles, that
assumption breaks and leads to the out-of-bounds access seen in ASan.
Fixing the problem at the parser boundary is preferable to adding a later
workaround because it prevents creation of an invalid derived-image state
and stops that malformed state from reaching downstream consumers.
This is also consistent with the HEIF specification. Both iovl and grid
derived images are formed from one or more input images, and for grid the
dimg reference count must equal rows * columns; since rows and columns are
encoded as *_minus_one + 1, that count cannot be zero. A zero dimg entry
count is therefore invalid input and should be rejected when parsed.
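The counting argument can be illustrated in C. This assumes the grid item's 8-bit `rows_minus_one` and `columns_minus_one` fields. The helper names are hypothetical, and the equality check is stricter than the committed fix, which only rejects a zero count.

```c
#include <stdint.h>

/* dimg references a grid item must have: rows * columns, each encoded
 * as *_minus_one + 1, so the result is always in [1, 65536]. */
static unsigned grid_expected_refs(uint8_t rows_minus_one, uint8_t columns_minus_one)
{
    return ((unsigned)rows_minus_one + 1) * ((unsigned)columns_minus_one + 1);
}

/* Since the expected count is at least 1, matching it also rules out 0. */
static int grid_refs_valid(unsigned dimg_entries, uint8_t rows_minus_one,
                           uint8_t columns_minus_one)
{
    return dimg_entries == grid_expected_refs(rows_minus_one, columns_minus_one);
}
```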
2026-04-30 19:19:07 +00:00
Romain Beauxis
0f6ba39122
avfilter/vf_frei0r: guard against NULL string fields.
2026-04-30 08:33:31 -05:00
Andreas Rheinhardt
cc3ca17127
avcodec/x86/qpeldsp{,_init}: Use proper prefix
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
ca43bc6202
avcodec/x86/qpeldsp_init: Mark functions as hidden
This allows 32-bit PIC code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32-bit PIC code).
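A C sketch of what "hidden" means at the declaration level. It uses the GCC/Clang `visibility("hidden")` attribute directly. FFmpeg may wrap this attribute in its own macro, and `HIDDEN` and `demo_put_pixels8` are hypothetical. The attribute promises that the symbol resolves inside the same shared object. On 32-bit x86 PIC, that lets the compiler emit a direct `call` instead of a PLT call, which would first need the GOT address in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: symbol is not exported from the shared object. */
#define HIDDEN __attribute__((visibility("hidden")))

/* Hypothetical stand-in for an assembly routine declared in an init file. */
HIDDEN void demo_put_pixels8(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t stride, int h);

/* In FFmpeg the body would live in a .asm file; a C body keeps the
 * sketch self-contained. Copies an 8-pixel-wide block of h rows. */
void demo_put_pixels8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = src[y * stride + x];
}
```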
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
23d3116af9
avcodec/x86/qpeldsp: Add combination of h_lowpass + l2
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.
This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive code-size wise
(and also performance-wise) as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: An increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection,
2256B when using stack protection.
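A plain-C sketch of the two stages for one row, with a fused variant that produces identical output without the intermediate buffer. The sketch makes several simplifications:

- It assumes the MPEG-4 8-tap kernel (-1, 3, -6, 20, 20, -6, 3, -1)/32 with put-style rounding.
- It assumes `src[x-3]` through `src[x+4]` are valid. The real MPEG-4 filter mirrors pixels at block edges, and this sketch omits that.
- It does not cover no_rnd variants, which round down instead.
- The function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Half-pel value between src[x] and src[x + 1]. */
static uint8_t hpel(const uint8_t *s, int x)
{
    int v = 20 * (s[x] + s[x + 1]) - 6 * (s[x - 1] + s[x + 2])
          + 3 * (s[x - 2] + s[x + 3]) - (s[x - 3] + s[x + 4]);
    return clip_u8((v + 16) >> 5);
}

/* Two stages: lowpass into a temporary, then average with the full-pel
 * source (offset 0 = left pixel for 1/4, offset 1 = right pixel for 3/4). */
static void qpel_h_two_pass(uint8_t *dst, const uint8_t *src, int w, int offset)
{
    uint8_t tmp[16];                      /* assumes w <= 16 */
    for (int x = 0; x < w; x++)
        tmp[x] = hpel(src, x);
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((tmp[x] + src[x + offset] + 1) >> 1);
}

/* Fused h_lowpass + l2: same arithmetic in a single pass. */
static void qpel_h_fused(uint8_t *dst, const uint8_t *src, int w, int offset)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((hpel(src, x) + src[x + offset] + 1) >> 1);
}
```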
Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3: 105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3: 226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3: 231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3: 214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3: 236.1 ( 8.02x)
New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3: 96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3: 225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3: 228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3: 217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3: 220.5 ( 8.72x)
Note: The l2 functions are also used for vertical lowpass
functions, yet given that they are much bigger, duplicating
them would lead to massive code size increase.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
f946cac2d9
avcodec/x86/qpeldsp: Remove horizontal mmxext mc functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
1d040c527d
avcodec/x86/qpeldsp: Add SSSE3 size 8 horizontal filter
Beats the mmxext version by a lot (in the following,
[1][1-3] refers to horizontal-only size 8 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
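Labels such as `[1][2]` or `[0][13]` can be decoded as follows. The first index selects the block size: 0 for 16x16 and 1 for 8x8. The second index packs the quarter-pel fractions of the motion vector. The helper below assumes the packing dxy = (mx & 3) | ((my & 3) << 2) used in MPEG-4 motion compensation. The enum and function names are hypothetical.

```c
enum qpel_mc_kind { QPEL_COPY, QPEL_H, QPEL_V, QPEL_HV };

/* Pack the quarter-pel fractions of a motion vector into a table index. */
static int qpel_dxy(int mx, int my)
{
    return (mx & 3) | ((my & 3) << 2);
}

/* Classify an index the way the benchmark descriptions do. */
static enum qpel_mc_kind qpel_classify(int dxy)
{
    int dx = dxy & 3, dy = dxy >> 2;
    if (!dx && !dy) return QPEL_COPY;
    if (!dy)        return QPEL_H;    /* [1], [2], [3] */
    if (!dx)        return QPEL_V;    /* [4], [8], [12] */
    return QPEL_HV;
}
```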
avg_qpel_pixels_tab[1][1]_c: 223.9 ( 1.00x)
avg_qpel_pixels_tab[1][1]_mmxext: 66.2 ( 3.38x)
avg_qpel_pixels_tab[1][1]_ssse3: 36.8 ( 6.08x)
avg_qpel_pixels_tab[1][2]_c: 251.0 ( 1.00x)
avg_qpel_pixels_tab[1][2]_mmxext: 58.5 ( 4.29x)
avg_qpel_pixels_tab[1][2]_ssse3: 25.5 ( 9.84x)
avg_qpel_pixels_tab[1][3]_c: 226.9 ( 1.00x)
avg_qpel_pixels_tab[1][3]_mmxext: 66.3 ( 3.42x)
avg_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.34x)
avg_qpel_pixels_tab[1][5]_c: 473.9 ( 1.00x)
avg_qpel_pixels_tab[1][5]_sse2: 110.7 ( 4.28x)
avg_qpel_pixels_tab[1][5]_ssse3: 76.0 ( 6.24x)
avg_qpel_pixels_tab[1][6]_c: 440.9 ( 1.00x)
avg_qpel_pixels_tab[1][6]_sse2: 102.1 ( 4.32x)
avg_qpel_pixels_tab[1][6]_ssse3: 67.1 ( 6.58x)
avg_qpel_pixels_tab[1][7]_c: 473.8 ( 1.00x)
avg_qpel_pixels_tab[1][7]_sse2: 108.0 ( 4.39x)
avg_qpel_pixels_tab[1][7]_ssse3: 74.6 ( 6.35x)
avg_qpel_pixels_tab[1][9]_c: 492.9 ( 1.00x)
avg_qpel_pixels_tab[1][9]_sse2: 102.1 ( 4.83x)
avg_qpel_pixels_tab[1][9]_ssse3: 67.1 ( 7.35x)
avg_qpel_pixels_tab[1][10]_c: 465.6 ( 1.00x)
avg_qpel_pixels_tab[1][10]_sse2: 94.9 ( 4.91x)
avg_qpel_pixels_tab[1][10]_ssse3: 57.5 ( 8.10x)
avg_qpel_pixels_tab[1][11]_c: 492.8 ( 1.00x)
avg_qpel_pixels_tab[1][11]_sse2: 102.4 ( 4.81x)
avg_qpel_pixels_tab[1][11]_ssse3: 68.7 ( 7.17x)
avg_qpel_pixels_tab[1][13]_c: 476.6 ( 1.00x)
avg_qpel_pixels_tab[1][13]_sse2: 108.6 ( 4.39x)
avg_qpel_pixels_tab[1][13]_ssse3: 74.7 ( 6.38x)
avg_qpel_pixels_tab[1][14]_c: 434.9 ( 1.00x)
avg_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.25x)
avg_qpel_pixels_tab[1][14]_ssse3: 66.6 ( 6.53x)
avg_qpel_pixels_tab[1][15]_c: 474.1 ( 1.00x)
avg_qpel_pixels_tab[1][15]_sse2: 107.9 ( 4.39x)
avg_qpel_pixels_tab[1][15]_ssse3: 74.3 ( 6.38x)
put_no_rnd_qpel_pixels_tab[1][1]_c: 222.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][1]_mmxext: 66.0 ( 3.37x)
put_no_rnd_qpel_pixels_tab[1][1]_ssse3: 35.2 ( 6.31x)
put_no_rnd_qpel_pixels_tab[1][2]_c: 212.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][2]_mmxext: 56.8 ( 3.74x)
put_no_rnd_qpel_pixels_tab[1][2]_ssse3: 25.0 ( 8.48x)
put_no_rnd_qpel_pixels_tab[1][3]_c: 224.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][3]_mmxext: 65.8 ( 3.41x)
put_no_rnd_qpel_pixels_tab[1][3]_ssse3: 35.8 ( 6.26x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 114.6 ( 4.01x)
put_no_rnd_qpel_pixels_tab[1][5]_ssse3: 83.1 ( 5.53x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 438.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 104.2 ( 4.21x)
put_no_rnd_qpel_pixels_tab[1][6]_ssse3: 67.5 ( 6.50x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 458.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 113.8 ( 4.02x)
put_no_rnd_qpel_pixels_tab[1][7]_ssse3: 79.9 ( 5.73x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 439.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 103.7 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][9]_ssse3: 68.9 ( 6.37x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 427.0 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 93.2 ( 4.58x)
put_no_rnd_qpel_pixels_tab[1][10]_ssse3: 57.9 ( 7.37x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 439.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 104.0 ( 4.23x)
put_no_rnd_qpel_pixels_tab[1][11]_ssse3: 69.2 ( 6.36x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 459.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.2 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][13]_ssse3: 83.8 ( 5.48x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 439.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 103.3 ( 4.25x)
put_no_rnd_qpel_pixels_tab[1][14]_ssse3: 67.9 ( 6.47x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.7 ( 3.99x)
put_no_rnd_qpel_pixels_tab[1][15]_ssse3: 80.0 ( 5.67x)
put_qpel_pixels_tab[1][1]_c: 229.0 ( 1.00x)
put_qpel_pixels_tab[1][1]_mmxext: 65.5 ( 3.50x)
put_qpel_pixels_tab[1][1]_ssse3: 33.8 ( 6.77x)
put_qpel_pixels_tab[1][2]_c: 212.5 ( 1.00x)
put_qpel_pixels_tab[1][2]_mmxext: 56.6 ( 3.75x)
put_qpel_pixels_tab[1][2]_ssse3: 23.4 ( 9.08x)
put_qpel_pixels_tab[1][3]_c: 227.5 ( 1.00x)
put_qpel_pixels_tab[1][3]_mmxext: 64.4 ( 3.53x)
put_qpel_pixels_tab[1][3]_ssse3: 33.5 ( 6.79x)
put_qpel_pixels_tab[1][5]_c: 466.5 ( 1.00x)
put_qpel_pixels_tab[1][5]_sse2: 106.8 ( 4.37x)
put_qpel_pixels_tab[1][5]_ssse3: 71.8 ( 6.50x)
put_qpel_pixels_tab[1][6]_c: 438.7 ( 1.00x)
put_qpel_pixels_tab[1][6]_sse2: 102.0 ( 4.30x)
put_qpel_pixels_tab[1][6]_ssse3: 65.3 ( 6.72x)
put_qpel_pixels_tab[1][7]_c: 466.0 ( 1.00x)
put_qpel_pixels_tab[1][7]_sse2: 106.3 ( 4.38x)
put_qpel_pixels_tab[1][7]_ssse3: 70.9 ( 6.57x)
put_qpel_pixels_tab[1][9]_c: 456.0 ( 1.00x)
put_qpel_pixels_tab[1][9]_sse2: 100.1 ( 4.55x)
put_qpel_pixels_tab[1][9]_ssse3: 64.0 ( 7.13x)
put_qpel_pixels_tab[1][10]_c: 425.1 ( 1.00x)
put_qpel_pixels_tab[1][10]_sse2: 92.6 ( 4.59x)
put_qpel_pixels_tab[1][10]_ssse3: 55.1 ( 7.71x)
put_qpel_pixels_tab[1][11]_c: 452.7 ( 1.00x)
put_qpel_pixels_tab[1][11]_sse2: 99.6 ( 4.55x)
put_qpel_pixels_tab[1][11]_ssse3: 63.8 ( 7.09x)
put_qpel_pixels_tab[1][13]_c: 471.2 ( 1.00x)
put_qpel_pixels_tab[1][13]_sse2: 106.4 ( 4.43x)
put_qpel_pixels_tab[1][13]_ssse3: 71.4 ( 6.60x)
put_qpel_pixels_tab[1][14]_c: 439.7 ( 1.00x)
put_qpel_pixels_tab[1][14]_sse2: 101.8 ( 4.32x)
put_qpel_pixels_tab[1][14]_ssse3: 64.8 ( 6.79x)
put_qpel_pixels_tab[1][15]_c: 467.8 ( 1.00x)
put_qpel_pixels_tab[1][15]_sse2: 106.1 ( 4.41x)
put_qpel_pixels_tab[1][15]_ssse3: 72.6 ( 6.44x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
c0e1c1d6b3
avcodec/x86/qpeldsp: Add SSSE3 size 16 horizontal filter
Beats the mmxext version by a lot (in the following,
[0][1-3] refers to horizontal-only size 16 mc;
the _sse2 comparators for the other cases use mmxext
horizontal mc coupled with vertical SSE2 mc):
avg_qpel_pixels_tab[0][1]_c: 945.5 ( 1.00x)
avg_qpel_pixels_tab[0][1]_mmxext: 262.6 ( 3.60x)
avg_qpel_pixels_tab[0][1]_ssse3: 110.4 ( 8.57x)
avg_qpel_pixels_tab[0][2]_c: 1042.1 ( 1.00x)
avg_qpel_pixels_tab[0][2]_mmxext: 245.1 ( 4.25x)
avg_qpel_pixels_tab[0][2]_ssse3: 91.7 (11.37x)
avg_qpel_pixels_tab[0][3]_c: 941.8 ( 1.00x)
avg_qpel_pixels_tab[0][3]_mmxext: 260.1 ( 3.62x)
avg_qpel_pixels_tab[0][3]_ssse3: 110.1 ( 8.56x)
avg_qpel_pixels_tab[0][5]_c: 1939.5 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2: 394.3 ( 4.92x)
avg_qpel_pixels_tab[0][5]_ssse3: 247.4 ( 7.84x)
avg_qpel_pixels_tab[0][6]_c: 1785.8 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2: 380.6 ( 4.69x)
avg_qpel_pixels_tab[0][6]_ssse3: 221.1 ( 8.08x)
avg_qpel_pixels_tab[0][7]_c: 1932.5 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2: 393.4 ( 4.91x)
avg_qpel_pixels_tab[0][7]_ssse3: 238.8 ( 8.09x)
avg_qpel_pixels_tab[0][9]_c: 1976.9 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2: 380.8 ( 5.19x)
avg_qpel_pixels_tab[0][9]_ssse3: 223.3 ( 8.85x)
avg_qpel_pixels_tab[0][10]_c: 1911.9 ( 1.00x)
avg_qpel_pixels_tab[0][10]_sse2: 366.9 ( 5.21x)
avg_qpel_pixels_tab[0][10]_ssse3: 207.0 ( 9.24x)
avg_qpel_pixels_tab[0][11]_c: 2046.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2: 385.5 ( 5.31x)
avg_qpel_pixels_tab[0][11]_ssse3: 227.9 ( 8.98x)
avg_qpel_pixels_tab[0][13]_c: 1940.8 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2: 389.7 ( 4.98x)
avg_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.95x)
avg_qpel_pixels_tab[0][14]_c: 1778.4 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2: 379.2 ( 4.69x)
avg_qpel_pixels_tab[0][14]_ssse3: 223.5 ( 7.96x)
avg_qpel_pixels_tab[0][15]_c: 1905.9 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2: 398.9 ( 4.78x)
avg_qpel_pixels_tab[0][15]_ssse3: 238.3 ( 8.00x)
put_no_rnd_qpel_pixels_tab[0][1]_c: 922.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][1]_mmxext: 275.0 ( 3.35x)
put_no_rnd_qpel_pixels_tab[0][1]_ssse3: 108.4 ( 8.51x)
put_no_rnd_qpel_pixels_tab[0][2]_c: 889.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][2]_mmxext: 236.7 ( 3.76x)
put_no_rnd_qpel_pixels_tab[0][2]_ssse3: 86.8 (10.25x)
put_no_rnd_qpel_pixels_tab[0][3]_c: 915.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][3]_mmxext: 274.3 ( 3.34x)
put_no_rnd_qpel_pixels_tab[0][3]_ssse3: 108.2 ( 8.46x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 400.0 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][5]_ssse3: 246.0 ( 7.53x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1753.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 382.5 ( 4.59x)
put_no_rnd_qpel_pixels_tab[0][6]_ssse3: 226.4 ( 7.75x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1854.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 393.5 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][7]_ssse3: 248.6 ( 7.46x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1794.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 382.2 ( 4.70x)
put_no_rnd_qpel_pixels_tab[0][9]_ssse3: 228.0 ( 7.87x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1724.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 353.8 ( 4.88x)
put_no_rnd_qpel_pixels_tab[0][10]_ssse3: 206.5 ( 8.35x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1796.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 378.1 ( 4.75x)
put_no_rnd_qpel_pixels_tab[0][11]_ssse3: 227.1 ( 7.91x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1834.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 400.7 ( 4.58x)
put_no_rnd_qpel_pixels_tab[0][13]_ssse3: 244.2 ( 7.51x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1755.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 387.2 ( 4.53x)
put_no_rnd_qpel_pixels_tab[0][14]_ssse3: 226.8 ( 7.74x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1847.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 400.6 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][15]_ssse3: 246.1 ( 7.51x)
put_qpel_pixels_tab[0][1]_c: 919.6 ( 1.00x)
put_qpel_pixels_tab[0][1]_mmxext: 255.5 ( 3.60x)
put_qpel_pixels_tab[0][1]_ssse3: 108.3 ( 8.49x)
put_qpel_pixels_tab[0][2]_c: 883.9 ( 1.00x)
put_qpel_pixels_tab[0][2]_mmxext: 238.1 ( 3.71x)
put_qpel_pixels_tab[0][2]_ssse3: 86.7 (10.19x)
put_qpel_pixels_tab[0][3]_c: 921.9 ( 1.00x)
put_qpel_pixels_tab[0][3]_mmxext: 258.9 ( 3.56x)
put_qpel_pixels_tab[0][3]_ssse3: 108.1 ( 8.53x)
put_qpel_pixels_tab[0][5]_c: 1907.5 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2: 384.2 ( 4.96x)
put_qpel_pixels_tab[0][5]_ssse3: 234.8 ( 8.13x)
put_qpel_pixels_tab[0][6]_c: 1757.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2: 382.8 ( 4.59x)
put_qpel_pixels_tab[0][6]_ssse3: 217.6 ( 8.08x)
put_qpel_pixels_tab[0][7]_c: 1927.5 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2: 384.6 ( 5.01x)
put_qpel_pixels_tab[0][7]_ssse3: 231.2 ( 8.34x)
put_qpel_pixels_tab[0][9]_c: 1832.1 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2: 374.8 ( 4.89x)
put_qpel_pixels_tab[0][9]_ssse3: 219.4 ( 8.35x)
put_qpel_pixels_tab[0][10]_c: 1710.3 ( 1.00x)
put_qpel_pixels_tab[0][10]_sse2: 384.5 ( 4.45x)
put_qpel_pixels_tab[0][10]_ssse3: 202.9 ( 8.43x)
put_qpel_pixels_tab[0][11]_c: 1825.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2: 369.6 ( 4.94x)
put_qpel_pixels_tab[0][11]_ssse3: 216.8 ( 8.42x)
put_qpel_pixels_tab[0][13]_c: 1898.4 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2: 384.9 ( 4.93x)
put_qpel_pixels_tab[0][13]_ssse3: 238.6 ( 7.96x)
put_qpel_pixels_tab[0][14]_c: 1779.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2: 373.3 ( 4.77x)
put_qpel_pixels_tab[0][14]_ssse3: 218.1 ( 8.16x)
put_qpel_pixels_tab[0][15]_c: 1918.2 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2: 385.3 ( 4.98x)
put_qpel_pixels_tab[0][15]_ssse3: 236.8 ( 8.10x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
a3d747f344
avcodec/x86/qpeldsp{,_init}: Use SSE2 pixels16x16_l2 functions
The put and avg versions were added and used in H264
in b91081274f. This commit
adds the size 16 version of put_no_rnd and uses all three
of them in the SSE2 size 16 qpel functions (i.e. it uses
them in the ones that have a vertical component); it also
removes the 16x17 MMXEXT versions (which are no longer used).
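For reference, a plain-C sketch of what the three l2 flavours compute on a 16x16 block. It assumes the usual FFmpeg rounding conventions:

- put rounds the average of two predictions up.
- put_no_rnd rounds it down.
- avg additionally averages the result with what is already in dst.

It uses one shared stride for simplicity, so the real routines' signatures may differ. `pixels16x16_l2_ref` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

enum l2_mode { L2_PUT, L2_PUT_NO_RND, L2_AVG };

static void pixels16x16_l2_ref(uint8_t *dst, const uint8_t *src1,
                               const uint8_t *src2, ptrdiff_t stride,
                               enum l2_mode mode)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            ptrdiff_t i = y * stride + x;
            int a = src1[i], b = src2[i];
            if (mode == L2_PUT_NO_RND) {
                dst[i] = (uint8_t)((a + b) >> 1);        /* rounds down */
            } else {
                int p = (a + b + 1) >> 1;                /* rounds up */
                dst[i] = mode == L2_AVG ? (uint8_t)((dst[i] + p + 1) >> 1)
                                        : (uint8_t)p;
            }
        }
    }
}
```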
This is particularly beneficial for put_no_rnd:
avg_qpel_pixels_tab[0][5]_c: 1910.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_sse2 (old): 405.1 ( 4.72x)
avg_qpel_pixels_tab[0][5]_sse2: 392.9 ( 4.86x)
avg_qpel_pixels_tab[0][6]_c: 1778.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_sse2 (old): 385.5 ( 4.61x)
avg_qpel_pixels_tab[0][6]_sse2: 374.9 ( 4.75x)
avg_qpel_pixels_tab[0][7]_c: 1935.3 ( 1.00x)
avg_qpel_pixels_tab[0][7]_sse2 (old): 403.1 ( 4.80x)
avg_qpel_pixels_tab[0][7]_sse2: 391.6 ( 4.94x)
avg_qpel_pixels_tab[0][9]_c: 1969.0 ( 1.00x)
avg_qpel_pixels_tab[0][9]_sse2 (old): 384.1 ( 5.13x)
avg_qpel_pixels_tab[0][9]_sse2: 380.3 ( 5.18x)
avg_qpel_pixels_tab[0][11]_c: 2014.9 ( 1.00x)
avg_qpel_pixels_tab[0][11]_sse2 (old): 385.6 ( 5.23x)
avg_qpel_pixels_tab[0][11]_sse2: 380.2 ( 5.30x)
avg_qpel_pixels_tab[0][13]_c: 1925.7 ( 1.00x)
avg_qpel_pixels_tab[0][13]_sse2 (old): 406.1 ( 4.74x)
avg_qpel_pixels_tab[0][13]_sse2: 390.4 ( 4.93x)
avg_qpel_pixels_tab[0][14]_c: 1793.0 ( 1.00x)
avg_qpel_pixels_tab[0][14]_sse2 (old): 389.6 ( 4.60x)
avg_qpel_pixels_tab[0][14]_sse2: 377.1 ( 4.75x)
avg_qpel_pixels_tab[0][15]_c: 1913.0 ( 1.00x)
avg_qpel_pixels_tab[0][15]_sse2 (old): 404.2 ( 4.73x)
avg_qpel_pixels_tab[0][15]_sse2: 390.8 ( 4.89x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1864.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2 (old): 425.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 396.2 ( 4.71x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1767.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2 (old): 388.4 ( 4.55x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 377.7 ( 4.68x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1874.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2 (old): 427.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 400.0 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1759.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2 (old): 393.0 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 379.7 ( 4.63x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1820.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2 (old): 392.7 ( 4.64x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 377.4 ( 4.82x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1841.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2 (old): 427.1 ( 4.31x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 395.9 ( 4.65x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1761.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2 (old): 392.3 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 375.9 ( 4.69x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1869.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2 (old): 425.6 ( 4.39x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 397.3 ( 4.70x)
put_qpel_pixels_tab[0][5]_c: 1888.2 ( 1.00x)
put_qpel_pixels_tab[0][5]_sse2 (old): 396.5 ( 4.76x)
put_qpel_pixels_tab[0][5]_sse2: 382.5 ( 4.94x)
put_qpel_pixels_tab[0][6]_c: 1760.4 ( 1.00x)
put_qpel_pixels_tab[0][6]_sse2 (old): 377.0 ( 4.67x)
put_qpel_pixels_tab[0][6]_sse2: 372.1 ( 4.73x)
put_qpel_pixels_tab[0][7]_c: 1927.6 ( 1.00x)
put_qpel_pixels_tab[0][7]_sse2 (old): 396.5 ( 4.86x)
put_qpel_pixels_tab[0][7]_sse2: 383.4 ( 5.03x)
put_qpel_pixels_tab[0][9]_c: 1775.9 ( 1.00x)
put_qpel_pixels_tab[0][9]_sse2 (old): 377.9 ( 4.70x)
put_qpel_pixels_tab[0][9]_sse2: 372.3 ( 4.77x)
put_qpel_pixels_tab[0][11]_c: 1809.0 ( 1.00x)
put_qpel_pixels_tab[0][11]_sse2 (old): 374.6 ( 4.83x)
put_qpel_pixels_tab[0][11]_sse2: 380.3 ( 4.76x)
put_qpel_pixels_tab[0][13]_c: 1893.2 ( 1.00x)
put_qpel_pixels_tab[0][13]_sse2 (old): 399.2 ( 4.74x)
put_qpel_pixels_tab[0][13]_sse2: 384.7 ( 4.92x)
put_qpel_pixels_tab[0][14]_c: 1756.2 ( 1.00x)
put_qpel_pixels_tab[0][14]_sse2 (old): 377.9 ( 4.65x)
put_qpel_pixels_tab[0][14]_sse2: 374.4 ( 4.69x)
put_qpel_pixels_tab[0][15]_c: 1922.8 ( 1.00x)
put_qpel_pixels_tab[0][15]_sse2 (old): 399.0 ( 4.82x)
put_qpel_pixels_tab[0][15]_sse2: 387.8 ( 4.96x)
The purely vertical size 16 mc functions now no longer use any MMX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
dad0c01076
avcodec/x86/qpeldsp: Remove vertical MMXEXT mc functions
Superseded by SSE2.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
9beecb2670
avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):
avg_qpel_pixels_tab[0][4]_c: 844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext: 225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2: 146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c: 1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext: 499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2: 405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c: 1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext: 484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2: 385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c: 1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext: 501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2: 403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c: 976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext: 216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2: 113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c: 1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext: 494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2: 388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c: 1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext: 476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2: 362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c: 2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext: 496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2: 385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c: 841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext: 226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2: 143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c: 1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext: 499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2: 412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c: 1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext: 484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2: 385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c: 1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext: 501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2: 405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c: 203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext: 64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2: 40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c: 488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext: 134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2: 108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c: 448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext: 128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2: 102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c: 489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext: 134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2: 108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c: 223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext: 57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2: 36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c: 496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext: 129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2: 105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c: 466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext: 123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2: 99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c: 497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext: 129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2: 105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c: 203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext: 63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2: 38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c: 487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext: 134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2: 108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c: 447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext: 128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2: 102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c: 487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext: 134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2: 109.9 ( 4.44x)
put_no_rnd_qpel_pixels_tab[0][4]_c: 825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext: 242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2: 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c: 1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext: 542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2: 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c: 1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext: 493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2: 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c: 1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext: 541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2: 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c: 785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext: 206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2: 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c: 1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext: 489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2: 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c: 1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext: 461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2: 357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c: 1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext: 490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2: 394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c: 824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext: 242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2: 135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c: 1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext: 545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2: 444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c: 1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext: 497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2: 393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c: 1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext: 545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2: 445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c: 198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext: 64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2: 39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c: 460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext: 137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2: 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c: 441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext: 126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2: 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c: 465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext: 137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2: 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c: 193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext: 52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2: 27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c: 450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext: 126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2: 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c: 436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext: 118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2: 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c: 453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext: 128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2: 103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c: 201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext: 64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2: 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c: 461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext: 137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2: 113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c: 442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext: 127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2: 102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c: 462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext: 139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2: 113.3 ( 4.09x)
put_qpel_pixels_tab[0][4]_c: 824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext: 220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2: 137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c: 1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext: 508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2: 408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c: 1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext: 476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2: 381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c: 1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext: 495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2: 417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c: 772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext: 197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2: 118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c: 1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext: 476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2: 379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c: 1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext: 460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2: 386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c: 1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext: 474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2: 404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c: 829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext: 221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2: 138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c: 1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext: 494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2: 413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c: 1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext: 473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2: 377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c: 1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext: 492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2: 399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c: 198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext: 60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2: 40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c: 471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext: 131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2: 107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c: 440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext: 126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2: 100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c: 469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext: 131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2: 106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c: 194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext: 52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2: 28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c: 464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext: 125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2: 100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c: 433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext: 118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2: 94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c: 463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext: 125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2: 102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c: 199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext: 63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2: 36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c: 475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext: 139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2: 107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c: 441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext: 126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2: 101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c: 475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext: 131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2: 107.0 ( 4.45x)
The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
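A minimal sketch of the arithmetic behind the numbers above. It assumes the last column of the benchmark table is the C timing divided by the optimized timing, as in checkasm's --bench output. The helper names speedup() and size_growth_percent() are hypothetical.

```c
#include <assert.h>

/* Speedup as it appears to be printed in the table: C time / SIMD time. */
static double speedup(double c_time, double simd_time)
{
    assert(simd_time > 0.0);
    return c_time / simd_time;
}

/* Relative code-size growth of the new functions, in percent. */
static double size_growth_percent(double new_bytes, double old_bytes)
{
    assert(old_bytes > 0.0);
    return (new_bytes - old_bytes) / old_bytes * 100.0;
}
```

For example, speedup(824.6, 137.8) gives about 5.98, matching put_qpel_pixels_tab[0][4]_sse2, and size_growth_percent(8244, 6720) gives about 22.7%. Recomputing from the printed (rounded) timings can be off in the last digit: 194.2 / 28.0 is about 6.94, while the table shows 6.95x, presumably because the tool divides the unrounded timings.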
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
405465700c
avcodec/x86/qpeldsp: Don't allocate stack unnecessarily
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
188df9549c
avcodec/x86/qpeldsp: Don't use too much stack
...
We only need (SIZE+1)*SIZE words.
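A minimal sketch of what (SIZE+1)*SIZE words amounts to in bytes. It assumes "word" has its x86 meaning of 16 bits. The helper qpel_tmp_bytes() is hypothetical and not taken from qpeldsp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scratch bytes needed for (SIZE+1)*SIZE 16-bit words. */
static size_t qpel_tmp_bytes(size_t size)
{
    assert(size > 0);
    return (size + 1) * size * sizeof(int16_t);
}
```

Under that assumption, the 8x8 case needs 144 bytes and the 16x16 case needs 544 bytes.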
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
bcf7293a21
avcodec/x86/qpeldsp: Remove unused declaration
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00
Andreas Rheinhardt
7b56259dd5
avcodec/x86/constants: Move ff_pw_{15,20} to qpeldsp.asm
...
Only used there.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
c2685234a6
avcodec/x86/qpeldsp_init: Deduplicate 8x8 and 16x16 code
...
Also split the big macro into smaller ones for the purely horizontal, the
purely vertical and the mixed directions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
cf79d8052d
avcodec/x86/qpeldsp_init: Specify alignment properly
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
69906d31c5
avcodec/x86/qpeldsp_init: Don't use unnecessarily big stack buffer
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d3bd1318b3
avcodec/x86/qpeldsp: Don't zero unnecessarily
...
This value is write-only.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Andreas Rheinhardt
d46414b46b
avcodec/x86/qpeldsp: Simplify resetting output pointer
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:32 +02:00
Stefan Breunig
9172ab1245
fate/filter-video: add frei0r_src test
...
Running the tests requires an installation of frei0r-plugins, which is
usually packaged separately from the build headers. Some systems
package it (e.g. apt install frei0r-plugins); an upstream release
extracted to FREI0R_PATH also works.
Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
2026-04-30 03:46:18 +00:00
Nicolas Dato
3aa5d957d1
avformat/dashdec: fix previous commit where I inadvertently removed the case when calc_next_seg_no_from_timelines returned -1 and move_timelines wasn't called
...
Signed-off-by: Nicolas Dato <nicolas.dato@gmail.com>
2026-04-29 23:54:37 +00:00
Nicolas Dato
8a8bde6a54
avformat/dashdec: fix calculation and usage of cur_seq_no, fixing issue 22335
...
Functions like calc_cur_seg_no, calc_min_seg_no, and calc_max_seg_no took
first_seq_no into account when calculating the segment number.
However, functions like get_segment_start_time_based_on_timeline and
calc_cur_seg_no didn't take first_seq_no into account.
This made dashdec believe that cur_seq_no was always less than min_seq_no,
so it logged 'old fragment' and called calc_cur_seg_no.
In live DASH streams that use a startNumber, that call to calc_cur_seg_no
after the 'old fragment' log made ffmpeg reposition itself 60 seconds before
the current time whenever the manifest was reloaded.
This made ffmpeg skip segments, especially when the manifest reloaded more
slowly than the segment duration, in which case each reloaded manifest
contains more than one new segment.
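A minimal sketch of the offset mismatch described above. The Timeline struct and all functions here are hypothetical and only model the idea; they are not dashdec's data structures. Segment numbers are assumed to start at first_seq_no (the DASH startNumber), and index is a 0-based position in the timeline.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t first_seq_no; /* startNumber of the first listed segment */
    int64_t nb_segments;
} Timeline;

static int64_t min_seq_no(const Timeline *t)
{
    return t->first_seq_no;
}

/* Inconsistent variant: returns the bare index, ignoring first_seq_no. */
static int64_t cur_seq_no_without_offset(const Timeline *t, int64_t index)
{
    (void)t;
    return index;
}

/* Consistent variant: applies the same offset as min_seq_no(). */
static int64_t cur_seq_no_with_offset(const Timeline *t, int64_t index)
{
    assert(index >= 0 && index < t->nb_segments);
    return t->first_seq_no + index;
}

/* Models the "old fragment" condition: current number below the minimum. */
static int is_old_fragment(const Timeline *t, int64_t cur_seq_no)
{
    return cur_seq_no < min_seq_no(t);
}
```

With first_seq_no = 100, the unoffset value for index 3 is 3, which is always below the minimum of 100 and so always looks like an old fragment. The offset value 103 does not.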
Signed-off-by: Nicolas Dato <nicolas.dato@gmail.com>
2026-04-29 23:54:37 +00:00
Michael Niedermayer
c25673fe70
avformat/mpegts: Fix memleak of pes_filter.opaque
...
Fixes: 490257166/clusterfuzz-testcase-minimized-ffmpeg_dem_MPEGTS_fuzzer-4815675538604032
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-04-29 20:50:21 +00:00
James Almer
2e6af10481
avformat/dashdec: copy stream groups from input representations
...
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-29 14:00:03 +00:00
James Almer
8fad6dcfd9
avformat/dashdec: support more than one underlying stream per Representation
...
Some DASH manifests contain Representations within an Adaptation Set that
reference an underlying mp4 context containing more than the stream the
Representation describes, as is the case with LCEVC enhancements.
Although open_demux_for_component() loops through all streams in the
underlying context, the rest of the demuxer is written assuming that only the
stream described by the corresponding Representation is present, which
results in completely wrong stream index assignments.
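A minimal sketch of why the one-stream-per-Representation assumption breaks index assignment. The layout is assumed: outer streams are placed Representation by Representation. outer_stream_index() and its parameters are hypothetical, not dashdec's actual code.

```c
#include <assert.h>

/* nb_streams[r] is the number of streams exposed by the underlying demuxer
 * of Representation r. Returns the outer index of stream `local` of `rep`. */
static int outer_stream_index(const int *nb_streams, int nb_reps, int rep, int local)
{
    assert(rep >= 0 && rep < nb_reps);
    assert(local >= 0 && local < nb_streams[rep]);
    int base = 0;
    for (int r = 0; r < rep; r++)
        base += nb_streams[r];
    return base + local;
}
```

Suppose the stream counts are {1, 2, 1}, where Representation 1 also carries an enhancement stream. Then Representation 2's stream lands at outer index 3. Assuming one stream per Representation would map it to index 2, which actually belongs to the second stream of Representation 1.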
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-29 14:00:03 +00:00
Martin Storsjö
397c7c7524
tools/check_arm_indent: Run formatting on arm, in addition to aarch64
...
Add exceptions for files that aren't handled well (or that would
require more manual cleanups to make the output look good).
2026-04-29 13:53:07 +03:00
Martin Storsjö
f6b21eca5e
tools/check_arm_indent: Add missing ;; in switch case, fix indentation
2026-04-29 13:53:07 +03:00
Martin Storsjö
963ea707e3
arm/rv40dsp: Add * on comment continuation lines in prototypes
...
This prevents the assembly indenter script from trying to indent these
lines as assembly code.
2026-04-29 13:53:07 +03:00
Martin Storsjö
0a86aead82
arm/vc1dsp: Fix a few cases of inconsistent indentation
...
The function ff_vc1_unescape_buffer_helper_neon intentionally
uses unusual indentation, to indicate different levels of
unrolling in the function.
2026-04-29 13:53:07 +03:00
Martin Storsjö
10a45072fc
arm/jrevdct: Indent previously unindented assembly
...
The comments have been manually tweaked to line up properly.
2026-04-29 13:53:07 +03:00
Martin Storsjö
5e0f1b1eda
arm/hevcdsp_qpel: Reindent code that seems to lack consistent indentation
2026-04-29 13:53:07 +03:00
Martin Storsjö
65d4c5bbe2
arm: Reindent asm that used consistent but differing styles
...
The qpel_filter macros in hevcdsp_qpel_neon.S have been
manually tweaked to keep reasonable indentation of the
comments.
2026-04-29 13:53:07 +03:00
Martin Storsjö
2325421904
arm/synth_filter_vfp: Fix indentation
...
This was done with manual adjustments; the reindentation
script doesn't handle the VFP/NOVFP macros at the start of
lines.
2026-04-29 13:53:07 +03:00
Ramiro Polla
8d9c1db95d
arm/simple_idct_arm: Reindent previously unindented code
2026-04-29 13:53:07 +03:00
Martin Storsjö
a65ed248fd
arm/simple_idct_armv6: Reindent previously consistent assembly to shared style
...
This has manual fixups, as the indenting script wants to
lowercase constants like W46 to w46, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
b27fd61020
arm/simple_idct_armv5te: Reindent previously consistent code to common style
...
This has manual fixups, as the indenting script wants to
lowercase constants like W26 to w26, which breaks things.
2026-04-29 13:49:27 +03:00
Martin Storsjö
8e199a2a9f
arm/rv34dsp: Adjust macro argument indentation slightly
...
The previous form did neatly align with the lines above, but doesn't
match general indentation rules from our indentation script.
2026-04-29 13:49:27 +03:00
Martin Storsjö
9653588441
libswscale/arm: Switch consistent indentation to common style
...
Some of these files aligned instructions to 4/24 columns, while
we commonly indent arm/aarch64 assembly to 8/24 columns.
Some of these files also used a different alignment for the
operands.
2026-04-29 13:49:27 +03:00
Martin Storsjö
c5a3cb00b7
libswresample/arm: Change to the common indentation size
...
These files consistently aligned instructions to 4/24 columns,
while we commonly indent arm/aarch64 assembly to 8/24 columns.
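A minimal sketch of the 8/24 layout: the mnemonic starts at column 8 and the operands at column 24. Columns are assumed to be 0-based. format_insn() is a hypothetical helper, not part of the indentation script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "mnemonic operands" with the mnemonic at column 8 and the operands
 * at column 24, padding with spaces. */
static int format_insn(char *out, size_t out_size, const char *mnem, const char *ops)
{
    assert(out && mnem && ops);
    size_t len = strlen(mnem);
    /* Mnemonics of 16+ characters can't end before column 24; keep one space. */
    int width = len < 16 ? 16 : (int)len + 1;
    return snprintf(out, out_size, "%8s%-*s%s", "", width, mnem, ops);
}
```

For example, formatting "vld1.8" with "{d0}, [r0]" yields 8 spaces, the mnemonic, and padding so that '{' sits at column 24. The 4/24 style mentioned above would use 4 leading spaces and a 20-character mnemonic field instead.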
2026-04-29 13:49:27 +03:00
Martin Storsjö
25d703dd2a
libavutil/arm: Fix indentation in asm.S
2026-04-29 13:49:27 +03:00
Martin Storsjö
d94e2b0f7c
arm/hevcdsp: Fix misindented instructions in some macros
2026-04-29 13:49:27 +03:00
Martin Storsjö
7eaeb5ab4a
arm: Fix indentation of stray individual misaligned instructions
2026-04-29 13:49:27 +03:00
Martin Storsjö
17765fe831
arm: Reindent assembly where it was off by one char
2026-04-29 13:49:27 +03:00
Martin Storsjö
946e80fde7
libswscale/arm: Lowercase the "LSL" keyword
2026-04-29 13:49:27 +03:00
Martin Storsjö
ea7079074c
tools/indent_arm_assembly: Don't indent "foo .req bar" lines like an instruction
...
These are used a bit in our arm assembly, while they're used much
less in our aarch64 assembly.
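A minimal sketch of recognizing a GNU as register alias line ("name .req register") so that it can be left alone rather than indented like an instruction. is_req_alias() is hypothetical and does not describe how tools/indent_arm_assembly actually matches these lines.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns nonzero if the line has the form "<name> .req <register>". */
static int is_req_alias(const char *line)
{
    assert(line);
    const char *p = line;
    while (isspace((unsigned char)*p))
        p++;
    const char *name = p;                    /* alias name */
    while (*p && !isspace((unsigned char)*p))
        p++;
    if (p == name)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, ".req", 4) != 0)          /* directive */
        return 0;
    p += 4;
    if (!isspace((unsigned char)*p))         /* ".reqx" or nothing after it */
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    return *p != '\0';                       /* a register must follow */
}
```

"dst .req r0" matches, while an ordinary instruction such as "add r0, r0, r1" or a truncated "dst .req" does not.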
2026-04-29 13:49:27 +03:00
Martin Storsjö
cd7a3cd799
tools/indent_arm_assembly: Recognize more comment forms, for skipping lowercasing
...
When we try to lowercase register names (e.g. Q0 -> q0), we avoid
doing so for the parts of the code that are comments, as comments
occasionally contain pseudocode that mentions such names, which
aren't register names but pseudocode/reference code variables.
See 7ebb6c54eb for more details about that.
In addition to recognizing comments starting with //, also
recognize /* and @ (which is a comment char in arm assembly, but
not in aarch64).
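A minimal sketch of finding where a comment starts on an assembly line, so that lowercasing can be limited to the code before it. comment_start() is hypothetical. It treats "//" and "/*" as comment starts for both architectures and "@" only for 32-bit arm, and it ignores string literals.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset where a comment begins in `line`, or -1 if none. */
static ptrdiff_t comment_start(const char *line, int is_arm32)
{
    assert(line);
    for (const char *p = line; *p; p++) {
        // C++-style or C-style comment opener
        if (p[0] == '/' && (p[1] == '/' || p[1] == '*'))
            return p - line;
        // '@' starts a comment in arm assembly, but not in aarch64
        if (is_arm32 && p[0] == '@')
            return p - line;
    }
    return -1;
}
```

For "add r0, r0 @ x", this returns 11 when is_arm32 is set and -1 otherwise.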
2026-04-29 13:49:27 +03:00
Michael Niedermayer
7c67748537
avformat/mov: check extradata in mov_read_dops()
...
We want to limit an attacker's ability to change structures once they have
been parsed. So once extradata (or another array) is finished and may already
have been used, we do not want to allow an attacker to change it.
This reduces the attack surface.
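A minimal sketch of the "set once" policy described above. CodecParams is a hypothetical stand-in, not AVCodecParameters, and set_extradata_once() with its return codes is illustrative only. The actual check in mov_read_dops() may handle a repeated box differently, for example by warning and skipping it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *extradata;
    size_t extradata_size;
} CodecParams;

/* Returns 0 on success, -1 if extradata is already set, -2 on allocation failure. */
static int set_extradata_once(CodecParams *par, const uint8_t *data, size_t size)
{
    assert(par && (data || size == 0));
    if (par->extradata)
        return -1; // already parsed and possibly in use: refuse to replace it
    uint8_t *buf = malloc(size ? size : 1);
    if (!buf)
        return -2;
    if (size)
        memcpy(buf, data, size);
    par->extradata = buf;
    par->extradata_size = size;
    return 0;
}
```

A second box that tries to overwrite already-parsed extradata is rejected, and the first buffer stays untouched.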
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-04-29 00:46:47 +00:00