Commit graph

53243 commits

Author SHA1 Message Date
Andreas Rheinhardt
3144652588 avcodec/x86/lossless_videoencdsp_init: Don't read too often
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The topleft values need to be initialized
differently for the first loop iteration than for the others
in order to avoid reading ptr[-1]. So it has been initialized before
the loop and then read again at the end of the loop, so that the last
value read was never used. Yet this can lead to reads beyond the end
of the buffer, e.g. with
ffmpeg -cpuflags mmx+mmxext -f lavfi -i "color=size=64x4,format=yuv420p" \
-vf vflip -c:v ffvhuff -pred median -frames 1 -f null -

Fix this by not reading the value at the end of the loop.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:29 +01:00
Andreas Rheinhardt
2b9aea7756 avcodec/x86/lossless_videoencdsp_init: Don't read from before the buffer
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The left value is simply read via
ptr[-1], although this is not guaranteed to be inside the buffer
in case of negative strides. This happens e.g. with

ffmpeg -i fate-suite/mpeg2/dvd_single_frame.vob -vf vflip \
       -c:v magicyuv -pred median -f null -

Fix this by reading the first value like the topleft value.
Also change the documentation of sub_median_pred to reflect this
change (and the one from 791b5954bc).
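
A minimal sketch of median prediction as described above, assuming the usual HuffYUV-style predictor (the median of left, top and left + top - topleft). The function names and the list-based interface are illustrative only and are not the signature of the real DSP function. The point it shows is that left and topleft are carried as state between pixels, so nothing at index -1 is ever read:

```python
def mid_pred(a, b, c):
    """Return the median of three integers."""
    return sorted((a, b, c))[1]


def sub_median_pred(top_row, cur_row, left, left_top):
    """Subtract the median prediction from each pixel of cur_row.

    left and left_top are supplied by the caller instead of being read
    from cur_row[-1] or top_row[-1], so no element before the start (or
    past the end) of either row is touched.
    """
    residuals = []
    for top, cur in zip(top_row, cur_row):
        gradient = (left + top - left_top) & 0xFF
        pred = mid_pred(left, top, gradient)
        residuals.append((cur - pred) & 0xFF)
        # Carry the neighbours forward; nothing is reloaded from memory.
        left_top = top
        left = cur
    return residuals, left, left_top
```

The returned left and left_top can be fed into the next call, which is one way to process a row in several pieces without re-reading earlier pixels.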

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:25 +01:00
Rémi Denis-Courmont
71db4f3cc1 lavc/llvidencdsp: R-V V sub_median_pred
SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
sub_median_pred_rvv_i32:                              4820.0 (61.80x)
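
The figure in parentheses appears to be the C reference time divided by the time of the optimised version. A small sketch, using only the timings listed above, that recomputes those ratios (the helper name is made up for illustration):

```python
def speedup(reference_time, optimized_time):
    """Return how many times faster the optimised version is."""
    return reference_time / optimized_time


# Timings copied from the SpacemiT X60 listing above.
c_time = 297862.8
for name, t in [("rvb_b", 101992.2), ("rvv_i32", 4820.0)]:
    print(f"sub_median_pred_{name}: {speedup(c_time, t):.2f}x")
```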
2025-12-14 10:33:40 +02:00
Rémi Denis-Courmont
87190fff6e lavc/llvidencdsp: R-V B sub_median_pred
SiFive U74:
sub_median_pred_c:                                  238947.3 ( 1.00x)
sub_median_pred_rvb_b:                              106686.9 ( 2.24x)

SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
2025-12-14 10:33:40 +02:00
Tomasz Szumski
08db850159 avcodec: add JPEG-XS decoder and encoder using libsvtjpegxs
Co-Authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 19:00:35 -03:00
James Almer
52c097065c avcodec: add a JPEG-XS parser
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Tomasz Szumski
4243e6c870 avcodec/codec_id: add JPEG-XS
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Lynne
9e8e34d475 vulkan_ffv1: remove unused RCT shader files
The 2 files were made redundant when the RCT was merged into encode/decode.
2025-12-13 22:12:26 +01:00
Lynne
5bb9cd23b7 vulkan_dpx: fix GRAY16BE and big-endian marked 8-bit samples
2025-12-13 21:35:56 +01:00
Lynne
c3291993eb vulkan_ffv1: use proper rounded divisions for plane width and height
Fixes #20314
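
A sketch of why the rounding matters for subsampled planes, assuming that "rounded" here means rounding up (the commit message does not say so explicitly). libavutil has the AV_CEIL_RSHIFT macro for this purpose; the Python helpers below are only an illustration and are not taken from the shader code:

```python
def ceil_rshift(value, shift):
    """Divide value by 2**shift, rounding up instead of down."""
    return -((-value) >> shift)


def plane_size(width, height, log2_chroma_w, log2_chroma_h):
    """Return the (width, height) of a subsampled plane."""
    return (ceil_rshift(width, log2_chroma_w),
            ceil_rshift(height, log2_chroma_h))
```

With a 5x3 frame and 4:2:0 subsampling, a plain shift gives a 2x1 chroma plane and drops the last column and row, while the rounded-up version gives 3x2.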
2025-12-13 19:12:24 +01:00
Lynne
91deb96d3c vulkan_decode: don't set unnecessary function pointers for FFHWAccel
Invalidate is not used for SDR decoders, since they don't use session
parameters.
2025-12-13 19:12:24 +01:00
Lynne
72e83b42d1 vulkan_decode: clean up decoder initialization
Now that we don't reset on every seek, we can simplify it.
2025-12-13 19:12:24 +01:00
Lynne
018ba6b612 vulkan_decode: do not reset the decoder when flushing
The issue is that .flush gets called asynchronously, and modifies the
video session state while it's being used for decoding. This did not
result in issues since all known vendors do not keep important state
there, but it's not compliant with the specs.

It's not necessary to flush the decoder at all when seeking,
so simply don't.

Fixes #20487
2025-12-13 19:12:20 +01:00
Andreas Rheinhardt
3da2a21710 avcodec/hq_hqadata: Avoid relocations for HQProfiles
Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:57:47 +01:00
Andreas Rheinhardt
2718874724 avcodec/hq_hqadata: Remove padding from tables
Each table needs only tab_w*tab_h*2 entries.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:44 +01:00
Andreas Rheinhardt
0cf187471f avcodec/hq_hqa: Don't rederive value
perm gets incremented in the loop in such a manner that
it already has the value it is set to here except for
the first loop iteration.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:20 +01:00
Ruikai Peng
c48b8ebbbb avcodec/vulkan: fix DPX unpack offset
The DPX Vulkan unpack shader computes a word offset as

    uint off = (line_off + pix_off >> 5);

Due to GLSL operator precedence this is evaluated as
line_off + (pix_off >> 5) rather than (line_off + pix_off) >> 5.
Since line_off is in bits while off is a 32-bit word index,
scanlines beyond y=0 use an inflated offset and the shader reads
past the end of the DPX slice buffer.

Parenthesize the expression so that the sum is shifted as intended:

    uint off = (line_off + pix_off) >> 5;

This corrects the unpacked data and removes the CRC mismatch
observed between the software and Vulkan DPX decoders for
mispacked 12-bit DPX samples. The GPU OOB read itself is only
observable indirectly via this corruption since it occurs inside
the shader.
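
A small numeric sketch of the two groupings named above. Both are written with explicit parentheses, so the comparison does not rely on any language's precedence rules. The bit offsets are hypothetical values chosen for illustration, not taken from the PoC file:

```python
def word_offset_intended(line_off_bits, pix_off_bits):
    """Sum the two bit offsets, then convert to a 32-bit word index."""
    return (line_off_bits + pix_off_bits) >> 5


def word_offset_misgrouped(line_off_bits, pix_off_bits):
    """Only the pixel offset is converted; the line offset stays in bits."""
    return line_off_bits + (pix_off_bits >> 5)


# Hypothetical offsets: first scanline, then a later one.
for line_off in (0, 1536):
    print(line_off,
          word_offset_intended(line_off, 36),
          word_offset_misgrouped(line_off, 36))
```

On the first scanline (line_off of 0) the two forms agree, which matches the description that only scanlines beyond y=0 are affected; for any later line the second form is far larger than the intended word index.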

Repro on x86_64 with Vulkan/llvmpipe (531ce713a0):

    ./configure --cc=clang --disable-optimizations --disable-stripping \
        --enable-debug=3 --disable-doc --disable-ffplay \
        --enable-vulkan --enable-libshaderc \
        --enable-hwaccel=dpx_vulkan \
        --extra-cflags='-fsanitize=address -fno-omit-frame-pointer' \
        --extra-ldflags='-fsanitize=address' && make

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json

PoC: packed 12-bit DPX with the packing flag cleared so the unpack
shader runs (4x64 gbrp12le), e.g. poc12_packed0.dpx.

Software decode:

    ./ffmpeg -v error -i poc12_packed0.dpx -f framecrc -
    -> 0, ..., 1536, 0x26cf81c2

Vulkan hwaccel decode:

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json \
    ./ffmpeg -v error -init_hw_device vulkan \
        -hwaccel vulkan -hwaccel_output_format vulkan \
        -i poc12_packed0.dpx \
        -vf hwdownload,format=gbrp12le -f framecrc -
    -> 0, ..., 1536, 0x71e10a51

The only difference between the two runs is the Vulkan unpack
shader, and the stable CRC mismatch indicates that it is reading
past the intended DPX slice region.

Regression since: 531ce713a0
Found-by: Pwno
2025-12-12 20:13:16 +00:00
James Almer
9c14527f1a avcodec/vvc/refs: export in-band LCEVC side data in frames
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:49 -03:00
James Almer
94c491287c avcodec/vvc/sei: parse Registered and Unregistered SEI messages
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer
6dad70507f avcodec/cbs_sei: store a pointer to the start of Registered and Unregistered SEI messages
Required for the following commit, where a parsing function expects the buffer
to include the country code bytes.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer
b6655e9594 avcodec/dpx: make the lack of break in a switch case explicit
Should fix CID 1676036

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 18:18:46 +00:00
Cameron Gutman
0637a28dc0 lavc/vulkan_video: fix leak on CreateVideoSessionKHR failure
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-12-12 12:43:00 +00:00
Cameron Gutman
4e4677bf58 lavc/vulkan_video: fix double-free if ff_vk_decode_init() fails
ff_vk_video_common_init() calls ff_vk_video_common_uninit() on failure
which leaves dangling object handles. Those get freed again when the
destructor of FFVulkanDecodeShared calls ff_vk_video_common_uninit()
again.
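
The failure pattern can be modelled in a few lines. This is a toy sketch with made-up classes (FakeDevice, VideoCommon), not the Vulkan API and not necessarily how the commit fixes it. It shows one common remedy, clearing the handle after destroying it so that a second uninit call becomes a no-op:

```python
class FakeDevice:
    """Stand-in for a driver that tracks live handles (hypothetical)."""

    def __init__(self):
        self.live = set()
        self.next_handle = 1

    def create(self):
        handle = self.next_handle
        self.next_handle += 1
        self.live.add(handle)
        return handle

    def destroy(self, handle):
        if handle not in self.live:
            raise RuntimeError(f"double free of handle {handle}")
        self.live.remove(handle)


class VideoCommon:
    """Owns one session handle; uninit() may be called more than once."""

    def __init__(self, device):
        self.device = device
        self.session = None

    def init(self, fail=False):
        self.session = self.device.create()
        if fail:
            self.uninit()  # clean up on the error path
            return False
        return True

    def uninit(self):
        if self.session is not None:
            self.device.destroy(self.session)
            self.session = None  # clear the handle so a repeat call is harmless
```

Without the assignment to None, the second uninit() call made by the owner's destructor would pass the stale handle to destroy() again.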

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-12-12 12:43:00 +00:00
Andreas Rheinhardt
a72e01b4ec avcodec/ppc/vc1dsp_altivec: Don't read too much data
vc1_inv_trans_8x4_altivec() is supposed to process a block
of 8x4 words, yet it read and processed eight lines. This led
to ASAN failures (see [1]) that this commit intends to fix.
It should also lead to performance improvements, but I don't have
real hardware to bench it.

[1]: https://fate.ffmpeg.org/report.cgi?time=20251207214004&slot=ppc64-linux-gcc-14.3-asan

Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-12 09:44:01 +01:00
James Almer
04df80f973 avcodec/cavs_parser: check return value of init_get_bits8
Fixes Coverity issue CID 1676035

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-11 20:01:01 -03:00
Rémi Denis-Courmont
a4cb6c724b lavc/llvidencdsp: R-V V sub_left_predict
SpacemiT X60:
sub_left_predict_c:                                  51836.0 ( 1.00x)
sub_left_predict_rvv_i32:                             5843.1 ( 8.87x)
2025-12-11 17:24:38 +02:00
Leo Izen
37858dc6bd avcodec/libjxlenc: add EXIF box to output
We already parse the EXIF side data to extract the orientation, so we
should add it to the output file as an EXIF box.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:36 -05:00
Leo Izen
e349118b4c avcodec/libjxlenc: avoid calling functions inside if statements
It leads to messier, less readable code, and can also lead to bugs.
I prefer this code style.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:35 -05:00
Leo Izen
6ec4b3a9cb avcodec/libjxlenc: give display matrix sidedata priority
Before this commit, we ignored the display matrix side data if any EXIF
side data is present, even if that side data contains no orientation
tag. This allows us to calculate the orientation from the display
matrix sidedata first, if present. Ideally the decoder will have
removed the orientation tag upon decoding and attached the data as
display matrix side data instead, so this makes our orientation code
respect this behavior.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:33 -05:00
Hyunjun Ko
6726359326 vulkan_vp9: fix subsampling source and show_frame flag
2025-12-10 18:41:20 +00:00
Kacper Michajłow
04a46a2ae4 avcodec/d3d12va_encode_av1: don't ignore return value
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow
f4fc14fb38 avcodec/d3d12va_encode_av1: fix size_t format specifier
2025-12-08 21:31:13 +00:00
Kacper Michajłow
5b2bd6f88d avcodec/d3d12va_encode_av1: remove unused variables
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow
1f7182a991 avcodec/libx265: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow
eaa2b3d4be avcodec/libsvtav1: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow
490af2d4cf avcodec/libaomdec: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Andreas Rheinhardt
dc843cdd9a avcodec/x86/vp9mc: Reindent after the previous commit
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:35:07 +01:00
Andreas Rheinhardt
65e71b0837 avcodec/x86/vp9mc: Deduplicate coefficient tables
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:35:01 +01:00
Andreas Rheinhardt
38e2174ce4 avcodec/x86/vp9mc: Avoid MMX regs in width 4 hor 8tap funcs
Using wider registers (and pshufb) allows halving the number of
pmaddubsw used. It is also ABI compliant (no more missing emms).

Old benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      15.0 ( 6.52x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.9 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     54.0 ( 6.35x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.9 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                      14.2 ( 6.67x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        325.9 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     52.5 ( 6.20x)

New benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      10.8 ( 9.08x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.4 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     38.8 ( 8.82x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.7 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                       9.7 ( 9.75x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        321.7 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     37.0 ( 8.69x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:34:35 +01:00
Andreas Rheinhardt
dd5dc254ff avcodec/x86/vp9mc: Avoid reloads, MMX regs in width 4 vert 8tap func
Four rows of four bytes fit into one xmm register; therefore
one can arrange the rows as follows (A,B,C: first, second, third etc.
row)

xmm0: ABABABAB BCBCBCBC
xmm1: CDCDCDCD DEDEDEDE
xmm2: EFEFEFEF FGFGFGFG
xmm3: GHGHGHGH HIHIHIHI

and use four pmaddubsw to calculate two rows in parallel. The history
fits into four registers, making this possible even on 32bit systems.
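
The layout can be checked with a scalar model. The sketch below is an illustration only: pmaddubsw_like ignores the saturation of the real instruction, rounding and clipping are left out, and all helper names are made up. It builds the four interleaved registers from nine rows of four pixels and compares the result with a plain 8-tap vertical sum:

```python
def pmaddubsw_like(pixels, coeffs):
    """Multiply adjacent byte pairs and add each pair (no saturation)."""
    return [pixels[i] * coeffs[i] + pixels[i + 1] * coeffs[i + 1]
            for i in range(0, len(pixels), 2)]


def interleave(row_a, row_b):
    """Build A0 B0 A1 B1 ... from two rows of four pixels."""
    out = []
    for a, b in zip(row_a, row_b):
        out += [a, b]
    return out


def filter_two_rows(rows, taps):
    """rows: nine rows of four pixels (A..I); taps: eight coefficients."""
    acc = [0] * 8  # first four sums: output row 0, last four: output row 1
    for k in range(4):
        # Register k: rows (2k, 2k+1) in the low half,
        # rows (2k+1, 2k+2) in the high half.
        reg = (interleave(rows[2 * k], rows[2 * k + 1]) +
               interleave(rows[2 * k + 1], rows[2 * k + 2]))
        coeff_pairs = [taps[2 * k], taps[2 * k + 1]] * 8
        partial = pmaddubsw_like(reg, coeff_pairs)
        acc = [x + y for x, y in zip(acc, partial)]
    return acc[:4], acc[4:]


def direct_filter(rows, taps, first):
    """Reference: 8-tap vertical sum starting at row index first."""
    return [sum(taps[t] * rows[first + t][x] for t in range(8))
            for x in range(4)]
```

Each of the four multiply-add steps contributes two taps to both output rows at once, which is why four of them are enough for two rows of an 8-tap filter.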

Old benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.5 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      16.4 ( 6.44x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.3 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      15.4 ( 6.44x)

New benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      11.8 ( 8.90x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.7 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      10.7 ( 9.30x)

Old benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         138.2 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.93x)
vp9_put_8tap_smooth_4v_8bpp_c:                         123.6 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.41x)

New benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         139.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      20.1 ( 6.92x)
vp9_put_8tap_smooth_4v_8bpp_c:                         124.5 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      19.9 ( 6.26x)

Loading the constants into registers did not turn out to be advantageous
here (not to mention Win64, where this would necessitate saving
and restoring even more registers); probably because there are only two
loop iterations.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:31:59 +01:00
Andreas Rheinhardt
36204fbc3c avcodec/vp9itxfm{,_16bpp}: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:51 +01:00
Andreas Rheinhardt
ea37f49aed avcodec/vp9intrapred: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:44 +01:00
Andreas Rheinhardt
6e418af810 avcodec/vp9mc: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:05 +01:00
Kacper Michajłow
5b5d51cbc1 avcodec/x86/h264_idct: fix version check for NASM 3 and newer
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 17:43:29 +00:00
Oliver Chang
9849a274df avcodec/dpx: Fix heap-buffer-overflow in 16-bit decoding
Fixes a heap-buffer-overflow in `libavcodec/dpx.c` triggered by a stale
`unpadded_10bit` flag in the `DPXDecContext`. This flag, set for 10-bit
unpadded frames, persisted across `decode_frame` calls. If a subsequent
frame was 16-bit, the stale flag caused incorrect buffer size
validation, allowing truncated buffers to pass checks designed for
smaller 10-bit packed data. This led to an out-of-bounds read in
`av_image_copy_plane` during 16-bit decoding.

The fix explicitly resets `dpx->unpadded_10bit = 0` at the start of
`decode_frame` to ensure correct validation for each frame.
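
A toy model of the stale-flag problem, under loudly stated assumptions: the size formulas, the class, the function signatures and the reset_flag switch below are all hypothetical and do not reproduce the checks in `libavcodec/dpx.c`. It only shows how a per-frame flag kept in a long-lived context can let a too-small buffer pass validation unless it is cleared at the start of each frame:

```python
class DPXContext:
    """Decoder state that outlives a single frame (hypothetical)."""

    def __init__(self):
        self.unpadded_10bit = False


def required_bytes(width, height, components, bits, unpadded_10bit):
    """Hypothetical size check: tightly packed 10-bit data needs fewer bytes."""
    if unpadded_10bit:
        return (width * height * components * 10 + 7) // 8
    return width * height * components * ((bits + 7) // 8)


def decode_frame(ctx, buf_size, width, height, components, bits,
                 unpadded, reset_flag=True):
    if reset_flag:
        ctx.unpadded_10bit = False  # drop state left by the previous frame
    if bits == 10 and unpadded:
        ctx.unpadded_10bit = True
    needed = required_bytes(width, height, components, bits,
                            ctx.unpadded_10bit)
    if buf_size < needed:
        raise ValueError("buffer too small")
    return needed
```

Decoding a 10-bit unpadded frame and then a 16-bit frame with reset_flag=False makes the second call accept a buffer sized for the 10-bit case; with the reset in place the same call is rejected.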

Fixes: https://issues.oss-fuzz.com/issues/464471792
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 464471792/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_DEC_fuzzer-5275522210004992
2025-12-07 19:41:02 +00:00
Rémi Denis-Courmont
10ea5f8b99 lavc/h264idct: R-V V 9-bit h264_luma_dc_dequant_idct
Note that, like the C reference, the same function can be used for
larger bit depths.
2025-12-07 20:27:35 +02:00
Rémi Denis-Courmont
d69a36a8d1 lavc/h264idct: R-V V 8-bit h264_luma_dc_dequant_idct
This does not improve performance with current hardware due to the poor
performance of segmented accesses. Performance should be slightly better
with expensive or near-future hardware that I don't have; however, it is
still limited by two other factors:
- There are only 4 elements.
- The final stores are necessarily indexed and hit multiple cache lines,
  thus as slow as scalar.
2025-12-07 20:27:35 +02:00
Rémi Denis-Courmont
f222eb2b08 lavc/mpv_unquantize: R-V V H.263 DCT unquantize
SpacemiT X60:
dct_unquantize_h263_inter_c:                           417.8 ( 1.00x)
dct_unquantize_h263_inter_rvv_i32:                      66.0 ( 6.33x)
dct_unquantize_h263_intra_c:                           140.2 ( 1.00x)
dct_unquantize_h263_intra_rvv_i32:                      67.7 ( 2.07x)

Note that the C benchmarks are not stable, depending heavily on the
number of coefficients picked by the RNG. The R-V V benchmarks are
however very stable and generally better than C's.
2025-12-07 20:20:38 +02:00
averne
c384b1e803 vulkan/prores: use vkCmdClearColorImage
The VK spec forbids using clear commands on YUV images,
so we need to allocate separate per-plane images.
This removes the need for a separate reset shader.
2025-12-07 18:17:36 +00:00
James Almer
00caeba050 avcodec: rename avcodec_receive_frame2 to avcodec_receive_frame_flags
It's a name that communicates its functionality in a better way.
Since the function was introduced very recently, we can safely rename it.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-07 12:47:46 -03:00