Commit graph

53271 commits

Author SHA1 Message Date
Rémi Denis-Courmont
55200f999c lavc/mathops: R-V B optimisation for mid_pred
If Zbb is enabled at compilation (e.g. Ubuntu), the compiler should
compile the new C mid_pred() function correctly. But if Zbb is *not*
enabled (e.g. Debian), then we can at least fall back at run time.

On SiFive-U74, before:
sub_median_pred_c:                                    1331.9 ( 1.00x)
sub_median_pred_rvb_b:                                 881.8 ( 1.51x)

After:
sub_median_pred_c:                                    1133.1 ( 1.00x)
sub_median_pred_rvb_b:                                 875.7 ( 1.29x)
2025-12-19 19:56:13 +02:00
Rémi Denis-Courmont
ccd7e66f9e lavc/mathops: remove bespoke Arm mid_pred()
The C codegen is as good if not slightly better than the assembler at
this point.
2025-12-19 19:56:13 +02:00
Rémi Denis-Courmont
8dccb380cf lavc/mathops: simplify mid_pred()
This reduces the minimum instruction emission for mid_pred()
(i.e. median of 3) down to:
- 3 comparisons and 4 conditional moves, or
- 4 min/max.

With that the compiler can eliminate any branch. This optimal
situation is attainable with Clang 21 on Arm64, RVA22 and x86,
with GCC 15 on Arm64 and x86 (RVA22 goes from 2 to 1 branch).
These optimisations also work on Arm32 and LoongArch.
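
As an illustration of the "4 min/max" form mentioned above, a median of three can be written as below. This is a minimal sketch, not the actual mid_pred() source from this commit; the name median3 is hypothetical, and whether a given compiler turns each ternary into a min/max or conditional-move instruction depends on the target and optimisation level.

```c
/* Median of three expressed as four min/max selections.
 * Illustrative only: median3 is a hypothetical name, not FFmpeg's mid_pred(). */
static inline int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;   /* min(a, b) */
    int hi = a < b ? b : a;   /* max(a, b) */
    int m  = hi < c ? hi : c; /* min(max(a, b), c) */
    return lo > m ? lo : m;   /* max(min(a, b), min(max(a, b), c)) */
}
```

Each statement is a plain select with no side effects, which is the shape that lets a compiler avoid branches.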

The same algorithm is already implemented via inline assembler for some
architectures such as x86 and Arm32, but notably not Arm64 and RVA22.
Besides, using C code allows the compiler to schedule instructions
properly.

Even on architectures with neither conditional moves nor min/max, this
leads to a visible performance improvement for C code, as seen here for
RVA20 code running on SiFive-U74:

Before:
sub_median_pred_c:                                    1657.5 ( 1.00x)
sub_median_pred_rvb_b:                                 875.9 ( 1.89x)

After:
sub_median_pred_c:                                    1331.9 ( 1.00x)
sub_median_pred_rvb_b:                                 881.8 ( 1.51x)

Note that this commit leaves the x86 and Arm32 code intact so it has
no effect on those ISAs.
2025-12-19 19:50:56 +02:00
James Almer
78c75d546a avcodec/apv_parser: add support for AU assembly
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-18 01:24:35 +00:00
Dariusz Marcinkiewicz
c41155e614 libavcodec: support frame dropping in libvpxenc
The VP8 encoder can be configured to drop frames, e.g. when bitrate
overshoot is detected. At present the code responsible for
managing an internal fifo assumes that we will get one output frame for
each frame fed into the encoder. That is not the case if the encoder can
decide to drop frames.

Running:
ffmpeg -stream_loop 100 -i dash_video3.webm -c:v libvpx -b:v 50k
-drop-threshold 20 -screen-content-mode 2 output.webm

results in lots of warnings like:
[libvpx @ 0x563fd8aba100] Mismatching timestamps: libvpx 2187 queued
2185; this is a bug, please report it
[libvpx @ 0x563fd8aba100] Mismatching timestamps: libvpx 2189 queued
2186; this is a bug, please report it

followed by:
[vost#0:0/libvpx @ 0x563fd8ab9b40] [enc:libvpx @ 0x563fd8aba080] Error
submitting video frame to the encoder
[vost#0:0/libvpx @ 0x563fd8ab9b40] [enc:libvpx @ 0x563fd8aba080] Error
encoding a frame: No space left on device
[vost#0:0/libvpx @ 0x563fd8ab9b40] Task finished with error code: -28
(No space left on device)
[vost#0:0/libvpx @ 0x563fd8ab9b40] Terminating thread with return code
-28 (No space left on device)

The reason for the above error is that each dropped frame leaves an
extra item in the fifo, which eventually overflows.

The proposed fix is to keep popping elements from the fifo until the
one with the matching pts is found. A side effect of this change is that
the code no longer considers pts mismatch to be a bug.
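
The "pop until the pts matches" idea can be sketched as follows. This is a simplified stand-alone model under stated assumptions, not the libvpxenc code: FrameQueue, FrameInfo, QUEUE_CAP and the queue_* helpers are hypothetical names, and the real encoder wrapper uses FFmpeg's own fifo API rather than this ring buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 8 /* hypothetical capacity, for illustration only */

typedef struct FrameInfo {
    int64_t pts; /* timestamp of a frame submitted to the encoder */
} FrameInfo;

typedef struct FrameQueue {
    FrameInfo items[QUEUE_CAP];
    size_t head, count;
} FrameQueue;

/* Append one entry; returns -1 when the queue is full. */
static int queue_push(FrameQueue *q, int64_t pts)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP].pts = pts;
    q->count++;
    return 0;
}

/* Remove the oldest entry; returns -1 when the queue is empty. */
static int queue_pop(FrameQueue *q, FrameInfo *out)
{
    if (!q->count)
        return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Pop entries until one matches the pts reported by the encoder.
 * Older entries are discarded: they belong to frames the encoder dropped.
 * Returns 0 on a match, -1 if the queue drained without one. */
static int queue_pop_matching(FrameQueue *q, int64_t out_pts, FrameInfo *out)
{
    while (queue_pop(q, out) == 0)
        if (out->pts == out_pts)
            return 0;
    return -1;
}
```

With entries for pts 0..4 queued and the encoder returning only pts 0 and 3, the second lookup discards the entries for 1 and 2 instead of leaving them behind to fill the queue.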

This likely regressed around 5bda4ec6c3,
when the fifo started to be used universally.

Signed-off-by: Dariusz Marcinkiewicz <darekm@google.com>
2025-12-16 21:53:10 +00:00
James Almer
1c804b349e avcodec/jpegxs_parser: fix bitstream assembly logic
JPEG-XS streams can have the bytes corresponding to certain markers as part of
slice data, and no considerations were made for it, so we need to add checks
for false positives.

This fixes assembling several samples.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-16 10:38:56 -03:00
Rémi Denis-Courmont
29185f708f lavc/riscv: fix dependency for llvidencdsp
2025-12-15 18:40:13 +02:00
Rémi Denis-Courmont
acb38d320b lavc/llvidencdsp: fix R-V V sub_left_predict
The code assumed that the destination buffer was zeroed, a misbehaviour
with which checkasm is bug-compatible as it zeroes the destination
buffer. The fixed code is even faster:

SpacemiT X60:
sub_left_predict_c:                                  51792.5 ( 1.00x)
sub_left_predict_rvv_i32:                             3504.4 (14.78x)
2025-12-15 18:35:58 +02:00
averne
b9078c0939 vulkan/prores: copy constant tables to shared memory
The shader needs ~3 loads per DCT coefficient.
This data was not observed to be cached efficiently
in the upper cache levels; loading it explicitly into
shared memory fixes that.

Also reduce code size by moving the bitstream
initialization outside of the switch/case.
2025-12-15 12:29:00 +00:00
averne
00914cc3ef vulkan/prores: increase bitstream caching
Now caches 64B of data when the reader hits the refill codepath
2025-12-15 12:29:00 +00:00
averne
a2475d16ed lavc/vulkan/common: allow configurable bitstream caching in shared memory
2025-12-15 12:29:00 +00:00
James Almer
35e68c6492 avcodec/libsvtjpegxsdec: only return AVERROR codes
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:04 -03:00
James Almer
4269d69df1 avcodec/libsvtjpegxsdec: use AVCodecContext.lowres instead of a private option
It was evidently an oversight.

Found-by: Andreas Rheinhardt.
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:04 -03:00
James Almer
330984579b avcodec/libsvtjpegxsdec: move some stack structs to the private decoder context
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:04 -03:00
James Almer
2903a3c1ec avcodec/libsvtjpegxsdec: reindent after the previous changes
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:04 -03:00
James Almer
4e9f5e2f3d avcodec/libsvtjpegxsdec: support parameter changes
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:03 -03:00
James Almer
7bd793c647 avcodec/libsvtjpegxsdec: remove chunk decoding code
That's the job of the parser.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:22:03 -03:00
James Almer
695b717944 avcodec/libsvtjpegxsdec: Replace divisions by shifts
Based on a patch by Andreas Rheinhardt

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 18:21:58 -03:00
James Almer
c639ea5eeb avcodec/libsvtjpegxsenc: set bitrate to a sane default if unset
Better than failing with an impossibly low bitrate.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-14 17:34:57 -03:00
Andreas Rheinhardt
6f9849cbe5 avcodec/libsvtjpegxsenc: Replace divisions by shifts
Also simplify setting alloc_size.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 21:06:03 +01:00
Andreas Rheinhardt
7efd09813f avcodec/libsvtjpegxsenc: Don't copy unnecessarily
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 21:06:00 +01:00
Andreas Rheinhardt
af70b24692 avcodec/libsvtjpegxs{dec,enc}: Don't call av_cpu_count() multiple times
(Like the old code, the new code limits the number of threads to 64,
even when the user explicitly set a higher thread count. I don't know
whether it is intentional to apply this limit even when the user
explicitly supplied the number of threads.)

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 21:05:57 +01:00
Andreas Rheinhardt
47179c3452 avcodec/libsvtjpegxsenc: Remove dead code
The pixel format has already been checked generically.
This also fixes the bug that the earlier code ignored
the return value of set_pix_fmt().

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 21:05:54 +01:00
Andreas Rheinhardt
9ed64db6a5 avcodec/libsvtjpegxs{dec,enc}: Don't get log level multiple times
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 21:05:33 +01:00
Andreas Rheinhardt
f96829b5bf avcodec/x86/lossless_videoencdsp_init: Remove pointless av_unused
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:46 +01:00
Andreas Rheinhardt
abe6ba17fa avcodec/x86/lossless_videoencdsp: Port sub_median_pred to NASM
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:43 +01:00
Andreas Rheinhardt
9ba33cc198 avcodec/x86/lossless_videoencdsp_init: Avoid special-casing first pixel
Old benchmarks:
sub_median_pred_c:                                     404.1 ( 1.00x)
sub_median_pred_sse2:                                   20.5 (19.67x)

New benchmarks:
sub_median_pred_c:                                     408.5 ( 1.00x)
sub_median_pred_sse2:                                   19.2 (21.27x)

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:40 +01:00
Andreas Rheinhardt
3a3e7080f1 avcodec/x86/lossless_videoencdsp_init: Port sub_median_pred to SSE2
Old benchmarks:
sub_median_pred_c:                                     405.7 ( 1.00x)
sub_median_pred_mmxext:                                 35.1 (11.57x)

New benchmarks:
sub_median_pred_c:                                     404.1 ( 1.00x)
sub_median_pred_sse2:                                   20.5 (19.67x)

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:35 +01:00
Andreas Rheinhardt
3144652588 avcodec/x86/lossless_videoencdsp_init: Don't read too often
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The topleft values need to be initialized
differently for the first loop iteration than for the others
in order to avoid reading ptr[-1]. So it has been initialized before
the loop and then read again at the end of the loop, so that the last
value read was never used. Yet this can lead to reads beyond the end
of the buffer, e.g. with
ffmpeg -cpuflags mmx+mmxext -f lavfi -i "color=size=64x4,format=yuv420p" \
-vf vflip -c:v ffvhuff -pred median -frames 1 -f null -

Fix this by not reading the value at the end of the loop.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:29 +01:00
Andreas Rheinhardt
2b9aea7756 avcodec/x86/lossless_videoencdsp_init: Don't read from before the buffer
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The left value is simply read via
ptr[-1], although this is not guaranteed to be inside the buffer
in case of negative strides. This happens e.g. with

ffmpeg -i fate-suite/mpeg2/dvd_single_frame.vob -vf vflip \
       -c:v magicyuv -pred median -f null -

Fix this by reading the first value like the topleft value.
Also change the documentation of sub_median_pred to reflect this
change (and the one from 791b5954bc).
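
For readers unfamiliar with the predictor, a scalar version of median prediction from left, top and topleft can be sketched as below. This is an illustrative model, not a copy of FFmpeg's code: the names median3 and sub_median_pred_sketch and the exact parameter order are assumptions. The point it shows is that the left and topleft values are carried in through pointers, so the loop never reads an element at index -1 of either row.

```c
#include <stddef.h>
#include <stdint.h>

/* Median of three (hypothetical helper name). */
static inline int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    int m  = hi < c ? hi : c;
    return lo > m ? lo : m;
}

/* top: previous row, cur: current row, dst: residuals to be encoded.
 * *left and *left_top hold the pixels to the left of cur[0] and top[0];
 * they are updated on return so the caller can continue the row. */
static void sub_median_pred_sketch(uint8_t *dst, const uint8_t *top,
                                   const uint8_t *cur, size_t w,
                                   int *left, int *left_top)
{
    uint8_t l  = (uint8_t)*left;
    uint8_t lt = (uint8_t)*left_top;

    for (size_t i = 0; i < w; i++) {
        /* Predictor: median of left, top and the gradient left + top - topleft. */
        int pred = median3(l, top[i], (l + top[i] - lt) & 0xFF);
        lt     = top[i];
        l      = cur[i];
        dst[i] = (uint8_t)(l - pred);
    }

    *left     = l;
    *left_top = lt;
}
```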

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:25 +01:00
Rémi Denis-Courmont
71db4f3cc1 lavc/llvidencdsp: R-V V sub_median_pred
SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
sub_median_pred_rvv_i32:                              4820.0 (61.80x)
2025-12-14 10:33:40 +02:00
Rémi Denis-Courmont
87190fff6e lavc/llvidencdsp: R-V B sub_median_pred
SiFive U74:
sub_median_pred_c:                                  238947.3 ( 1.00x)
sub_median_pred_rvb_b:                              106686.9 ( 2.24x)

SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
2025-12-14 10:33:40 +02:00
Tomasz Szumski
08db850159 avcodec: add JPEG-XS decoder and encoder using libsvtjpegxs
Co-Authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 19:00:35 -03:00
James Almer
52c097065c avcodec: add a JPEG-XS parser
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Tomasz Szumski
4243e6c870 avcodec/codec_id: add JPEG-XS
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Lynne
9e8e34d475 vulkan_ffv1: remove unused RCT shader files
The 2 files were made redundant when the RCT was merged into encode/decode.
2025-12-13 22:12:26 +01:00
Lynne
5bb9cd23b7 vulkan_dpx: fix GRAY16BE and big-endian marked 8-bit samples
2025-12-13 21:35:56 +01:00
Lynne
c3291993eb vulkan_ffv1: use proper rounded divisions for plane width and height
Fixes #20314
2025-12-13 19:12:24 +01:00
Lynne
91deb96d3c vulkan_decode: don't set unnecessary function pointers for FFHWAccel
Invalidate is not used for SDR decoders, since they don't use session
parameters.
2025-12-13 19:12:24 +01:00
Lynne
72e83b42d1 vulkan_decode: clean up decoder initialization
Now that we don't reset on every seek, we can simplify it.
2025-12-13 19:12:24 +01:00
Lynne
018ba6b612 vulkan_decode: do not reset the decoder when flushing
The issue is that .flush gets called asynchronously, and modifies the
video session state while it's being used for decoding. This did not
result in issues since all known vendors do not keep important state
there, but it's not compliant with the specs.

It's not necessary to flush the decoder at all when seeking,
so simply don't.

Fixes #20487
2025-12-13 19:12:20 +01:00
Andreas Rheinhardt
3da2a21710 avcodec/hq_hqadata: Avoid relocations for HQProfiles
Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:57:47 +01:00
Andreas Rheinhardt
2718874724 avcodec/hq_hqadata: Remove padding from tables
Each table needs only tab_w*tab_h*2 entries.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:44 +01:00
Andreas Rheinhardt
0cf187471f avcodec/hq_hqa: Don't rederive value
perm gets incremented in the loop in such a manner that
it already has the value it is set to here except for
the first loop iteration.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:20 +01:00
Ruikai Peng
c48b8ebbbb avcodec/vulkan: fix DPX unpack offset
The DPX Vulkan unpack shader computes a word offset as

    uint off = (line_off + pix_off >> 5);

Due to GLSL operator precedence this is evaluated as
line_off + (pix_off >> 5) rather than (line_off + pix_off) >> 5.
Since line_off is in bits while off is a 32-bit word index,
scanlines beyond y=0 use an inflated offset and the shader reads
past the end of the DPX slice buffer.

Parenthesize the expression so that the sum is shifted as intended:

    uint off = (line_off + pix_off) >> 5;

This corrects the unpacked data and removes the CRC mismatch
observed between the software and Vulkan DPX decoders for
mispacked 12-bit DPX samples. The GPU OOB read itself is only
observable indirectly via this corruption since it occurs inside
the shader.

Repro on x86_64 with Vulkan/llvmpipe (531ce713a0):

    ./configure --cc=clang --disable-optimizations --disable-stripping \
        --enable-debug=3 --disable-doc --disable-ffplay \
        --enable-vulkan --enable-libshaderc \
        --enable-hwaccel=dpx_vulkan \
        --extra-cflags='-fsanitize=address -fno-omit-frame-pointer' \
        --extra-ldflags='-fsanitize=address' && make

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json

PoC: packed 12-bit DPX with the packing flag cleared so the unpack
shader runs (4x64 gbrp12le), e.g. poc12_packed0.dpx.

Software decode:

    ./ffmpeg -v error -i poc12_packed0.dpx -f framecrc -
    -> 0, ..., 1536, 0x26cf81c2

Vulkan hwaccel decode:

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json \
    ./ffmpeg -v error -init_hw_device vulkan \
        -hwaccel vulkan -hwaccel_output_format vulkan \
        -i poc12_packed0.dpx \
        -vf hwdownload,format=gbrp12le -f framecrc -
    -> 0, ..., 1536, 0x71e10a51

The only difference between the two runs is the Vulkan unpack
shader, and the stable CRC mismatch indicates that it is reading
past the intended DPX slice region.

Regression since: 531ce713a0
Found-by: Pwno
2025-12-12 20:13:16 +00:00
James Almer
9c14527f1a avcodec/vvc/refs: export in-band LCEVC side data in frames
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:49 -03:00
James Almer
94c491287c avcodec/vvc/sei: parse Registered and Unregistered SEI messages
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer
6dad70507f avcodec/cbs_sei: store a pointer to the start of Registered and Unregistered SEI messages
Required for the following commit, where a parsing function expects the buffer
to include the country code bytes.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer
b6655e9594 avcodec/dpx: make the lack of break in a switch case explicit
Should fix CID 1676036

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 18:18:46 +00:00
Cameron Gutman
0637a28dc0 lavc/vulkan_video: fix leak on CreateVideoSessionKHR failure
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-12-12 12:43:00 +00:00