seq_decode is used to ensure that a picture and all of its reference
pictures use the same SPS. Any time the SPS changes, seq_decode should
be incremented. Prior to this patch, seq_decode was incremented in
frame_context_setup, which is called after the SPS is potentially
changed in decode_sps. Should the decoder encounter an error between
changing the SPS and incrementing seq_decode, the SPS could be modified
while seq_decode was not incremented, which could lead to invalid
reference pictures and various downstream issues. By instead updating
seq_decode within the picture set manager, we ensure seq_decode and the
SPS are always updated in tandem.
This loosens the coupling between CBS and the decoder by no longer using
CodedBitstreamH266Context (containing the most recently parsed PSs & PH)
to retrieve the PSs & PH in the decoder. Doing so is beneficial in two
ways:
1. It improves robustness to the case in which an AVPacket doesn't
contain precisely one PU.
2. It allows the decoder parameter set manager to properly handle the
case in which a single PU (erroneously) contains conflicting
parameter sets.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Right now, the private contexts of every decoder supporting
H.274 film grain synthesis (namely H.264, HEVC and VVC)
contain a H274FilmGrainDatabase; said structure is very large
700442B before this commit) and takes up the overwhelming
majority of said contexts: Removing it reduces sizeof(H264Context)
by 92.88%, sizeof(HEVCContext) by 97.78% and sizeof(VVCContext)
by 99.86%. This is especially important for H.264 and HEVC
when using frame-threading.
The content of said film grain database does not depend on
any input parameter; it is shareable between all its users and
could be hardcoded in the binary (but isn't, because it is so huge).
This commit adds a database with static storage duration to h274.c
and uses it instead of the elements in the private contexts above.
It is still lazily initialized as-needed; a mutex is used
for the necessary synchronization. An alternative would be to use
an AV_ONCE to initialize the whole database either in the decoders'
init function (which would be wasteful given that most videos
don't use film grain synthesis) or in ff_h274_apply_film_grain().
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When checking for filmgrain here, needs_fg can be true even when
film_grain_characteristics is NULL (when aom_film_grain.enable is true),
therefore this check could end up dereferencing film_grain_characteristics
even though it is NULL.
Fix CID 1648347
This commit adds a reference to the buffer as an argument to
start_frame, and adapts all existing code.
This allows for asynchronous hardware accelerators to skip
copying packet data by referencing it.
In the fail: block of decode_nal_units, a check as to whether fc->ref is
nonzero is used. Before this patch, fc->ref was set to NULL in
frame_context_setup. The issue is that, by the time frame_context_setup
is called, falliable functions (namely slices_realloc and
ff_vvc_decode_frame_ps) have already been called. Therefore, there
could arise a situation in which the fc->ref test of decode_nal_units'
fail: block is performed while fc->ref has an invalid value. This seems
to be particularly prevalent in situations where the FrameContexts are
being reused. The patch resolves the issue by moving the assignment of
fc->ref to NULL to the very top of decode_nal_units, before any falliable
functions are called.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Reduces memory consumption by ~4MB for 1080p video with a maximum delay
of 16 frames by packing various information related to MIP:
* intra_mip_flag, 1 bit
* intra_mip_transposed_flag, 1 bit
* intra_mip_mode, 4 bits
Into a single byte.
Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Frank Plowman <post@frankplowman.com>
This change will save approximately 531 MB for an 8K clip when processed with 16 threads.
The calculation is as follows:
7680 * 4320 * sizeof(int) * 2 * 2 * 16 / (4 * 4).
This reverts commit 110d8549d5.
I have been working through fixing bugs, particularly crashes I've
found using a fuzzer, in the VVC decoder for the past few months.
While I won't claim it is now bug-free, it is considerably more
resilient than it was and I think in a position to have the
experimental flag removed for release 7.1.
Additionally, most of the Main 10 features of VVC which were missing
version of the decoder released in 7.0 have now been implemented.
This includes the most major missing features: IBC, subpictures and RPR.
Signed-off-by: Frank Plowman <post@frankplowman.com>
From Jun Zhao <mypopydev@gmail.com>:
> Should we relocate this to the decoder? Other codecs typically set this
> parameter in the decoder.
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
memset tables in the main thread can become a bottleneck for the decoder.
For example, if it takes 1% of the processing time for one core, the maximum achievable FPS will be 100.
Move the memeset to worker threads will fix the issue.
For luma, qp can only change at the CU level, so the qp tab size is related to the CU.
For chroma, considering the joint CbCr, the QP tab size is related to the TU.
Fixes: signed integer overflow: 1107820800 + 1107820800 cannot be represented in type 'int'
Fixes: left shift of 1091059712 by 6 places cannot be represented in type 'int'
Fixes: 69910/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-5162839971528704
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Allocations in the following lines depend on the pixel shift, and so
these buffers must be reallocated if the pixel shift changes. Patch
fixes segmentation faults in fuzzed bitstreams.
Signed-off-by: Frank Plowman <post@frankplowman.com>
For RPR, the current frame may reference a frame with a different resolution.
Therefore, we need to consider frame scaling when we wait for reference pixels.
Fixes: CID1560042 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
For some error bitstreams, a CTU belongs to two slices/entry points.
If the decoder initializes and submmits the CTU task twice, it may crash the program
or cause it to enter an infinite loop.
Reported-by: Frank Plowman <post@frankplowman.com>
The native VVC decoder does not yet support quality/spatial/multiview
scalability. Bitstreams requiring this feature could cause crashes.
Patch fixes this by skipping NAL units which are not in the base layer,
warning the user while doing so.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The size variable here is taken as gospel for the bounds of the input
buffer in later logic. Clamp it to ensure that the returned region
does not extend past that allocated in the underlying GetBitContext,
even in the case entry point offsets are signalled in the bitstream.
Also assert this for good measure.
Signed-off-by: Frank Plowman <post@frankplowman.com>
A namespace is unnecessary here given that all these files
are already in the vvc subfolder.
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>