The criteria for slice structure validity is similar to that of
subpicture structure validity that we saw not too long ago [1].
The relationship between tiles and slices must satisfy the following
properties:
* Exhaustivity. All tiles in a picture must belong to a slice. The
tiles cover the picture, so this implies the slices must cover the
picture.
* Mutual exclusivity. No tile may belong to more than one slice, i.e.
slices may not overlap.
In most cases these properties are guaranteed by the syntax. There is
one noticable exception however: when pps_tile_idx_delta_present_flag is
equal to one, each slice is associated with a syntax element
pps_tile_idx_delta_val[i] which "specifies the difference between the
tile index of the tile containing the first CTU in the ( i + 1 )-th
rectangular slice and the tile index of the tile containing the first
CTU in the i-th rectangular slice" [2]. When these syntax elements are
present, the i-th slice can begin anywhere and the usual guarantees
provided by the syntax are lost.
The patch detects slice structures which violate either of the two
properties above, and are therefore invalid, while building the
slice map. Should the slice map be determined to be invalid, an
AVERROR_INVALIDDATA is returned. This prevents issues including
segmentation faults when trying to decode, invalid bitstreams.
[1]: https://ffmpeg.org//pipermail/ffmpeg-devel/2024-October/334470.html
[2]: H.266 (V3) Section 7.4.3.5, Picture parameter set RBSP semantics
Signed-off-by: Frank Plowman <post@frankplowman.com>
When the current subpicture has sps_subpic_treated_as_pic_flag equal to
1, motion vectors are cropped such that they cannot point to other
subpictures. This was accounted for in the prediction logic, but not
in pred_get_y, which is used by the scheduling logic to determine which
parts of the reference pictures must have been reconstructed before
inter prediction of a subsequent frame may begin. Consequently, where a
motion vector pointed to a location significantly above the current
subpicture, there was the possibility of a race condition. Patch fixes
this by cropping the motion vector to the current subpicture in
pred_get_y.
Signed-off-by: Frank Plowman <post@frankplowman.com>
When a stream has ALF filtering enabled but not CC-ALF, the CC-ALF set indexes alf->ctb_cc_idc are being read uninitialized during ALF filtering.
This change initializes alf->ctb_cc_idc whenever ALF is enabled.
Ref. https://trac.ffmpeg.org/ticket/11325
In 7.4.3.20 of H.266 (V3), there are two similarly-named variables:
scalingMatrixDcPred and ScalingMatrixDcRec. The old code set
ScalingMatrixDcRec, rather than scalingMatrixDcPred, in the first two
branches of the conditions on scaling_list_copy_mode_flag[id] and
aps->scaling_list_pred_mode_flag[id]. This could lead to decode
mismatches in sequences with explicitly-signalled scaling lists.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Reduces memory consumption by ~4MB for 1080p video with a maximum delay
of 16 frames by packing various information related to MIP:
* intra_mip_flag, 1 bit
* intra_mip_transposed_flag, 1 bit
* intra_mip_mode, 4 bits
Into a single byte.
Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Frank Plowman <post@frankplowman.com>
H.266 (V3) section 7.4.12.8: "The value of lMvd[ compIdx ] shall be in
the range of −2^{17} to 2^{17} − 1, inclusive."
Signed-off-by: Frank Plowman <post@frankplowman.com>
Per H.266 (V3) section 7.4.3.19, LmcsMaxBinIdx is set equal to
15 - lmcs_delta_max_bin_idx. The previous code instead had it equal to
15 - lmcs_min_bin_idx. This could cause decoder mismatches.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Previously, the code only stored the MIP mode and transpose flag in the
relevant tables at the top-left corner of the CU. This information ends
up being retrieved in ff_vvc_intra_pred_* not based on the CU position
but instead the transform unit position (specifically, using the x0 and
y0 from get_luma_predict_unit). There might be multiple transform units
in a CU, hence the top-left corner of the transform unit might not
coincide with the top-left corner of the CU. Consequently, we need to
store the MIP information at all positions in the CU, not only its
top-left corner, as we already do for the MIP flag.
Signed-off-by: Frank Plowman <post@frankplowman.com>
The final parameter of check_available determines whether the motion
estimation region constraints imposed in section 8.5.2.3 of H.266 (V3)
on MVP candidates apply to the current candidate or not. In the case of
IBC spatial merge candidates they do not, as their availability is
dependent only on the criteria described in sections 8.6.2.3 and 6.4.4,
which do not include this constraint on the motion estimation region.
Signed-off-by: Frank Plowman <post@frankplowman.com>
The v3 spec reserves CTU size 256. Currently, we use an uint8_t* table to hold cb_width and cb_height.
If a CTU size of 256 is not split, cb_width and cb_height will overflow to 0.
To avoid switching to uint16_t, rejecting CTU size 256 provides a simple solution.
If the current stage reports an error, some variables may not be correctly initialized.
Scheduling the next stage could lead to the use of uninitialized variables.
According to section 6.3.3 "Spatial or component-wise partitionings,"
CTUs should fully cover slices with no overlaps, gaps, or additions.
No overlaps are ensured by task_init_parse.
No gaps and no additions are ensured by this patch.
Co-authored-by: Frank Plowman <post@frankplowman.com>
This change will save approximately 531 MB for an 8K clip when processed with 16 threads.
The calculation is as follows:
7680 * 4320 * sizeof(int) * 2 * 2 * 16 / (4 * 4).
The deblock boundary strength stage utilizes ~5% of CPU resources for 8K clips.
It's worth considering it as a standalone stage. This stage has been relocated
to follow the parser process, allowing us to reuse CUs and TUs before releasing them.
Due to the nature of multithreading, using a "ready check" mechanism may introduce a deadlock. For example:
Suppose all tasks have been submitted to the executor, and the last thread checks the entire list and finds
no ready tasks. It then goes to sleep, waiting for a new task. However, for some multithreading-related reason,
a task becomes ready after the check. Since no other thread is aware of this and no new tasks are being added to
the executor, a deadlock occurs.
In VVC, this function is unnecessary because we use a scoreboard. All tasks submitted to the executor are ready tasks.
We still need several refactors to improve the current VVC decoder's performance,
which will frequently break the API/ABI. To mitigate this, we've copied the executor from
avutil to avcodec. Once the API/ABI is stable, we will move this class back to avutil
ff_vvc_output_frame is called before actually decoding. It's possible
for ff_vvc_output_frame to select current frame to output. If current
frame is nonref frame, it will be released by ff_vvc_unref_frame.
Fix this by always marking the current frame with
VVC_FRAME_FLAG_SHORT_REF, as is done by the HEVC decoder.
This reverts commit 110d8549d5.
I have been working through fixing bugs, particularly crashes I've
found using a fuzzer, in the VVC decoder for the past few months.
While I won't claim it is now bug-free, it is considerably more
resilient than it was and I think in a position to have the
experimental flag removed for release 7.1.
Additionally, most of the Main 10 features of VVC which were missing
version of the decoder released in 7.0 have now been implemented.
This includes the most major missing features: IBC, subpictures and RPR.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Fixes:
https://fate.ffmpeg.org/report.cgi?slot=x86_64-archlinux-gcc-tsan&time=20240823175808
Reproduction steps:
./configure --enable-memory-poisoning --toolchain=gcc-tsan --disable-stripping && make fate-vvc
Root cause:
We hold the current frame's lock while updating progress for other frames,
which also requires acquiring other frame locks. This could potentially lead to a deadlock.
However, I don't think this will happen in practice because progress updates are one-way, with no cyclic dependencies.
But we need this patch to make FATE happy.
The previous logic relied on the subpicture boundaries coinciding with
the tile boundaries. Per 6.3.1 of H.266 (V3), vertical subpicture
boundaries are always tile boundaries however the same cannot be said
for horizontal subpicture boundaries. Furthermore, it is possible to
construct an illegal bitstream where vertical subpicture boundaries are
not coincident with tile boundaries. In these cases, the condition of
the while loop would never be satisfied resulting in an OOB read on
col_bd/row_bd.
Patch fixes this issue by replacing != with <, thereby not requiring
subpicture boundaries and tile boundaries to be coincident.
Signed-off-by: Frank Plowman <post@frankplowman.com>
fix
==135000== Conditional jump or move depends on uninitialised value(s)
==135000== at 0x169FF95: vvc_deblock_bs (filter.c:699)
and
==135000== Conditional jump or move depends on uninitialised value(s)
==135000== at 0x16A2E72: ff_vvc_alf_filter (filter.c:1217)
Reported-by: James Almer <jamrial@gmail.com>
From Jun Zhao <mypopydev@gmail.com>:
> Should we relocate this to the decoder? Other codecs typically set this
> parameter in the decoder.
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
memset tables in the main thread can become a bottleneck for the decoder.
For example, if it takes 1% of the processing time for one core, the maximum achievable FPS will be 100.
Move the memeset to worker threads will fix the issue.
For luma, qp can only change at the CU level, so the qp tab size is related to the CU.
For chroma, considering the joint CbCr, the QP tab size is related to the TU.