If the current stage reports an error, some variables may not be correctly initialized.
Scheduling the next stage could lead to the use of uninitialized variables.
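
A minimal sketch of the idea, not the decoder's actual code: the stage names, `struct task`, and `advance_after_stage` are all hypothetical. It only shows that a task which has reported an error is never advanced, so later stages cannot read state the failed stage left uninitialized.

```c
#include <stdbool.h>

/* Hypothetical stage list; the real decoder's stages differ. */
enum stage { STAGE_PARSE, STAGE_INTER, STAGE_RECON, STAGE_LAST };

struct task {
    enum stage stage;   /* stage that will run next */
    bool failed;        /* set once any stage reports an error */
};

/*
 * Called with the return value of the stage that just ran
 * (0 on success, negative on error). Returns true only if the
 * caller may schedule the next stage.
 */
static bool advance_after_stage(struct task *t, int ret)
{
    if (ret < 0)
        t->failed = true;
    /* A failed or finished task never advances. */
    if (t->failed || t->stage == STAGE_LAST)
        return false;
    t->stage = (enum stage)(t->stage + 1);
    return t->stage != STAGE_LAST;
}
```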
According to section 6.3.3 "Spatial or component-wise partitionings,"
CTUs should fully cover slices with no overlaps, gaps, or additions.
No overlaps are ensured by task_init_parse.
No gaps and no additions are ensured by this patch.
Co-authored-by: Frank Plowman <post@frankplowman.com>
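
For illustration only, the "no gaps, no additions" condition can be expressed as a count check. `ctus_cover_picture` and its parameters are hypothetical names, and the sketch assumes overlaps have already been rejected elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Check that the per-slice CTU counts add up to exactly the number of
 * CTUs in the picture. A smaller sum means a gap, a larger sum means
 * extra CTUs. Overlaps are assumed to be caught elsewhere.
 */
static bool ctus_cover_picture(const unsigned *slice_ctu_counts,
                               size_t nb_slices,
                               unsigned ctus_in_picture)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < nb_slices; i++)
        total += slice_ctu_counts[i];
    return total == ctus_in_picture;
}
```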
The deblock boundary strength stage utilizes ~5% of CPU resources for 8K clips.
It's worth considering it as a standalone stage. This stage has been relocated
to follow the parser process, allowing us to reuse CUs and TUs before releasing them.
Due to the nature of multithreading, using a "ready check" mechanism may introduce a deadlock. For example:
Suppose all tasks have been submitted to the executor, and the last thread checks the entire list and finds
no ready tasks. It then goes to sleep, waiting for a new task. However, because of a race with another thread,
a task becomes ready just after the check. Since no other thread is aware of this and no new tasks are being added to
the executor, a deadlock occurs.
In VVC, this function is unnecessary because we use a scoreboard. All tasks submitted to the executor are ready tasks.
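
A rough sketch of the scoreboard idea, using C11 atomics. `struct sb_task`, `sb_task_init`, and `sb_dependency_done` are hypothetical names, not the decoder's API. Each task counts its unsatisfied dependencies, and the thread that satisfies the last one is the only one that submits the task, so the executor never holds a task that is not ready.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical scoreboard entry: one counter per task. */
struct sb_task {
    atomic_int pending;   /* dependencies not yet satisfied */
};

static void sb_task_init(struct sb_task *t, int nb_deps)
{
    atomic_init(&t->pending, nb_deps);
}

/*
 * Report that one dependency of t has completed. Returns true for
 * exactly one caller: the one that satisfied the last dependency.
 * That caller is responsible for submitting t to the executor.
 */
static bool sb_dependency_done(struct sb_task *t)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(&t->pending, 1) == 1;
}
```

Because the decrement and the readiness decision are a single atomic operation, there is no window between "check" and "sleep" in which a task can become ready unnoticed.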
We still need several refactors to improve the current VVC decoder's performance,
which will frequently break the API/ABI. To mitigate this, we've copied the executor from
avutil to avcodec. Once the API/ABI is stable, we will move this class back to avutil.
Fixes:
https://fate.ffmpeg.org/report.cgi?slot=x86_64-archlinux-gcc-tsan&time=20240823175808
Reproduction steps:
./configure --enable-memory-poisoning --toolchain=gcc-tsan --disable-stripping && make fate-vvc
Root cause:
We hold the current frame's lock while updating progress for other frames,
which also requires acquiring other frame locks. This could potentially lead to a deadlock.
However, I don't think this will happen in practice because progress updates are one-way, with no cyclic dependencies.
But we need this patch to make FATE happy.
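
One common way to avoid holding two frame locks at once is sketched below. This is an assumption about the general technique, not a description of what this patch does. `struct frame` and `frame_report_progress` are hypothetical.

```c
#include <pthread.h>

#define MAX_LISTENERS 4

/* Hypothetical frame type; the real decoder's structures differ. */
struct frame {
    pthread_mutex_t lock;
    int progress;                           /* rows decoded so far */
    int nb_listeners;
    struct frame *listeners[MAX_LISTENERS]; /* frames waiting on this one */
    int deps_ready;                         /* bumped when a dependency advances */
};

static void frame_report_progress(struct frame *f, int rows)
{
    struct frame *to_notify[MAX_LISTENERS];
    int n;

    /* Update our own state and snapshot the listeners under our lock. */
    pthread_mutex_lock(&f->lock);
    f->progress = rows;
    n = f->nb_listeners;
    for (int i = 0; i < n; i++)
        to_notify[i] = f->listeners[i];
    pthread_mutex_unlock(&f->lock);

    /* Our lock is released, so at most one frame lock is held at a time. */
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&to_notify[i]->lock);
        to_notify[i]->deps_ready++;
        pthread_mutex_unlock(&to_notify[i]->lock);
    }
}
```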
Calling memset on the tables in the main thread can become a bottleneck for the decoder.
For example, if it takes 1% of the processing time for one core, the maximum achievable FPS will be 100.
Moving the memset to worker threads fixes the issue.
The parser stage is not parallelizable.
We need to schedule it as soon as possible to create later stages, which are more parallelizable.
clips                                     | before | after | delta
------------------------------------------|--------|-------|-------
RitualDance_1920x1080_60_10_420_37_RA.266 | 342.7  | 365.3 |  6.59%
NovosobornayaSquare_1920x1080.bin         | 321.7  | 400   | 24.34%
Tango2_3840x2160_60_10_420_27_LD.266      | 82.3   | 91.7  | 11.42%
RitualDance_1920x1080_60_10_420_32_LD.266 | 323.7  | 319.3 | -1.36%
Chimera_8bit_1080P_1000_frames.vvc        | 364    | 411.3 | 12.99%
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 162.7  | 185.7 | 14.14%
For RPR, the current frame may reference a frame with a different resolution.
Therefore, we need to consider frame scaling when we wait for reference pixels.
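
A simplified sketch of why scaling matters when waiting on a reference. `ref_row_to_wait_for` is a hypothetical helper that uses a plain integer ratio. The actual decoder is assumed to use the fixed-point scaling factors and interpolation margins defined for RPR, which this sketch ignores.

```c
#include <stdint.h>

/*
 * Map the last row needed in the current frame (0-based) to the row
 * that must be available in a reference of a different height.
 * Rounds up so we never wait for too few rows, then clamps to the
 * last valid row of the reference.
 */
static int ref_row_to_wait_for(int cur_y, int cur_height, int ref_height)
{
    int64_t y = ((int64_t)cur_y * ref_height + cur_height - 1) / cur_height;

    if (y > ref_height - 1)
        y = ref_height - 1;
    return (int)y;
}
```

With equal heights the row is unchanged. With a reference twice as tall, row 100 of the current frame maps to row 200 of the reference, so waiting for row 100 would read pixels that may not be decoded yet.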
For some error bitstreams, a CTU belongs to two slices/entry points.
If the decoder initializes and submits the CTU task twice, it may crash the program
or cause it to enter an infinite loop.
Reported-by: Frank Plowman <post@frankplowman.com>
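
A minimal sketch of a guard against double submission. `struct ctu_tracker`, `ctu_claim`, and `MAX_CTUS` are hypothetical and are not the decoder's code. Each CTU address can be claimed once, and a second slice or entry point that names the same CTU is rejected instead of creating a second task.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CTUS 64   /* hypothetical bound for this sketch */

struct ctu_tracker {
    uint8_t claimed[MAX_CTUS];   /* nonzero once a CTU has a task */
};

/*
 * Returns true if the CTU was claimed now. Returns false if the
 * address is out of range or an earlier slice/entry point already
 * claimed it; the caller should then treat the bitstream as invalid.
 */
static bool ctu_claim(struct ctu_tracker *t, unsigned ctu_addr)
{
    if (ctu_addr >= MAX_CTUS || t->claimed[ctu_addr])
        return false;
    t->claimed[ctu_addr] = 1;
    return true;
}
```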
A namespace is unnecessary here given that all these files
are already in the vvc subfolder.
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:45:00 +02:00
Renamed from libavcodec/vvc/vvc_thread.c