212 commits

Author SHA1 Message Date
Nuo Mi
ca3550948c lavc/vvcdec: ensure slices contain nonzero CTUs
fixes https://github.com/ffvvc/tests/tree/main/fuzz/passed/000323.bit

Co-authored-by: Frank Plowman <post@frankplowman.com>
2025-01-29 18:22:41 +08:00
Nuo Mi
974d4a8f0a lavc/vvcdec: remove unneeded VVCContext->pix_fmt
AVCodecContext->sw_pix_fmt is used to hold the software pixel format.

Co-authored-by: Frank Plowman <post@frankplowman.com>
2025-01-29 18:22:41 +08:00
Nuo Mi
61ff0fac35 lavc/vvcdec: remove unneeded set_output_format
Downstream can determine the format from the output frame format

Co-authored-by: Frank Plowman <post@frankplowman.com>
2025-01-29 18:22:41 +08:00
Zhao Zhili
ea381285e7 avcodec/vvc: Add support for output_corrupt/showall flags 2025-01-19 13:30:13 +08:00
Nuo Mi
8eb1d76e14 lavc/vvc/refs: export keyframe and picture type in output frames
fixes https://trac.ffmpeg.org/ticket/11406

Co-authored-by: Ruben Gonzalez <rgonzalez@fluendo.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2025-01-13 18:05:06 -03:00
Frank Plowman
8bd66a8c95 lavc/vvc: Check slice structure
The criteria for slice structure validity are similar to those for
subpicture structure validity that we saw not too long ago [1].
The relationship between tiles and slices must satisfy the following
properties:

* Exhaustivity.  All tiles in a picture must belong to a slice.  The
  tiles cover the picture, so this implies the slices must cover the
  picture.
* Mutual exclusivity.  No tile may belong to more than one slice, i.e.
  slices may not overlap.

In most cases these properties are guaranteed by the syntax.  There is
one noticeable exception, however: when pps_tile_idx_delta_present_flag is
equal to one, each slice is associated with a syntax element
pps_tile_idx_delta_val[i] which "specifies the difference between the
tile index of the tile containing the first CTU in the ( i + 1 )-th
rectangular slice and the tile index of the tile containing the first
CTU in the i-th rectangular slice" [2].  When these syntax elements are
present, the i-th slice can begin anywhere and the usual guarantees
provided by the syntax are lost.

The patch detects slice structures which violate either of the two
properties above, and are therefore invalid, while building the
slice map.  Should the slice map be determined to be invalid, an
AVERROR_INVALIDDATA is returned.  This prevents issues including
segmentation faults when trying to decode invalid bitstreams.

[1]: https://ffmpeg.org//pipermail/ffmpeg-devel/2024-October/334470.html
[2]: H.266 (V3) Section 7.4.3.5, Picture parameter set RBSP semantics

Signed-off-by: Frank Plowman <post@frankplowman.com>
2025-01-12 13:15:06 +08:00
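A minimal sketch of the two properties checked in 8bd66a8c95, assuming the slice map is given as one list of tile indices per slice. The function name and the data layout are hypothetical, and the decoder itself returns AVERROR_INVALIDDATA rather than a boolean.

```python
def check_slice_map(num_tiles, slices):
    """Return True if every tile belongs to exactly one slice.

    `slices` holds one list of tile indices per slice (hypothetical layout).
    """
    owner = [None] * num_tiles
    for slice_idx, tiles in enumerate(slices):
        for tile in tiles:
            if not 0 <= tile < num_tiles:
                return False  # tile index lies outside the picture
            if owner[tile] is not None:
                return False  # mutual exclusivity violated: slices overlap
            owner[tile] = slice_idx
    # Exhaustivity: no tile may be left without a slice.
    return all(o is not None for o in owner)


valid = check_slice_map(4, [[0, 1], [2, 3]])
overlapping = check_slice_map(4, [[0, 1], [1, 2, 3]])
incomplete = check_slice_map(4, [[0, 1], [3]])
```

Here `valid` is accepted, while `overlapping` and `incomplete` are rejected for breaking mutual exclusivity and exhaustivity respectively.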
James Almer
d7180a3f92 avcodec/vvc/dec: print thread debug logs only if DEBUG is defined
Makes the output of a normal decoding process with loglevel debug a lot less
verbose.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-01-10 10:23:57 -03:00
Frank Plowman
539cea3183 lavc/vvc: Fix race condition for MVs cropped to subpic
When the current subpicture has sps_subpic_treated_as_pic_flag equal to
1, motion vectors are cropped such that they cannot point to other
subpictures.  This was accounted for in the prediction logic, but not
in pred_get_y, which is used by the scheduling logic to determine which
parts of the reference pictures must have been reconstructed before
inter prediction of a subsequent frame may begin.  Consequently, where a
motion vector pointed to a location significantly above the current
subpicture, there was the possibility of a race condition.  Patch fixes
this by cropping the motion vector to the current subpicture in
pred_get_y.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2025-01-05 20:25:29 +08:00
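A rough sketch of the idea behind 539cea3183, assuming integer-pel rows and ignoring interpolation filter margins. All names are hypothetical; the real logic lives in pred_get_y and differs in detail.

```python
def clip(value, lo, hi):
    return max(lo, min(value, hi))


def required_ref_row(y, mv_y, block_h, subpic_top, subpic_bottom, treated_as_pic):
    """Last reference row that must be reconstructed before prediction starts."""
    row = y + mv_y + block_h - 1
    if treated_as_pic:
        # Prediction clamps the reference to the current subpicture, so the
        # scheduler has to wait for the clamped row, not the raw one.
        row = clip(row, subpic_top, subpic_bottom)
    return row


# Subpicture spans rows 512..1023; the MV points far above it.
uncropped = required_ref_row(512, -400, 16, 512, 1023, False)
cropped = required_ref_row(512, -400, 16, 512, 1023, True)
```

Without the clamp the scheduler would wait only for row 127, although prediction actually reads from row 512 onwards, which is where the race could come from.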
Chris Warrington
f80af3657f avcodec/vvc decode: ALF filtering without CC-ALF
When a stream has ALF filtering enabled but not CC-ALF, the CC-ALF set indexes alf->ctb_cc_idc are being read uninitialized during ALF filtering.

This change initializes alf->ctb_cc_idc whenever ALF is enabled.

Ref. https://trac.ffmpeg.org/ticket/11325
2025-01-05 18:00:18 +08:00
Anton Khirnov
56ba57b672 lavc/refstruct: move to lavu and make public
It is highly versatile and generally useful.
2024-12-15 14:03:47 +01:00
Frank Plowman
8629306627 lavc/vvc: Fix scaling matrix DC coef derivation
In 7.4.3.20 of H.266 (V3), there are two similarly-named variables:
scalingMatrixDcPred and ScalingMatrixDcRec.  The old code set
ScalingMatrixDcRec, rather than scalingMatrixDcPred, in the first two
branches of the conditions on scaling_list_copy_mode_flag[id] and
aps->scaling_list_pred_mode_flag[id].  This could lead to decode
mismatches in sequences with explicitly-signalled scaling lists.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-10 20:26:12 +08:00
Frank Plowman
34c6ad0a07 lavc/vvc: Use a bitfield to store MIP information
Reduces memory consumption by ~4MB for 1080p video with a maximum delay
of 16 frames by packing various information related to MIP:
* intra_mip_flag, 1 bit
* intra_mip_transposed_flag, 1 bit
* intra_mip_mode, 4 bits
Into a single byte.

Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-07 17:37:45 +08:00
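A small sketch of the packing described in 34c6ad0a07: one flag bit, one transpose bit and a 4-bit mode fit in six bits of a byte. The bit order chosen here is an assumption and need not match the decoder's actual layout.

```python
def pack_mip(flag, transposed, mode):
    """Pack the three MIP fields into one byte (bit order is assumed)."""
    assert flag in (0, 1) and transposed in (0, 1) and 0 <= mode < 16
    return flag | (transposed << 1) | (mode << 2)


def unpack_mip(byte):
    """Inverse of pack_mip: returns (flag, transposed, mode)."""
    return byte & 1, (byte >> 1) & 1, (byte >> 2) & 0xF


packed = pack_mip(1, 0, 11)
fields = unpack_mip(packed)
```

The largest packed value uses only six bits, so two bits of the byte remain free.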
Frank Plowman
56419fd096 lavc/vvc: Fix overflow in MVD derivation
H.266 (V3) section 7.4.12.8: "The value of lMvd[ compIdx ] shall be in
the range of −2^{17} to 2^{17} − 1, inclusive."

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-03 10:22:55 +08:00
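For reference, the range quoted in 56419fd096 written as a check. This is only a sketch of the constraint itself; the commit message does not say whether the decoder clips, rejects or otherwise handles out-of-range values, and the helper name is hypothetical.

```python
MVD_MIN = -(1 << 17)      # -2^17
MVD_MAX = (1 << 17) - 1   # 2^17 - 1


def mvd_in_range(lmvd):
    """True if lMvd[compIdx] lies in the range required by the spec text."""
    return MVD_MIN <= lmvd <= MVD_MAX
```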
Frank Plowman
499896ca2f lavc/vvc: Fix derivation of LmcsMaxBinIdx
Per H.266 (V3) section 7.4.3.19, LmcsMaxBinIdx is set equal to
15 - lmcs_delta_max_bin_idx.  The previous code instead had it equal to
15 - lmcs_min_bin_idx.  This could cause decoder mismatches.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-03 10:22:55 +08:00
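The two derivations contrasted in 499896ca2f, side by side. The function names are hypothetical and the input values are made up purely to show that the results can differ.

```python
def lmcs_max_bin_idx(lmcs_delta_max_bin_idx):
    # Derivation stated in the commit message (H.266 (V3) section 7.4.3.19).
    return 15 - lmcs_delta_max_bin_idx


def lmcs_max_bin_idx_old(lmcs_min_bin_idx):
    # What the previous code computed, according to the commit message.
    return 15 - lmcs_min_bin_idx


# Illustrative values only: the two syntax elements are signalled separately.
fixed = lmcs_max_bin_idx(lmcs_delta_max_bin_idx=1)
old = lmcs_max_bin_idx_old(lmcs_min_bin_idx=2)
```

Whenever the two syntax elements differ, the old derivation yields a different LmcsMaxBinIdx, which is how a decoder mismatch can arise.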
Frank Plowman
699322519c lavc/vvc: Store MIP information over entire CU area
Previously, the code only stored the MIP mode and transpose flag in the
relevant tables at the top-left corner of the CU.  This information ends
up being retrieved in ff_vvc_intra_pred_* not based on the CU position
but instead the transform unit position (specifically, using the x0 and
y0 from get_luma_predict_unit).  There might be multiple transform units
in a CU, hence the top-left corner of the transform unit might not
coincide with the top-left corner of the CU.  Consequently, we need to
store the MIP information at all positions in the CU, not only its
top-left corner, as we already do for the MIP flag.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-03 10:20:51 +08:00
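A toy model of the lookup problem described in 699322519c, assuming a per-frame table indexed in minimum-block units. Table layout and function names are hypothetical.

```python
def store_top_left_only(table, x0, y0, w, h, value):
    # Old behaviour: only the CU's top-left entry is written.
    table[y0][x0] = value


def store_whole_cu(table, x0, y0, w, h, value):
    # New behaviour: every entry covered by the CU is written.
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            table[y][x] = value


old_table = [[0] * 4 for _ in range(4)]
new_table = [[0] * 4 for _ in range(4)]
store_top_left_only(old_table, 0, 0, 2, 2, 7)
store_whole_cu(new_table, 0, 0, 2, 2, 7)

# A transform unit inside the CU whose top-left corner is at (x=1, y=0).
old_lookup = old_table[0][1]
new_lookup = new_table[0][1]
```

The lookup at the transform unit's position finds the stored mode only when the whole CU area was filled.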
Frank Plowman
7399d9f374 lavc/vvc: Don't check motion estimation region for IBC
The final parameter of check_available determines whether the motion
estimation region constraints imposed in section 8.5.2.3 of H.266 (V3)
on MVP candidates apply to the current candidate or not.  In the case of
IBC spatial merge candidates they do not, as their availability is
dependent only on the criteria described in sections 8.6.2.3 and 6.4.4,
which do not include this constraint on the motion estimation region.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-12-03 10:20:51 +08:00
Nuo Mi
4de67e8746 avcodec/vvcdec: return error if CTU size > 128
The v3 spec reserves CTU size 256. Currently, we use a uint8_t* table to hold cb_width and cb_height.
If a CTU size of 256 is not split, cb_width and cb_height will overflow to 0.
To avoid switching to uint16_t, rejecting CTU size 256 provides a simple solution.
2024-11-30 09:58:59 +08:00
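The overflow mentioned in 4de67e8746 can be reproduced with a few lines. Masking with 0xFF stands in for a store into a uint8_t table entry; the helper names and the use of an exception are assumptions made for illustration.

```python
MAX_SUPPORTED_CTU_SIZE = 128


def store_u8(value):
    """Model what a uint8_t table entry keeps of a wider value."""
    return value & 0xFF


def check_ctu_size(ctu_size):
    """Reject CTU sizes whose unsplit width would not fit in a uint8_t."""
    if ctu_size > MAX_SUPPORTED_CTU_SIZE:
        raise ValueError("unsupported CTU size %d" % ctu_size)
    return ctu_size


width_128 = store_u8(128)
width_256 = store_u8(256)
```

A width of 128 survives the store, while 256 wraps around to 0, which is why the size is rejected up front.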
Nuo Mi
eb67e60cb0 avcodec/vvcdec: schedule next stage only if the current stage reports no error
If the current stage reports an error, some variables may not be correctly initialized.
Scheduling the next stage could lead to the use of uninitialized variables.
2024-11-30 09:58:59 +08:00
Nuo Mi
4ec767abcc avcodec/vvcdec: misc, reformat inter_data() 2024-11-30 09:58:59 +08:00
Nuo Mi
ba89c5b989 avcodec/vvcdec: inter_data, check the return value from hls_merge_data
Reported-by: Frank Plowman <post@frankplowman.com>
2024-11-30 09:58:59 +08:00
Nuo Mi
5c5a08ecb5 avcodec/vvcdec: ensure every CTU belongs to a slice
According to section 6.3.3 "Spatial or component-wise partitionings,"
CTUs should fully cover slices with no overlaps, gaps, or additions.
No overlaps are ensured by task_init_parse.
No gaps and no additions are ensured by this patch.

Co-authored-by: Frank Plowman <post@frankplowman.com>
2024-11-30 09:58:59 +08:00
Frank Plowman
1e5f24d1a6 lavc/vvc: Remove floating point logic
This was the only floating point logic in the native VVC decoder.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-11-11 19:31:00 +08:00
Fei Wang
e726fdeb05 lavc/vaapi_dec: Add VVC decoder
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-11-01 12:13:07 +08:00
Fei Wang
4dc18c78cd lavc/vvc_dec: Add hardware decode API
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-11-01 12:13:07 +08:00
Fei Wang
a94aa2d61e lavc/vvc_ps: Add alf raw syntax into VVCALF
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-11-01 12:13:07 +08:00
Fei Wang
15a75e8e04 lavc/vvc_refs: Define VVC_FRAME_FLAG* to h header
So that hardware decoder can use the flags too.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-11-01 12:13:07 +08:00
Nuo Mi
b611410569 avcodec/vvc/thread: Check frame to be non NULL
Fixes: NULL pointer dereference
Fixes: 71303/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4875859050168320

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reported-by: Michael Niedermayer <michael@niedermayer.cc>
2024-10-20 20:36:15 +08:00
Nuo Mi
a144e7b92e avcodec/vvcdec: remove unused tb_pos_x0 and tb_pos_y0
This change will save approximately 531 MB for an 8K clip when processed with 16 threads.
The calculation is as follows:
7680 * 4320 * sizeof(int) * 2 * 2 * 16 / (4 * 4).
2024-10-16 20:28:09 +08:00
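The arithmetic from a144e7b92e, evaluated step by step. sizeof(int) is taken as 4 bytes, which is an assumption about the platform, and the factor 16 is read here as the thread count mentioned in the message.

```python
WIDTH, HEIGHT = 7680, 4320
SIZEOF_INT = 4   # assumption: 4-byte int
THREADS = 16     # assumption: the "* 16" factor is the thread count


def saved_bytes(width, height, sizeof_int, threads):
    # Same factors, in the same order, as the expression in the commit message.
    return width * height * sizeof_int * 2 * 2 * threads // (4 * 4)


total = saved_bytes(WIDTH, HEIGHT, SIZEOF_INT, THREADS)
total_mb = total / 1e6       # decimal megabytes
total_mib = total / 2**20    # binary mebibytes
```

The result is 530,841,600 bytes, which is about 531 MB in decimal units (506.25 MiB).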
Nuo Mi
2e936f2c11 avcodec/vvdec: refact, ff_vvc_deblock_bs use CodingUnit/TransformUnit instead of fc->tabs
perf result for:
"perf record -F 99 ./ffmpeg_g -i  Tango2_3840x2160_60_10_420_27_LD.266 -f null -"

before: 5.24%
1.87%  ffmpeg_g  [.] vvc_deblock_bs_chroma
1.72%  ffmpeg_g  [.] ff_vvc_deblock_bs
1.65%  ffmpeg_g  [.] vvc_deblock_bs_luma

after: 3.48%
1.84%  ffmpeg_g  [.] vvc_deblock_bs_chroma
1.64%  ffmpeg_g  [.] ff_vvc_deblock_bs + vvc_deblock_bs_luma(inlined)
2024-10-16 20:28:09 +08:00
Nuo Mi
d78b43ecf8 avcodec/vvcdec: misc, move pcmf from min_tu_tl_init to min_cb_nz_tl_init
pcmf are CU-level flags.
2024-10-16 20:28:09 +08:00
Nuo Mi
634780f3cf avcodec/vvcdec: refact out deblock boundary strength stage
The deblock boundary strength stage utilizes ~5% of CPU resources for 8K clips.
It's worth considering it as a standalone stage. This stage has been relocated
to follow the parser process, allowing us to reuse CUs and TUs before releasing them.
2024-10-16 20:28:09 +08:00
Nuo Mi
846fbc395b avcodec/vvc: simplify priority logical to improve performance for 4K/8K
For 4K/8K video processing, it's possible to have over 1,000 tasks pending on the executor.
In such cases, O(n) and O(log(n)) insertion times are too costly.
Reducing this to O(1) will significantly decrease the time spent in critical sections.

clip                                                        | before | after  | delta
------------------------------------------------------------|--------|--------|-------
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10.bit            |    24  |   27   |  12.5%
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10_HighBitrate.bit|    12  |   17   |  41.7%
tears_of_steel_4k_8M_8bit_2000.vvc                          |    34  |  102   | 200.0%
VVC_UHDTV1_OpenGOP_3840x2160_60fps_HLG10.bit                |   126  |  128   |   1.6%
RitualDance_1920x1080_60_10_420_37_RA.266                   |   350  |  378   |   8.0%
NovosobornayaSquare_1920x1080.bin                           |   341  |  369   |   8.2%
Tango2_3840x2160_60_10_420_27_LD.266                        |    69  |   70   |   1.4%
RitualDance_1920x1080_60_10_420_32_LD.266                   |   243  |  259   |   6.6%
Chimera_8bit_1080P_1000_frames.vvc                          |   420  |  392   |  -6.7%
BQTerrace_1920x1080_60_10_420_22_RA.vvc                     |   148  |  144   |  -2.7%
2024-10-04 21:58:42 +08:00
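One common way to get O(1) insertion is to keep a fixed, small number of priority levels with one container per level. The sketch below shows that technique only; it is not taken from 846fbc395b, and the class and method names are hypothetical.

```python
class BucketQueue:
    """Task queue with a fixed number of priority levels (0 is most urgent)."""

    def __init__(self, priorities):
        self.buckets = [[] for _ in range(priorities)]

    def push(self, priority, task):
        # Appending to a list is amortized O(1), however many tasks are pending.
        self.buckets[priority].append(task)

    def pop(self):
        # The scan is bounded by the number of priority levels, not by the
        # number of pending tasks.
        for bucket in self.buckets:
            if bucket:
                return bucket.pop()
        return None


queue = BucketQueue(priorities=2)
queue.push(1, "a")
queue.push(0, "b")
queue.push(1, "c")
order = [queue.pop() for _ in range(4)]
```

Tasks of the most urgent level come out first; within a level this sketch is last-in, first-out, which is a design choice of the example rather than a claim about the executor.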
Nuo Mi
40a14ef970 avcodec/executor: remove unused ready callback
Due to the nature of multithreading, using a "ready check" mechanism may introduce a deadlock. For example:

Suppose all tasks have been submitted to the executor, and the last thread checks the entire list and finds
no ready tasks. It then goes to sleep, waiting for a new task. However, for some multithreading-related reason,
a task becomes ready after the check. Since no other thread is aware of this and no new tasks are being added to
the executor, a deadlock occurs.

In VVC, this function is unnecessary because we use a scoreboard. All tasks submitted to the executor are ready tasks.
2024-10-04 21:58:42 +08:00
Nuo Mi
8446e27bf3 avcodec: make a local copy of executor
We still need several refactors to improve the current VVC decoder's performance,
which will frequently break the API/ABI. To mitigate this, we've copied the executor from
avutil to avcodec. Once the API/ABI is stable, we will move this class back to avutil.
2024-10-04 21:58:42 +08:00
Zhao Zhili
240c16bbc6 avcodec/vvc: Don't use large array on stack
tmp_array in dmvr_hv takes 33024 bytes on stack, which can be
dangerous.
2024-10-01 11:30:22 +08:00
sunyuechi
ba7d0d5fc3 lavc/vvc_mc: R-V V avg w_avg
C908   X60
avg_8_2x2_c                                        :    1.2    1.0
avg_8_2x2_rvv_i32                                  :    0.7    0.7
avg_8_2x4_c                                        :    2.0    2.2
avg_8_2x4_rvv_i32                                  :    1.2    1.2
avg_8_2x8_c                                        :    3.7    4.0
avg_8_2x8_rvv_i32                                  :    1.7    1.5
avg_8_2x16_c                                       :    7.2    7.7
avg_8_2x16_rvv_i32                                 :    3.0    2.7
avg_8_2x32_c                                       :   14.2   15.2
avg_8_2x32_rvv_i32                                 :    5.5    5.0
avg_8_2x64_c                                       :   51.0   43.7
avg_8_2x64_rvv_i32                                 :   39.2   29.7
avg_8_2x128_c                                      :  100.5   79.2
avg_8_2x128_rvv_i32                                :   79.7   68.2
avg_8_4x2_c                                        :    1.7    2.0
avg_8_4x2_rvv_i32                                  :    1.0    0.7
avg_8_4x4_c                                        :    3.5    3.7
avg_8_4x4_rvv_i32                                  :    1.2    1.2
avg_8_4x8_c                                        :    6.7    7.0
avg_8_4x8_rvv_i32                                  :    1.7    1.5
avg_8_4x16_c                                       :   13.5   14.0
avg_8_4x16_rvv_i32                                 :    3.0    2.7
avg_8_4x32_c                                       :   26.2   27.7
avg_8_4x32_rvv_i32                                 :    5.5    4.7
avg_8_4x64_c                                       :   73.0   73.7
avg_8_4x64_rvv_i32                                 :   39.0   32.5
avg_8_4x128_c                                      :  143.0  137.2
avg_8_4x128_rvv_i32                                :   72.7   68.0
avg_8_8x2_c                                        :    3.5    3.5
avg_8_8x2_rvv_i32                                  :    1.0    0.7
avg_8_8x4_c                                        :    6.2    6.5
avg_8_8x4_rvv_i32                                  :    1.5    1.0
avg_8_8x8_c                                        :   12.7   13.2
avg_8_8x8_rvv_i32                                  :    2.0    1.5
avg_8_8x16_c                                       :   25.0   26.5
avg_8_8x16_rvv_i32                                 :    3.2    2.7
avg_8_8x32_c                                       :   50.0   52.7
avg_8_8x32_rvv_i32                                 :    6.2    5.0
avg_8_8x64_c                                       :  118.7  122.5
avg_8_8x64_rvv_i32                                 :   40.2   31.5
avg_8_8x128_c                                      :  236.7  220.2
avg_8_8x128_rvv_i32                                :   85.2   67.7
avg_8_16x2_c                                       :    6.2    6.7
avg_8_16x2_rvv_i32                                 :    1.2    0.7
avg_8_16x4_c                                       :   12.5   13.0
avg_8_16x4_rvv_i32                                 :    1.7    1.0
avg_8_16x8_c                                       :   24.5   26.0
avg_8_16x8_rvv_i32                                 :    3.0    1.7
avg_8_16x16_c                                      :   49.0   51.5
avg_8_16x16_rvv_i32                                :    5.5    3.0
avg_8_16x32_c                                      :   97.5  102.5
avg_8_16x32_rvv_i32                                :   10.5    5.5
avg_8_16x64_c                                      :  213.7  222.0
avg_8_16x64_rvv_i32                                :   48.5   34.2
avg_8_16x128_c                                     :  434.7  420.0
avg_8_16x128_rvv_i32                               :   97.7   74.0
avg_8_32x2_c                                       :   12.2   12.7
avg_8_32x2_rvv_i32                                 :    1.5    1.0
avg_8_32x4_c                                       :   24.5   25.5
avg_8_32x4_rvv_i32                                 :    3.0    1.7
avg_8_32x8_c                                       :   48.5   50.7
avg_8_32x8_rvv_i32                                 :    5.2    2.7
avg_8_32x16_c                                      :   96.7  101.2
avg_8_32x16_rvv_i32                                :   10.2    5.0
avg_8_32x32_c                                      :  192.7  202.2
avg_8_32x32_rvv_i32                                :   19.7    9.5
avg_8_32x64_c                                      :  427.5  426.5
avg_8_32x64_rvv_i32                                :   64.2   18.2
avg_8_32x128_c                                     :  816.5  821.0
avg_8_32x128_rvv_i32                               :  135.2   75.5
avg_8_64x2_c                                       :   24.0   25.2
avg_8_64x2_rvv_i32                                 :    2.7    1.5
avg_8_64x4_c                                       :   48.2   50.5
avg_8_64x4_rvv_i32                                 :    5.0    2.7
avg_8_64x8_c                                       :   96.0  100.7
avg_8_64x8_rvv_i32                                 :    9.7    4.5
avg_8_64x16_c                                      :  207.7  201.2
avg_8_64x16_rvv_i32                                :   19.0    9.0
avg_8_64x32_c                                      :  383.2  402.0
avg_8_64x32_rvv_i32                                :   37.5   17.5
avg_8_64x64_c                                      :  837.2  828.7
avg_8_64x64_rvv_i32                                :   84.7   35.5
avg_8_64x128_c                                     : 1640.7 1640.2
avg_8_64x128_rvv_i32                               :  206.0  153.0
avg_8_128x2_c                                      :   48.7   51.0
avg_8_128x2_rvv_i32                                :    5.2    2.7
avg_8_128x4_c                                      :   96.7  101.5
avg_8_128x4_rvv_i32                                :   10.2    5.0
avg_8_128x8_c                                      :  192.2  202.0
avg_8_128x8_rvv_i32                                :   19.7    9.2
avg_8_128x16_c                                     :  400.7  403.2
avg_8_128x16_rvv_i32                               :   38.7   18.5
avg_8_128x32_c                                     :  786.7  805.7
avg_8_128x32_rvv_i32                               :   77.0   36.2
avg_8_128x64_c                                     : 1615.5 1655.5
avg_8_128x64_rvv_i32                               :  189.7   80.7
avg_8_128x128_c                                    : 3182.0 3238.0
avg_8_128x128_rvv_i32                              :  397.5  308.5
w_avg_8_2x2_c                                      :    1.7    1.2
w_avg_8_2x2_rvv_i32                                :    1.2    1.0
w_avg_8_2x4_c                                      :    2.7    2.7
w_avg_8_2x4_rvv_i32                                :    1.7    1.5
w_avg_8_2x8_c                                      :   21.7    4.7
w_avg_8_2x8_rvv_i32                                :    2.7    2.5
w_avg_8_2x16_c                                     :    9.5    9.2
w_avg_8_2x16_rvv_i32                               :    4.7    4.2
w_avg_8_2x32_c                                     :   19.0   18.7
w_avg_8_2x32_rvv_i32                               :    9.0    8.0
w_avg_8_2x64_c                                     :   62.0   50.2
w_avg_8_2x64_rvv_i32                               :   47.7   33.5
w_avg_8_2x128_c                                    :  116.7   87.7
w_avg_8_2x128_rvv_i32                              :   80.0   69.5
w_avg_8_4x2_c                                      :    2.5    2.5
w_avg_8_4x2_rvv_i32                                :    1.2    1.0
w_avg_8_4x4_c                                      :    4.7    4.5
w_avg_8_4x4_rvv_i32                                :    1.7    1.7
w_avg_8_4x8_c                                      :    9.0    8.7
w_avg_8_4x8_rvv_i32                                :    2.7    2.5
w_avg_8_4x16_c                                     :   17.7   17.5
w_avg_8_4x16_rvv_i32                               :    4.7    4.2
w_avg_8_4x32_c                                     :   35.0   35.0
w_avg_8_4x32_rvv_i32                               :    9.0    8.0
w_avg_8_4x64_c                                     :  100.5   84.5
w_avg_8_4x64_rvv_i32                               :   42.2   33.7
w_avg_8_4x128_c                                    :  203.5  151.2
w_avg_8_4x128_rvv_i32                              :   83.0   69.5
w_avg_8_8x2_c                                      :    4.5    4.5
w_avg_8_8x2_rvv_i32                                :    1.2    1.2
w_avg_8_8x4_c                                      :    8.7    8.7
w_avg_8_8x4_rvv_i32                                :    2.0    1.7
w_avg_8_8x8_c                                      :   17.0   17.0
w_avg_8_8x8_rvv_i32                                :    3.2    2.5
w_avg_8_8x16_c                                     :   34.0   33.5
w_avg_8_8x16_rvv_i32                               :    5.5    4.2
w_avg_8_8x32_c                                     :   86.0   67.5
w_avg_8_8x32_rvv_i32                               :   10.5    8.0
w_avg_8_8x64_c                                     :  187.2  149.5
w_avg_8_8x64_rvv_i32                               :   45.0   35.5
w_avg_8_8x128_c                                    :  342.7  290.0
w_avg_8_8x128_rvv_i32                              :  108.7   70.2
w_avg_8_16x2_c                                     :    8.5    8.2
w_avg_8_16x2_rvv_i32                               :    2.0    1.2
w_avg_8_16x4_c                                     :   16.7   16.7
w_avg_8_16x4_rvv_i32                               :    3.0    1.7
w_avg_8_16x8_c                                     :   33.2   33.5
w_avg_8_16x8_rvv_i32                               :    5.5    3.0
w_avg_8_16x16_c                                    :   66.2   66.7
w_avg_8_16x16_rvv_i32                              :   10.5    5.0
w_avg_8_16x32_c                                    :  132.5  131.0
w_avg_8_16x32_rvv_i32                              :   20.0    9.7
w_avg_8_16x64_c                                    :  340.0  283.5
w_avg_8_16x64_rvv_i32                              :   60.5   37.2
w_avg_8_16x128_c                                   :  641.2  597.5
w_avg_8_16x128_rvv_i32                             :  118.7   77.7
w_avg_8_32x2_c                                     :   16.5   16.7
w_avg_8_32x2_rvv_i32                               :    3.2    1.7
w_avg_8_32x4_c                                     :   33.2   33.2
w_avg_8_32x4_rvv_i32                               :    5.5    2.7
w_avg_8_32x8_c                                     :   66.0   62.5
w_avg_8_32x8_rvv_i32                               :   10.5    5.0
w_avg_8_32x16_c                                    :  131.5  132.0
w_avg_8_32x16_rvv_i32                              :   20.2    9.5
w_avg_8_32x32_c                                    :  261.7  272.0
w_avg_8_32x32_rvv_i32                              :   39.7   18.0
w_avg_8_32x64_c                                    :  575.2  545.5
w_avg_8_32x64_rvv_i32                              :  105.5   58.7
w_avg_8_32x128_c                                   : 1154.2 1088.0
w_avg_8_32x128_rvv_i32                             :  207.0   98.0
w_avg_8_64x2_c                                     :   33.0   33.0
w_avg_8_64x2_rvv_i32                               :    6.2    2.7
w_avg_8_64x4_c                                     :   65.5   66.0
w_avg_8_64x4_rvv_i32                               :   11.5    5.0
w_avg_8_64x8_c                                     :  131.2  132.5
w_avg_8_64x8_rvv_i32                               :   22.5    9.5
w_avg_8_64x16_c                                    :  268.2  262.5
w_avg_8_64x16_rvv_i32                              :   44.2   18.0
w_avg_8_64x32_c                                    :  561.5  528.7
w_avg_8_64x32_rvv_i32                              :   88.0   35.2
w_avg_8_64x64_c                                    : 1136.2 1124.0
w_avg_8_64x64_rvv_i32                              :  222.0   82.2
w_avg_8_64x128_c                                   : 2345.0 2312.7
w_avg_8_64x128_rvv_i32                             :  423.0  190.5
w_avg_8_128x2_c                                    :   65.7   66.5
w_avg_8_128x2_rvv_i32                              :   11.2    5.5
w_avg_8_128x4_c                                    :  131.2  132.2
w_avg_8_128x4_rvv_i32                              :   22.0   10.2
w_avg_8_128x8_c                                    :  263.5  312.0
w_avg_8_128x8_rvv_i32                              :   43.2   19.7
w_avg_8_128x16_c                                   :  528.7  526.2
w_avg_8_128x16_rvv_i32                             :   85.5   39.5
w_avg_8_128x32_c                                   : 1067.7 1062.7
w_avg_8_128x32_rvv_i32                             :  171.7   78.2
w_avg_8_128x64_c                                   : 2234.7 2168.7
w_avg_8_128x64_rvv_i32                             :  400.0  159.0
w_avg_8_128x128_c                                  : 4752.5 4295.0
w_avg_8_128x128_rvv_i32                            :  757.7  365.5

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-09-24 20:04:51 +03:00
Zhao Zhili
5c66a3ab51 avcodec/vvc: Fix output and unref a frame which isn't decoding yet
ff_vvc_output_frame is called before actually decoding. It's possible
for ff_vvc_output_frame to select the current frame for output. If the
current frame is a non-reference frame, it will be released by
ff_vvc_unref_frame.

Fix this by always marking the current frame with
VVC_FRAME_FLAG_SHORT_REF, as is done by the HEVC decoder.
2024-09-15 16:42:14 +08:00
Frank Plowman
6df0c5f9f4 lavc/vvc: Remove experimental flag
This reverts commit 110d8549d5.

I have been working through fixing bugs, particularly crashes I've
found using a fuzzer, in the VVC decoder for the past few months.
While I won't claim it is now bug-free, it is considerably more
resilient than it was and I think in a position to have the
experimental flag removed for release 7.1.

Additionally, most of the Main 10 features of VVC which were missing
from the version of the decoder released in 7.0 have now been implemented.
This includes the most major missing features: IBC, subpictures and RPR.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-09-06 22:14:52 +08:00
Nuo Mi
3d2fafa229 avcodec/vvcdec: fix potential deadlock in report_frame_progress
Fixes:
https://fate.ffmpeg.org/report.cgi?slot=x86_64-archlinux-gcc-tsan&time=20240823175808

Reproduction steps:
./configure --enable-memory-poisoning --toolchain=gcc-tsan --disable-stripping && make fate-vvc

Root cause:
We hold the current frame's lock while updating progress for other frames,
which also requires acquiring other frame locks. This could potentially lead to a deadlock.
However, I don't think this will happen in practice because progress updates are one-way, with no cyclic dependencies.
But we need this patch to make FATE happy.
2024-09-03 21:32:27 +08:00
Frank Plowman
54291f4383 lavc/vvc: Fix assertion bound on qPy_{a,b}
Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-09-03 20:57:52 +08:00
Frank Plowman
01701bdcd5 lavc/vvc: Prevent OOB access in subpic_tiles
The previous logic relied on the subpicture boundaries coinciding with
the tile boundaries.  Per 6.3.1 of H.266 (V3), vertical subpicture
boundaries are always tile boundaries however the same cannot be said
for horizontal subpicture boundaries.  Furthermore, it is possible to
construct an illegal bitstream where vertical subpicture boundaries are
not coincident with tile boundaries.  In these cases, the condition of
the while loop would never be satisfied resulting in an OOB read on
col_bd/row_bd.

Patch fixes this issue by replacing != with <, thereby not requiring
subpicture boundaries and tile boundaries to be coincident.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-08-31 15:05:23 +08:00
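The difference between the two loop conditions in 01701bdcd5 can be shown on a plain list of boundaries. This is a simplified model with hypothetical names; Python's IndexError stands in for the out-of-bounds read on col_bd/row_bd in C.

```python
def find_boundary_ne(bd, target):
    i = 0
    while bd[i] != target:  # never false if target is not a tile boundary
        i += 1              # eventually indexes past the end of bd
    return i


def find_boundary_lt(bd, target):
    i = 0
    while bd[i] < target:   # stops at the first boundary >= target
        i += 1
    return i


col_bd = [0, 4, 8, 12]
aligned = find_boundary_lt(col_bd, 8)      # target coincides with a boundary
unaligned = find_boundary_lt(col_bd, 6)    # target falls between boundaries
```

With `<` the loop terminates for any target not beyond the last boundary, whereas with `!=` an unaligned target walks off the end of the array.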
Nuo Mi
b2eabe0ff2 avcodec/vvcdec: format, fix indent for vvc_deblock_bs 2024-08-31 14:16:19 +08:00
Nuo Mi
7bd22342c3 avcodec/vvcdec: filter, fix uninitialized variables for YUV400 format
fix
==135000== Conditional jump or move depends on uninitialised value(s)
==135000==    at 0x169FF95: vvc_deblock_bs (filter.c:699)
and
==135000== Conditional jump or move depends on uninitialised value(s)
==135000==    at 0x16A2E72: ff_vvc_alf_filter (filter.c:1217)

Reported-by: James Almer <jamrial@gmail.com>
2024-08-31 14:16:19 +08:00
Nuo Mi
f851abb4b3 avcodec/vvcdec: bdof, do not pad sources and gradients to simplify the code 2024-08-31 13:57:51 +08:00
Nuo Mi
8347def797 avcodec/vvcdec: misc, rename BDOF_BLOCK_SIZE to BDOF_MIN_BLOCK_SIZE 2024-08-31 13:57:51 +08:00
Wu Jianhua
ca5c9e810a avcodec/vvc/dsp: prefix TxType and TxSize with VVC
See https://patchwork.ffmpeg.org/project/ffmpeg/patch/TYSPR06MB64337C4A9ADF5312E6648543AA62A@TYSPR06MB6433.apcprd06.prod.outlook.com/#81892

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-08-15 20:52:14 +08:00
Wu Jianhua
ae1a9cfd52 avcodec/vvc_parser: move avctx->has_b_frames initialization to dec
From Jun Zhao <mypopydev@gmail.com>:
> Should we relocate this to the decoder? Other codecs typically set this
> parameter in the decoder.

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-08-15 20:50:24 +08:00
Nuo Mi
80af195804 avcodec/vvcdec: move frame tab memset from the main thread to worker threads
Running memset on the tables in the main thread can become a bottleneck for the decoder.
For example, if it takes 1% of the processing time for one core, the maximum achievable FPS will be 100.
Moving the memset to worker threads fixes the issue.
2024-08-15 20:33:57 +08:00
Nuo Mi
daf6fcd816 avcodec/vvcdec: do not zero frame qp table
For luma, qp can only change at the CU level, so the qp tab size is related to the CU.
For chroma, considering the joint CbCr, the QP tab size is related to the TU.
2024-08-15 20:33:57 +08:00
Nuo Mi
ca2caeb21d avcodec/vvcdec: do not zero frame msf mmi table 2024-08-15 20:33:57 +08:00