Commit graph

50170 commits

Author SHA1 Message Date
Rémi Denis-Courmont
fa3b153cb1 lavc/vp7dsp: R-V V vp7_idct_add
Most of the code is shared with DC, thanks to minor earlier changes.

vp7_idct_add_c:       5.2
vp7_idct_add_rvv_i32: 2.5
2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont
4a0e629b6f lavc/vp7dsp: revector ff_vp7_dc_wht_rvv
This prepares for some code reuse.
2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont
fd39997f72 lavc/vp7dsp: add R-V V vp7_luma_dc_wht
This works out a bit more favourably than VP8's due to:
- additional multiplications that can be vectored,
- hardware-supported fixed-point rounding mode.

vp7_luma_dc_wht_c:       3.2
vp7_luma_dc_wht_rvv_i64: 2.0
2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont
91b5ea7bb9 lavc/vp8dsp: R-V V vp8_luma_dc_wht
This is not great as transposition is poorly supported, but it works:
vp8_luma_dc_wht_c:       2.5
vp8_luma_dc_wht_rvv_i32: 1.7
2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont
c53d42380d lavc/lpc: optimise RVV vector type for compute_autocorr
On SpacemiT X60 (with len == 4000):
autocorr_10_c:       2303.7
autocorr_10_rvv_f64: 1411.5 (before)
autocorr_10_rvv_f64:  842.2 (after)
2024-05-29 16:57:02 +03:00
Stone Chen
55e9c758f0 libavcode/x86/vvc: change label to vvc_sad_16 to reflect block sizes
According to the VVC specification (section 8.5.1), the maximum width/height of a subblock passed for DMVR SAD is 16. This along with previous constraint requiring width * height >= 128 means that  8x16, 16x8, and 16x16 are the only allowed sizes. This re-labels vvc_sad_16_128 to vvc_sad_16 to reflect this and adds a comment about the block size constraints. There's no functionality change.
2024-05-29 21:35:34 +08:00
David Rosca
510494760c lavc/vaapi_h264: Fix merging fields in DPB with missing references
If there are missing references, h264 decode does error concealment
by copying previous refs which means there will be duplicated surfaces.
Check long_ref and frame_idx in addition to surface when looking for
the other field to avoid trying to merge with wrong picture.
Also allow to merge with multiple pictures in case there are duplicates
of the other field.

Signed-off-by: David Rosca <nowrep@gmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-29 10:52:10 +08:00
David Rosca
d2d911eb9a lavc/vaapi_av1: Avoid sending the same slice buffer multiple times
When there are multiple tiles in one slice buffer, use multiple slice
params to avoid sending the same slice buffer multiple times and thus
increasing the bitstream size the driver will need to upload to hw.

Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: David Rosca <nowrep@gmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-29 10:49:35 +08:00
David Rosca
fe9d889dcd lavc/vaapi_decode: Make it possible to send multiple slice params buffers
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: David Rosca <nowrep@gmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-29 10:47:43 +08:00
Haihao Xiang
c872ba5899 lavc/qsvenc: respect user's setting for keyframes
For example:
./ffmpeg -hwaccel qsv -i input.mp4 -force_key_frames:v source -c:v
hevc_qsv -f null -

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-29 10:46:54 +08:00
Haihao Xiang
dbdd9ccded lavc/qsvdec: fix keyframes
MFX_FRAMETYPE_IDR is ORed to the frame type for AVC and HEVC keyframes,
and MFX_FRAMETYPE_I is taken as keyframe flag for other codecs when
getting the output surface from the SDK, hence we may mark the output
frame as keyframe accordingly.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-29 10:46:54 +08:00
Rémi Denis-Courmont
a11122f9c6 lavc/vp8dsp: save one R-V GPR
This saves one instruction and frees up A5, which will be repurposed in
later changes. Unfortunately, we need to add quite a lot of alternative
code for this.
2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont
4e56455d36 lavc/vp8dsp: avoid one multiplication on RISC-V
Use shifts rather than multiply, and save one instruction.
2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont
0aad5b9bf5 lavc/vp8dsp: factor R-V V bilin functions
For a given type, only the first VSETVLI instruction varies depending
on the size.
2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont
b248d7c319 lavc/sbrdsp: fold immediate offset into relocation
This results in AUIPC; ADDI instead of AUIPC; ADDI; ... ADDI.
2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont
8444115262 lavc/startcode: fix RVV return value on no match
If there are no zero bytes, t2 equals -1. The code cannot simply fall
through to the match case.
2024-05-28 19:43:40 +03:00
Rémi Denis-Courmont
af20fb9c4e lavc/lpc: fix off-by-one in R-V V compute_autocorr 2024-05-28 19:43:40 +03:00
Niklas Haas
9fd88bd092 avcodec/h2645_sei: loosen up min luminance requirements
The H.265 specification is quite clear on this case:

> When min_display_mastering_luminance is not in the range of 1 to
> 50000, the nominal maximum display luminance of the mastering display
> is unknown or unspecified or specified by other means not specified in
> this Specification.

And so the current code is correct in marking luminance data as invalid
if min luminance is set to 0. However, this breaks playback of at least
several real-world Blu-ray releases, for example La La Land, Planet of
the Apes, and quite possibly a lot more. These come with ostensibly
valid max_luminance tags (1000 nits), but min_luminance set to 0.

Loosen up this requirement by guarding it behind FF_COMPLIANCE_STRICT.
We still reject blatantly invalid metadata (wrong value range on
luminance, max set to 0, max below min, min above 50 nits etc.), so this
shouldn't cause any unintended regressions.

Fixes: https://github.com/mpv-player/mpv/issues/14177
2024-05-28 18:11:57 +02:00
Michael Niedermayer
62d7106c36
avcodec/vlc: Cleanup on multi table alloc failure in ff_vlc_init_multi_from_lengths()
Fixes: CID1544630 Resource leak

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:08 +02:00
Michael Niedermayer
d5cc21741b
avcodec/vc1_block: remove unneeded store to off in vc1_decode_p_mb_intfi()
Found while reviewing code related to coverity

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:08 +02:00
Michael Niedermayer
992b28f572
avcodec/vc1_block: remove unused off from vc1_decode_p_mb_intfr()
Fixes: CID1435166 Unused value
Fixes: CID1529221 Unused value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:08 +02:00
Michael Niedermayer
a287f17db2
avcodec/tiff: Assert init_get_bits8() success in unpack_gray()
Helps: CID1441939 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:07 +02:00
Michael Niedermayer
8814cedb07
avcodec/tiff: Assert init_get_bits8() success in horizontal_fill()
Helps: CID1441167 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:07 +02:00
Michael Niedermayer
c841cb45e8
qsv: Initialize impl_value
Fixes: The warnings from CID1598553 Uninitialized scalar variable

Passing partly initialized structs is ugly and asking for hard to rieproduce bugs,
The uninitialized fields where not used

Reviewed-by: "Xiang, Haihao" <haihao.xiang-at-intel.com@ffmpeg.org>
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:04 +02:00
Michael Niedermayer
e7775973f0
avcodec/tests/bitstream_template: Assert bits_init8() return
Helps: CID1518967 Unchecked return value
Helps: CID1518968 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:03 +02:00
Rémi Denis-Courmont
a535ce2ac0 lavc/flacdsp: R-V Zvl256b lpc33
flac_lpc_33_13_c: 499.7
flac_lpc_33_13_rvv_i64: 197.7
flac_lpc_33_16_c: 601.5
flac_lpc_33_16_rvv_i64: 195.2
flac_lpc_33_29_c: 1011.5
flac_lpc_33_29_rvv_i64: 300.7
flac_lpc_33_32_c: 1099.0
flac_lpc_33_32_rvv_i64: 296.7
2024-05-27 22:07:29 +03:00
Rémi Denis-Courmont
5ebb071d79 lavc/vp8dsp: disable EPEL HV on RV128
RV128 is mostly scifi at this point, so we can just disable it here
(the EPEL HV prologue/epilogue do not save 128-bit registers).
2024-05-27 22:07:29 +03:00
Diego Felix de Souza
aead61451c avcodec/nvenc_av1: Correct CQ range for AV1
The Constant Quality (CQ) range for the AV1 codec is actually 0 to
63, contrary to what is stated in the header and documentation.

Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-05-27 19:20:18 +02:00
Frank Plowman
49c3918c1a lavc/vvc: Validate temporal MVP references
Per VVCv3 p. 157, the collocated reference picture used in temporal
motion vector prediction must have RprConstraintsActiveFlag equal to
zero and the same CTU size as the current picture.  Add these checks,
fixing crashes decoding some fuzzed bitstreams.

Additionally, only set up the collocated reference picture if it is
actually going to be used (i.e. if ph_temporal_mvp_enabled_flag is 1),
else legal RPR bitstreams will fail the new checks.

Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-05-27 20:24:21 +08:00
llyyr
2b11a8b95b
lavc/vp9: reset segmentation fields when segmentation isn't enabled
Fields under the segmentation switch are never reset on a new frame, and
retain the value from the previous frame. This bugs out a bunch of
hwaccel drivers when segmentation is disabled but update_map isn't
reset because they don't ignore values behind switches. This commit also
resets the temporal field, though it may not be required.

We also do this for vp8 [1] so this commit is just mirroring the vp8
logic.

This fixes an issue with certain samples [2] that causes blocky
artifacts with vaapi, d3d11va and cuda (and possibly others).
Mesa worked around [3] this by ignoring these fields if
segmentation.enabled is 0, but d3d11va still displays blocky artifacts.

[1] 2e877090f9:/libavcodec/vp8.c#l797
[2] https://github.com/mpv-player/mpv/issues/13533
[3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27816

Signed-off-by: llyyr <llyyr.public@gmail.com>
2024-05-27 12:23:40 +02:00
Fei Wang
01c7f68f7a lavc/qsvdec: Use coded_w/h for frame resolution when use system memory
Fix output mismatch when decode clip with crop(conf_win_*offset in
syntax) info by using system memory:

$ ffmpeg -c:v hevc_qsv -i conf_win_offet.bit -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-05-27 09:38:46 +08:00
Fei Wang
1c56263704 lavc/qsvdec: Allow decoders to export crop information
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-05-27 09:38:46 +08:00
Haihao Xiang
a72e9aeabc lavc/qsvenc_av1: accept HDR metadata if have
The sdk av1 encoder can accept HDR metadata via mfxEncodeCtrl::ExtParam.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-27 09:38:46 +08:00
Haihao Xiang
473e84ad62 lavc/qsvdec: update HDR side data on output AVFrame for AV1 decoding
The SDK may provide HDR metadata for HDR streams via mfxExtBuffer
attached on output mfxFrameSurface1

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-27 09:38:46 +08:00
Rémi Denis-Courmont
25a33665a0 lavc/vp8dsp: remove unused macro parameter 2024-05-26 19:20:48 +03:00
Rémi Denis-Courmont
728a1dd3b6 lavc/rv34dsp: remove stray load immediate 2024-05-26 19:20:45 +03:00
sunyuechi
63697d3350 lavc/vp8dsp: R-V V put_epel hv
C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32: 13.5
vp8_put_epel4_h6v4_c: 22.2
vp8_put_epel4_h6v4_rvv_i32: 14.5
vp8_put_epel4_h6v6_c: 29.0
vp8_put_epel4_h6v6_rvv_i32: 15.7
vp8_put_epel8_h4v4_c: 73.0
vp8_put_epel8_h4v4_rvv_i32: 22.2
vp8_put_epel8_h4v6_c: 90.5
vp8_put_epel8_h4v6_rvv_i32: 26.7
vp8_put_epel8_h6v4_c: 85.0
vp8_put_epel8_h6v4_rvv_i32: 27.2
vp8_put_epel8_h6v6_c: 104.7
vp8_put_epel8_h6v6_rvv_i32: 29.5
vp8_put_epel16_h4v4_c: 145.5
vp8_put_epel16_h4v4_rvv_i32: 26.5
vp8_put_epel16_h4v6_c: 190.7
vp8_put_epel16_h4v6_rvv_i32: 47.5
vp8_put_epel16_h6v4_c: 173.7
vp8_put_epel16_h6v4_rvv_i32: 33.2
vp8_put_epel16_h6v6_c: 222.2
vp8_put_epel16_h6v6_rvv_i32: 35.5

Amended to disable unsupported RV128.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-26 15:15:28 +03:00
Rémi Denis-Courmont
0b2316e37f lavc/sbrdsp: fix inverted boundary check
128-bit is the maximum, not the minimum here. Larger vector sizes can
result in reads past the end of the noise value table.

This partially reverts commit cdcb4b98b7.
2024-05-25 22:03:37 +03:00
Rémi Denis-Courmont
e6b38c944f lavc/sbrdsp: fix potential overflow in noise table
Since the SBR noise application optimisations are currently restricted
to hardware with 128-bit vectors, and use a quadruple multipler, they
can load up to 16 32-bit elements. But the "loads" are of 2 segments,
or 16 pairs of single precision float.

Thus we need to expand the dupiclated section of the noise table from
2x8 to 2x16 to avoid overflows.
2024-05-25 22:00:18 +03:00
Rémi Denis-Courmont
f883746587 lavc/flacdsp: do not assume maximum R-V VL
This loop correctly assumes that VLMAX=16 (4x128-bit vectors
with 32-bit elements) and 32 >= pred_order > 16. We need to alternate
between VL=16 and VL=t2=pred_order-16 elements to add up to pred_order.

The current code requests AVL=a2=pred_order elements. In QEMU and on
thte K230 hardware, this sets VL=16 as we need. But the specification
merely guarantees that we get: ceil(AVL / 2) <= VL <= VLMAX. For
instance, if pred_order equals 27, we could end up with VL=14 or VL=15
instead of VL=16. So instead, request literally VLMAX=16.
2024-05-25 10:31:50 +03:00
Andreas Rheinhardt
aff24c1658 avcodec/flacdec: Remove unused variable
Forgotten in 0380a03f1f.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-24 19:05:57 +02:00
Rémi Denis-Courmont
ba38d0e328 lavc/pixblockdsp: add scalar get_pixels_unaligned
The code is already there, we just need to use it.

get_pixels_unaligned_c: 2.2
get_pixels_unaligned_misaligned: 1.7
2024-05-24 17:53:43 +03:00
James Almer
0380a03f1f avcodec/flacdsp: split off lpc33 into a dsp function
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-24 09:23:00 -03:00
Haihao Xiang
8155808ce6 libavcodec/x86/vvc/vvc_sad: fix assembler error
X86ASM    libavcodec/x86/vvc/vvc_sad.o
libavcodec/x86/vvc/vvc_sad.asm:85: error: invalid number of operands
libavcodec/x86/vvc/vvc_sad.asm:87: error: invalid number of operands

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-23 09:12:50 -03:00
Stone Chen
0e52a4e434 libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.

Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.

Benchmarks ( AMD 7940HS )
Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc | 204.3 |
NovosobornayaSquare_1920x1080.bin | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc | 216.0 |
NovosobornayaSquare_1920x1080.bin | 204.0|
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:36:21 -03:00
Rémi Denis-Courmont
910d281b21 lavc/h263dsp: R-V V {h,v}_loop_filter
Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (c.f. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).

To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.

To implement the two signed divisions by a power of two toward zero:
 (x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
 (x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
 (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.

T-Head C908:
h263dsp.h_loop_filter_c:       228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c:       242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictible branches.)
2024-05-22 19:15:39 +03:00
James Almer
3d1597d3e2 x86/vvc_alf: use the x86inc instruction macros
Let its magic figure out the correct mnemonic based on target instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:51:30 +08:00
sunyuechi
0c1304ae11 lavc/vp9dsp: R-V V mc avg
C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-21 21:28:14 +03:00
Rémi Denis-Courmont
7591eb4055 Revert "lavc/sbrdsp: R-V V neg_odd_64"
While this function can easily be written with vectors, it just fails to
get any performance improvement.

For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
more or less the same metrics as the C code:

 func ff_sbr_neg_odd_64_rvv, zve64x
         li      a1, 32
         addi    a0, a0, 7
         li      t0, 8
         vsetvli zero, a1, e8, m2, ta, ma
         li      t1, 0x80
         vlse8.v v8, (a0), t0
         vxor.vx v8, v8, t1
         vsse8.v v8, (a0), t0
         ret
 endfunc

This reverts commit d06fd18f8f.
2024-05-21 21:26:39 +03:00
Rémi Denis-Courmont
d452db8410 lavc/vc1dsp: R-V V vc1_unescape_buffer
Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
  as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
  restarted after the escape byte. Vector compression (vcompress.vm) could
  discard all escape bytes but that is slower if escape bytes are rare.

Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of a 2 slides,
- taking a short cut if the input vector contains less than 2 zeroes.
But this is a good starting point:

T-Head C908:
vc1dsp.vc1_unescape_buffer_c:      12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0

SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c:      11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
2024-05-21 21:16:30 +03:00