ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-04-19 17:10:22 +00:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	fa3b153cb1	lavc/vp7dsp: R-V V vp7_idct_add Most of the code is shared with DC, thanks to minor earlier changes. vp7_idct_add_c: 5.2 vp7_idct_add_rvv_i32: 2.5	2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont	4a0e629b6f	lavc/vp7dsp: revector ff_vp7_dc_wht_rvv This prepares for some code reuse.	2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont	fd39997f72	lavc/vp7dsp: add R-V V vp7_luma_dc_wht This works out a bit more favourably than VP8's due to: - additional multiplications that can be vectored, - hardware-supported fixed-point rounding mode. vp7_luma_dc_wht_c: 3.2 vp7_luma_dc_wht_rvv_i64: 2.0	2024-05-29 16:57:02 +03:00
Rémi Denis-Courmont	91b5ea7bb9	lavc/vp8dsp: R-V V vp8_luma_dc_wht This is not great as transposition is poorly supported, but it works: vp8_luma_dc_wht_c: 2.5 vp8_luma_dc_wht_rvv_i32: 1.7	2024-05-29 16:57:02 +03:00
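The message says transposition is the weak point. The sketch below shows why a transpose is needed: the luma DC inverse Walsh-Hadamard transform runs one pass down the columns and then a second pass across the rows. It follows the structure of the RFC 6386 reference decoder as we recall it; it is not FFmpeg's `vp8_luma_dc_wht_c`. The name `inv_wht4x4` and the flat 16-element output layout are hypothetical.

```c
#include <stdint.h>

/* Two-pass 4x4 inverse Walsh-Hadamard transform of 16 luma DC values
 * (structure modelled on the RFC 6386 reference; hypothetical name).
 * Pass 1 works down columns, pass 2 across rows, so a vector version
 * must transpose the 4x4 block between the passes. */
static void inv_wht4x4(const int16_t in[16], int16_t out[16])
{
    int tmp[16];

    /* Pass 1: columns (elements i, i+4, i+8, i+12). */
    for (int i = 0; i < 4; i++) {
        int a1 = in[i] + in[12 + i];
        int b1 = in[4 + i] + in[8 + i];
        int c1 = in[4 + i] - in[8 + i];
        int d1 = in[i] - in[12 + i];

        tmp[i]      = a1 + b1;
        tmp[4 + i]  = c1 + d1;
        tmp[8 + i]  = a1 - b1;
        tmp[12 + i] = d1 - c1;
    }

    /* Pass 2: rows, with rounding (+3) and a final division by 8. */
    for (int i = 0; i < 4; i++) {
        const int *r = &tmp[4 * i];
        int a1 = r[0] + r[3];
        int b1 = r[1] + r[2];
        int c1 = r[1] - r[2];
        int d1 = r[0] - r[3];

        out[4 * i + 0] = (int16_t)((a1 + b1 + 3) >> 3);
        out[4 * i + 1] = (int16_t)((c1 + d1 + 3) >> 3);
        out[4 * i + 2] = (int16_t)((a1 - b1 + 3) >> 3);
        out[4 * i + 3] = (int16_t)((d1 - c1 + 3) >> 3);
    }
}
```

For example, a lone DC value of 8 in `in[0]` spreads to a 1 in each of the 16 outputs.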
Rémi Denis-Courmont	c53d42380d	lavc/lpc: optimise RVV vector type for compute_autocorr On SpacemiT X60 (with len == 4000): autocorr_10_c: 2303.7 autocorr_10_rvv_f64: 1411.5 (before) autocorr_10_rvv_f64: 842.2 (after)	2024-05-29 16:57:02 +03:00
Stone Chen	55e9c758f0	libavcodec/x86/vvc: change label to vvc_sad_16 to reflect block sizes According to the VVC specification (section 8.5.1), the maximum width/height of a subblock passed for DMVR SAD is 16. This, along with the previous constraint requiring width * height >= 128, means that 8x16, 16x8, and 16x16 are the only allowed sizes. This relabels vvc_sad_16_128 as vvc_sad_16 to reflect this and adds a comment about the block size constraints. There's no functionality change.	2024-05-29 21:35:34 +08:00
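The claim that only three sizes remain can be checked by brute force over power-of-two block dimensions. The sketch below encodes just the two constraints quoted in the message: each side at most 16, and width * height >= 128. The helper names are hypothetical.

```c
#include <stdio.h>

/* Nonzero if a w x h subblock meets the two constraints quoted above:
 * each side at most 16, and w * h >= 128 (hypothetical helper). */
static int dmvr_sad_size_allowed(int w, int h)
{
    return w <= 16 && h <= 16 && w * h >= 128;
}

/* Tries every power-of-two size from 4x4 to 128x128, prints the
 * allowed ones, and returns how many there are. */
static int list_dmvr_sad_sizes(void)
{
    int count = 0;
    for (int w = 4; w <= 128; w *= 2)
        for (int h = 4; h <= 128; h *= 2)
            if (dmvr_sad_size_allowed(w, h)) {
                printf("%dx%d\n", w, h);
                count++;
            }
    return count;
}
```

This prints 8x16, 16x8 and 16x16. For power-of-two sizes, the minimum side of 8 follows from the other two constraints.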
David Rosca	510494760c	lavc/vaapi_h264: Fix merging fields in DPB with missing references If there are missing references, the H.264 decoder performs error concealment by copying previous references, which means there will be duplicate surfaces. Check long_ref and frame_idx in addition to surface when looking for the other field, to avoid trying to merge with the wrong picture. Also allow merging with multiple pictures in case there are duplicates of the other field. Signed-off-by: David Rosca <nowrep@gmail.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-29 10:52:10 +08:00
David Rosca	d2d911eb9a	lavc/vaapi_av1: Avoid sending the same slice buffer multiple times When there are multiple tiles in one slice buffer, use multiple slice params to avoid sending the same slice buffer multiple times and thus increasing the bitstream size the driver will need to upload to hw. Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: David Rosca <nowrep@gmail.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-29 10:49:35 +08:00
David Rosca	fe9d889dcd	lavc/vaapi_decode: Make it possible to send multiple slice params buffers Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: David Rosca <nowrep@gmail.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-29 10:47:43 +08:00
Haihao Xiang	c872ba5899	lavc/qsvenc: respect user's setting for keyframes For example: ./ffmpeg -hwaccel qsv -i input.mp4 -force_key_frames:v source -c:v hevc_qsv -f null - Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-29 10:46:54 +08:00
Haihao Xiang	dbdd9ccded	lavc/qsvdec: fix keyframes MFX_FRAMETYPE_IDR is ORed into the frame type for AVC and HEVC keyframes, and MFX_FRAMETYPE_I is taken as the keyframe flag for other codecs when getting the output surface from the SDK, so the output frame can be marked as a keyframe accordingly. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-29 10:46:54 +08:00
Rémi Denis-Courmont	a11122f9c6	lavc/vp8dsp: save one R-V GPR This saves one instruction and frees up A5, which will be repurposed in later changes. Unfortunately, we need to add quite a lot of alternative code for this.	2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont	4e56455d36	lavc/vp8dsp: avoid one multiplication on RISC-V Use shifts rather than multiply, and save one instruction.	2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont	0aad5b9bf5	lavc/vp8dsp: factor R-V V bilin functions For a given type, only the first VSETVLI instruction varies depending on the size.	2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont	b248d7c319	lavc/sbrdsp: fold immediate offset into relocation This results in AUIPC; ADDI instead of AUIPC; ADDI; ... ADDI.	2024-05-28 19:44:11 +03:00
Rémi Denis-Courmont	8444115262	lavc/startcode: fix RVV return value on no match If there are no zero bytes, t2 equals -1. The code cannot simply fall through to the match case.	2024-05-28 19:43:40 +03:00
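Below is a scalar model of the bug. A "find first" search in the style of RVV `vfirst.m` returns -1 when nothing matches. We assume the candidate search must return the buffer size in that case, which is our understanding of the C reference. The -1 therefore has to be mapped explicitly instead of falling through to the match path. Both functions are hypothetical and process a single chunk only.

```c
#include <stdint.h>

/* Like vfirst.m on a "byte == 0" mask: index of the first zero byte,
 * or -1 if there is none (hypothetical single-chunk model). */
static long first_zero_index(const uint8_t *buf, long size)
{
    for (long i = 0; i < size; i++)
        if (buf[i] == 0)
            return i;
    return -1;
}

/* Candidate search: index of the first zero byte, or size if none.
 * Falling through with idx == -1 would wrongly return -1. */
static long find_candidate(const uint8_t *buf, long size)
{
    long idx = first_zero_index(buf, size);
    return idx < 0 ? size : idx;
}
```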
Rémi Denis-Courmont	af20fb9c4e	lavc/lpc: fix off-by-one in R-V V compute_autocorr	2024-05-28 19:43:40 +03:00
Niklas Haas	9fd88bd092	avcodec/h2645_sei: loosen up min luminance requirements The H.265 specification is quite clear on this case: > When min_display_mastering_luminance is not in the range of 1 to > 50000, the nominal maximum display luminance of the mastering display > is unknown or unspecified or specified by other means not specified in > this Specification. And so the current code is correct in marking luminance data as invalid if min luminance is set to 0. However, this breaks playback of at least several real-world Blu-ray releases, for example La La Land, Planet of the Apes, and quite possibly a lot more. These come with ostensibly valid max_luminance tags (1000 nits), but min_luminance set to 0. Loosen up this requirement by guarding it behind FF_COMPLIANCE_STRICT. We still reject blatantly invalid metadata (wrong value range on luminance, max set to 0, max below min, min above 50 nits etc.), so this shouldn't cause any unintended regressions. Fixes: https://github.com/mpv-player/mpv/issues/14177	2024-05-28 18:11:57 +02:00
Michael Niedermayer	62d7106c36	avcodec/vlc: Cleanup on multi table alloc failure in ff_vlc_init_multi_from_lengths() Fixes: CID1544630 Resource leak Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:08 +02:00
Michael Niedermayer	d5cc21741b	avcodec/vc1_block: remove unneeded store to off in vc1_decode_p_mb_intfi() Found while reviewing code related to coverity Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:08 +02:00
Michael Niedermayer	992b28f572	avcodec/vc1_block: remove unused off from vc1_decode_p_mb_intfr() Fixes: CID1435166 Unused value Fixes: CID1529221 Unused value Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:08 +02:00
Michael Niedermayer	a287f17db2	avcodec/tiff: Assert init_get_bits8() success in unpack_gray() Helps: CID1441939 Unchecked return value Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:07 +02:00
Michael Niedermayer	8814cedb07	avcodec/tiff: Assert init_get_bits8() success in horizontal_fill() Helps: CID1441167 Unchecked return value Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:07 +02:00
Michael Niedermayer	c841cb45e8	qsv: Initialize impl_value Fixes: The warnings from CID1598553 Uninitialized scalar variable Passing partly initialized structs is ugly and asking for hard-to-reproduce bugs. The uninitialized fields were not used. Reviewed-by: "Xiang, Haihao" <haihao.xiang-at-intel.com@ffmpeg.org> Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:04 +02:00
Michael Niedermayer	e7775973f0	avcodec/tests/bitstream_template: Assert bits_init8() return Helps: CID1518967 Unchecked return value Helps: CID1518968 Unchecked return value Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:03 +02:00
Rémi Denis-Courmont	a535ce2ac0	lavc/flacdsp: R-V Zvl256b lpc33 flac_lpc_33_13_c: 499.7 flac_lpc_33_13_rvv_i64: 197.7 flac_lpc_33_16_c: 601.5 flac_lpc_33_16_rvv_i64: 195.2 flac_lpc_33_29_c: 1011.5 flac_lpc_33_29_rvv_i64: 300.7 flac_lpc_33_32_c: 1099.0 flac_lpc_33_32_rvv_i64: 296.7	2024-05-27 22:07:29 +03:00
Rémi Denis-Courmont	5ebb071d79	lavc/vp8dsp: disable EPEL HV on RV128 RV128 is mostly scifi at this point, so we can just disable it here (the EPEL HV prologue/epilogue do not save 128-bit registers).	2024-05-27 22:07:29 +03:00
Diego Felix de Souza	aead61451c	avcodec/nvenc_av1: Correct CQ range for AV1 The Constant Quality (CQ) range for the AV1 codec is actually 0 to 63, contrary to what is stated in the header and documentation. Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>	2024-05-27 19:20:18 +02:00
Frank Plowman	49c3918c1a	lavc/vvc: Validate temporal MVP references Per VVCv3 p. 157, the collocated reference picture used in temporal motion vector prediction must have RprConstraintsActiveFlag equal to zero and the same CTU size as the current picture. Add these checks, fixing crashes decoding some fuzzed bitstreams. Additionally, only set up the collocated reference picture if it is actually going to be used (i.e. if ph_temporal_mvp_enabled_flag is 1), else legal RPR bitstreams will fail the new checks. Co-authored-by: Nuo Mi <nuomi2021@gmail.com> Signed-off-by: Frank Plowman <post@frankplowman.com>	2024-05-27 20:24:21 +08:00
llyyr	2b11a8b95b	lavc/vp9: reset segmentation fields when segmentation isn't enabled Fields under the segmentation switch are never reset on a new frame, so they retain the value from the previous frame. This breaks a number of hwaccel drivers when segmentation is disabled but update_map isn't reset, because the drivers don't ignore values behind switches. This commit also resets the temporal field, though that may not be required. We already do this for vp8 [1], so this commit just mirrors the vp8 logic. This fixes an issue with certain samples [2] that causes blocky artifacts with vaapi, d3d11va and cuda (and possibly others). Mesa worked around [3] this by ignoring these fields if segmentation.enabled is 0, but d3d11va still displays blocky artifacts. [1] `2e877090f9`:/libavcodec/vp8.c#l797 [2] https://github.com/mpv-player/mpv/issues/13533 [3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27816 Signed-off-by: llyyr <llyyr.public@gmail.com>	2024-05-27 12:23:40 +02:00
Fei Wang	01c7f68f7a	lavc/qsvdec: Use coded_w/h for frame resolution when use system memory Fix output mismatch when decoding a clip with cropping information (conf_win_*_offset in the syntax) using system memory: $ ffmpeg -c:v hevc_qsv -i conf_win_offet.bit -y out.yuv Signed-off-by: Fei Wang <fei.w.wang@intel.com>	2024-05-27 09:38:46 +08:00
Fei Wang	1c56263704	lavc/qsvdec: Allow decoders to export crop information Signed-off-by: Fei Wang <fei.w.wang@intel.com>	2024-05-27 09:38:46 +08:00
Haihao Xiang	a72e9aeabc	lavc/qsvenc_av1: accept HDR metadata if present The SDK AV1 encoder can accept HDR metadata via mfxEncodeCtrl::ExtParam. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-27 09:38:46 +08:00
Haihao Xiang	473e84ad62	lavc/qsvdec: update HDR side data on output AVFrame for AV1 decoding The SDK may provide HDR metadata for HDR streams via an mfxExtBuffer attached to the output mfxFrameSurface1. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2024-05-27 09:38:46 +08:00
Rémi Denis-Courmont	25a33665a0	lavc/vp8dsp: remove unused macro parameter	2024-05-26 19:20:48 +03:00
Rémi Denis-Courmont	728a1dd3b6	lavc/rv34dsp: remove stray load immediate	2024-05-26 19:20:45 +03:00
sunyuechi	63697d3350	lavc/vp8dsp: R-V V put_epel hv C908: vp8_put_epel4_h4v4_c: 20.0 vp8_put_epel4_h4v4_rvv_i32: 11.0 vp8_put_epel4_h4v6_c: 25.2 vp8_put_epel4_h4v6_rvv_i32: 13.5 vp8_put_epel4_h6v4_c: 22.2 vp8_put_epel4_h6v4_rvv_i32: 14.5 vp8_put_epel4_h6v6_c: 29.0 vp8_put_epel4_h6v6_rvv_i32: 15.7 vp8_put_epel8_h4v4_c: 73.0 vp8_put_epel8_h4v4_rvv_i32: 22.2 vp8_put_epel8_h4v6_c: 90.5 vp8_put_epel8_h4v6_rvv_i32: 26.7 vp8_put_epel8_h6v4_c: 85.0 vp8_put_epel8_h6v4_rvv_i32: 27.2 vp8_put_epel8_h6v6_c: 104.7 vp8_put_epel8_h6v6_rvv_i32: 29.5 vp8_put_epel16_h4v4_c: 145.5 vp8_put_epel16_h4v4_rvv_i32: 26.5 vp8_put_epel16_h4v6_c: 190.7 vp8_put_epel16_h4v6_rvv_i32: 47.5 vp8_put_epel16_h6v4_c: 173.7 vp8_put_epel16_h6v4_rvv_i32: 33.2 vp8_put_epel16_h6v6_c: 222.2 vp8_put_epel16_h6v6_rvv_i32: 35.5 Amended to disable unsupported RV128. Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-05-26 15:15:28 +03:00
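The HV ("horizontal then vertical") case is a separable filter. As we read the benchmark names, h4/h6 and v4/v6 denote 4-tap and 6-tap filters in each direction. In the scalar sketch below, the horizontal pass covers h + 5 rows (2 above, 3 below) so the vertical 6-tap pass has its context. The intermediate is clipped to 8 bits and each pass rounds with (sum + 64) >> 7. All names are hypothetical. The taps used in the usage note are assumptions recalled from RFC 6386 and should be checked against the specification; each row sums to 128.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp to the 0..255 pixel range. */
static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* 6-tap filter centred on p[0], reading p[-2*step] .. p[3*step];
 * taps sum to 128, hence the (sum + 64) >> 7 rounding. */
static uint8_t filter6(const uint8_t *p, ptrdiff_t step, const int taps[6])
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += taps[k] * p[(k - 2) * step];
    return clip_u8((sum + 64) >> 7);
}

/* Separable HV filtering of a w x h block (w, h <= 16, hypothetical).
 * A 4-tap filter is a 6-tap one with zero outer taps. */
static void put_epel_hv(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        int w, int h, const int htaps[6], const int vtaps[6])
{
    uint8_t tmp[(16 + 5) * 16];

    /* Horizontal pass: source rows -2 .. h+2 into tmp rows 0 .. h+4. */
    for (int y = 0; y < h + 5; y++)
        for (int x = 0; x < w; x++)
            tmp[y * 16 + x] = filter6(src + (y - 2) * src_stride + x, 1, htaps);

    /* Vertical pass: output row y is centred on tmp row y + 2. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dst_stride + x] = filter6(tmp + (y + 2) * 16 + x, 16, vtaps);
}
```

For example, with `{3, -16, 77, 77, -16, 3}` as a 6-tap half-pel filter and `{0, -9, 93, 50, -6, 0}` as a 4-tap one, a flat input block passes through unchanged.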
Rémi Denis-Courmont	0b2316e37f	lavc/sbrdsp: fix inverted boundary check 128-bit is the maximum, not the minimum here. Larger vector sizes can result in reads past the end of the noise value table. This partially reverts commit `cdcb4b98b7`.	2024-05-25 22:03:37 +03:00
Rémi Denis-Courmont	e6b38c944f	lavc/sbrdsp: fix potential overflow in noise table Since the SBR noise application optimisations are currently restricted to hardware with 128-bit vectors and use a quadruple multiplier, they can load up to 16 32-bit elements. But the "loads" are of 2 segments, or 16 pairs of single-precision floats. Thus we need to expand the duplicated section of the noise table from 2x8 to 2x16 to avoid overflows.	2024-05-25 22:00:18 +03:00
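The element count in this message follows from the RVV relation VLMAX = LMUL * VLEN / SEW. A two-field segmented load then reads VLMAX pairs. The helper names below are hypothetical, and only integer LMUL is modelled.

```c
/* VLMAX = LMUL * VLEN / SEW (integer LMUL only; hypothetical helper). */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return lmul * vlen_bits / sew_bits;
}

/* Values read by a segmented load with nfields fields per segment
 * when VL = VLMAX: each of the VL segments holds nfields values. */
static unsigned segment_load_elems(unsigned vlen_bits, unsigned sew_bits,
                                   unsigned lmul, unsigned nfields)
{
    return vlmax(vlen_bits, sew_bits, lmul) * nfields;
}
```

With VLEN = 128, SEW = 32 and LMUL = 4, VLMAX is 16, so a two-field load reads 32 floats (16 pairs). This is why the duplicated tail must cover 2x16 entries. At VLEN = 256 the same code would read 64 floats, which relates to the 128-bit upper bound discussed in the neighbouring sbrdsp fix.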
Rémi Denis-Courmont	f883746587	lavc/flacdsp: do not assume maximum R-V VL This loop correctly assumes that VLMAX=16 (4x128-bit vectors with 32-bit elements) and 32 >= pred_order > 16. We need to alternate between VL=16 and VL=t2=pred_order-16 elements to add up to pred_order. The current code requests AVL=a2=pred_order elements. In QEMU and on the K230 hardware, this sets VL=16 as we need. But the specification merely guarantees that we get: ceil(AVL / 2) <= VL <= VLMAX. For instance, if pred_order equals 27, we could end up with VL=14 or VL=15 instead of VL=16. So instead, request literally VLMAX=16.	2024-05-25 10:31:50 +03:00
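The rule quoted above comes from the RVV 1.0 `vsetvli` constraints, which have three cases depending on how AVL compares with VLMAX. This hypothetical helper returns the range of VL values a conforming implementation may pick.

```c
/* Range of VL a conforming RVV 1.0 implementation may return from
 * vsetvli for a requested AVL (hypothetical helper). */
static void vl_bounds(unsigned avl, unsigned vlmax, unsigned *lo, unsigned *hi)
{
    if (avl <= vlmax) {            /* VL must equal AVL */
        *lo = *hi = avl;
    } else if (avl < 2 * vlmax) {  /* ceil(AVL / 2) <= VL <= VLMAX */
        *lo = (avl + 1) / 2;
        *hi = vlmax;
    } else {                       /* VL must equal VLMAX */
        *lo = *hi = vlmax;
    }
}
```

With AVL = 27 and VLMAX = 16, VL may be anywhere from 14 to 16. Requesting AVL = 16 falls in the first case instead, so VL is exactly 16 on every conforming implementation.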
Andreas Rheinhardt	aff24c1658	avcodec/flacdec: Remove unused variable Forgotten in `0380a03f1f`. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-05-24 19:05:57 +02:00
Rémi Denis-Courmont	ba38d0e328	lavc/pixblockdsp: add scalar get_pixels_unaligned The code is already there, we just need to use it. get_pixels_unaligned_c: 2.2 get_pixels_unaligned_misaligned: 1.7	2024-05-24 17:53:43 +03:00
James Almer	0380a03f1f	avcodec/flacdsp: split off lpc33 into a dsp function Signed-off-by: James Almer <jamrial@gmail.com>	2024-05-24 09:23:00 -03:00
Haihao Xiang	8155808ce6	libavcodec/x86/vvc/vvc_sad: fix assembler error X86ASM libavcodec/x86/vvc/vvc_sad.o libavcodec/x86/vvc/vvc_sad.asm:85: error: invalid number of operands libavcodec/x86/vvc/vvc_sad.asm:87: error: invalid number of operands Signed-off-by: Haihao Xiang <haihao.xiang@intel.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-05-23 09:12:50 -03:00
Stone Chen	0e52a4e434	libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h >= 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16-bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub. Additionally, this changes parameters dx and dy from int to intptr_t. This allows dx and dy to be used as pointer offsets without needing to use movsxd. Benchmarks (AMD 7940HS) Before: BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 | Chimera_8bit_1080P_1000_frames.vvc | 204.3 | NovosobornayaSquare_1920x1080.bin | 197.3 | RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 | After: BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 | Chimera_8bit_1080P_1000_frames.vvc | 216.0 | NovosobornayaSquare_1920x1080.bin | 204.0 | RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 | Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-05-22 20:36:21 -03:00
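Here is a scalar sketch of the two ideas in this message. SAD is accumulated over even rows only, and each absolute difference is formed as max(a, b) - min(a, b), which mirrors the min/max/sub approach. The signature, including the shared `stride` for both blocks, is hypothetical and is not FFmpeg's prototype.

```c
#include <stdint.h>
#include <stddef.h>

/* SAD of two w x h blocks of 16-bit samples over even rows only
 * (0, 2, 4, ...); |a - b| is computed as max(a, b) - min(a, b).
 * Hypothetical signature: both blocks share the same stride. */
static uint32_t sad_even_rows(const int16_t *a, const int16_t *b,
                              ptrdiff_t stride, int w, int h)
{
    uint32_t sad = 0;
    for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; x++) {
            int va = a[y * stride + x], vb = b[y * stride + x];
            int hi = va > vb ? va : vb;
            int lo = va > vb ? vb : va;
            sad += (uint32_t)(hi - lo);
        }
    return sad;
}
```

On a 16x8 block, only rows 0, 2, 4 and 6 contribute, so differences in odd rows do not affect the result.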
Rémi Denis-Courmont	910d281b21	lavc/h263dsp: R-V V {h,v}_loop_filter Since the horizontal and vertical filters are identical except for a transposition, this uses a common subprocedure with an ad-hoc ABI. To preserve return-address stack prediction, a link register has to be used (cf. the "Control Transfer Instructions" from the RISC-V ISA Manual). The alternate/temporary link register T0 is used here, so that the normal RA is preserved (something Arm cannot do!). To load the strength value based on `qscale`, the shortest possible and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic LLA; ADD; LBU sequence would add one more instruction since LLA is a convenience alias for AUIPC; ADDI. To ensure that this trick works, relocation relaxation is disabled. To implement the two signed divisions by a power of two toward zero: (x / (1 << SHIFT)) the code relies on the small range of integers involved, computing: (x + (x >> (16 - SHIFT))) >> SHIFT rather than the more general: (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT Thus one ANDI instruction is avoided. T-Head C908: h263dsp.h_loop_filter_c: 228.2 h263dsp.h_loop_filter_rvv_i32: 144.0 h263dsp.v_loop_filter_c: 242.7 h263dsp.v_loop_filter_rvv_i32: 114.0 (C is probably worse in real use due to less predictable branches.)	2024-05-22 19:15:39 +03:00
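The division trick can be checked in scalar C. We assume the inner shift `x >> (16 - SHIFT)` acts as a logical shift of the 16-bit two's-complement pattern, as it would on 16-bit vector lanes. Under that reading it yields the bias (1 << SHIFT) - 1 for small negative x and 0 for small non-negative x, so no AND mask is needed. The valid range is -2^(16-SHIFT) <= x < 2^(16-SHIFT). Both function names are hypothetical. The sketch also relies on `>>` of a negative `int` being arithmetic, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Trick: the bias comes from a logical shift of the 16-bit pattern.
 * Valid for -(1 << (16 - shift)) <= x < (1 << (16 - shift)). */
static int div_pow2_trunc(int x, int shift)
{
    int bias = (uint16_t)x >> (16 - shift);   /* (1 << shift) - 1 if x < 0 */
    return (x + bias) >> shift;               /* arithmetic shift assumed */
}

/* General form: sign mask ANDed with (1 << shift) - 1 (needs the ANDI). */
static int div_pow2_trunc_general(int x, int shift)
{
    int bias = ((int16_t)x >> 15) & ((1 << shift) - 1);
    return (x + bias) >> shift;
}
```

Both functions agree with C's truncating `x / (1 << shift)` over the whole valid range. The trick only breaks for larger positive x, where the shifted pattern is no longer zero.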
James Almer	3d1597d3e2	x86/vvc_alf: use the x86inc instruction macros Let its magic figure out the correct mnemonic based on target instruction set. Signed-off-by: James Almer <jamrial@gmail.com>	2024-05-22 20:51:30 +08:00
sunyuechi	0c1304ae11	lavc/vp9dsp: R-V V mc avg C908: vp9_avg4_8bpp_c: 1.2 vp9_avg4_8bpp_rvv_i64: 1.0 vp9_avg8_8bpp_c: 3.7 vp9_avg8_8bpp_rvv_i64: 1.5 vp9_avg16_8bpp_c: 14.7 vp9_avg16_8bpp_rvv_i64: 3.5 vp9_avg32_8bpp_c: 57.7 vp9_avg32_8bpp_rvv_i64: 10.0 vp9_avg64_8bpp_c: 229.0 vp9_avg64_8bpp_rvv_i64: 31.7 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-05-21 21:28:14 +03:00
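The "avg" MC functions benchmarked here combine a prediction with the pixels already in the destination. We assume this is the usual rounding average, dst = (dst + src + 1) >> 1, per pixel. The scalar sketch below uses a hypothetical name and signature.

```c
#include <stdint.h>
#include <stddef.h>

/* Rounding average of src into dst over a w x h block:
 * dst = (dst + src + 1) >> 1 (hypothetical scalar reference). */
static void avg_block(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dst_stride + x] = (uint8_t)
                ((dst[y * dst_stride + x] + src[y * src_stride + x] + 1) >> 1);
}
```

The sum is computed in `int`, so 255 + 255 + 1 cannot overflow before the shift.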
Rémi Denis-Courmont	7591eb4055	Revert "lavc/sbrdsp: R-V V neg_odd_64" While this function can easily be written with vectors, it just fails to get any performance improvement. For reference, this is a simpler loop-free implementation that does get better performance than the current one depending on hardware, but still more or less the same metrics as the C code: func ff_sbr_neg_odd_64_rvv, zve64x li a1, 32 addi a0, a0, 7 li t0, 8 vsetvli zero, a1, e8, m2, ta, ma li t1, 0x80 vlse8.v v8, (a0), t0 vxor.vx v8, v8, t1 vsse8.v v8, (a0), t0 ret endfunc This reverts commit `d06fd18f8f`.	2024-05-21 21:26:39 +03:00
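The quoted assembly starts at byte offset 7 and uses a byte stride of 8 over 32 elements. On a little-endian target this reaches the most significant byte of x[1], x[3], ..., x[63], and XORing it with 0x80 flips each float's sign bit. The portable C sketch below has the same effect. The name is hypothetical, and `memcpy` is used to reinterpret the bits without aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Negates x[1], x[3], ..., x[63] by flipping the IEEE-754 sign bit,
 * matching the effect of the vlse8/vxor/vsse8 sequence above. */
static void neg_odd_64(float x[64])
{
    for (int i = 1; i < 64; i += 2) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof bits);
        bits ^= UINT32_C(1) << 31;
        memcpy(&x[i], &bits, sizeof bits);
    }
}
```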
Rémi Denis-Courmont	d452db8410	lavc/vc1dsp: R-V V vc1_unescape_buffer Notes: - The loop is biased toward there being no escape bytes, as that should be the most common case. - The input byte array is slid rather than the (8 times smaller) bit-mask, as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction. - There are two comparisons with 0 per iteration, for the same reason. - In case of match, bytes are copied until the first match, and the loop is restarted after the escape byte. Vector compression (vcompress.vm) could discard all escape bytes, but that is slower if escape bytes are rare. Further optimisations should be possible, e.g.: - processing 2 bytes fewer per iteration to get rid of 2 slides, - taking a shortcut if the input vector contains fewer than 2 zeroes. But this is a good starting point: T-Head C908: vc1dsp.vc1_unescape_buffer_c: 12749.5 vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0 SpacemiT X60: vc1dsp.vc1_unescape_buffer_c: 11038.0 vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0	2024-05-21 21:16:30 +03:00
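For reference, here is a scalar sketch of what "unescaping" means. We assume the usual VC-1 emulation-prevention rule: a 0x03 byte that follows two 0x00 bytes and precedes a byte of 0x00 to 0x03 is an escape byte and is dropped. The name `unescape_buffer` is hypothetical. The zero test looks at the input bytes, so after a removed escape the search effectively restarts at the next byte, as described in the message.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies src to dst, dropping each 0x03 escape byte that follows two
 * zero bytes and precedes a byte < 4; returns the output length. */
static size_t unescape_buffer(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++) {
        if (i >= 2 && src[i] == 0x03 && src[i - 1] == 0 && src[i - 2] == 0 &&
            i + 1 < size && src[i + 1] < 4)
            continue;               /* skip the escape byte */
        dst[n++] = src[i];
    }
    return n;
}
```

For example, `00 00 03 01 05` becomes `00 00 01 05`, while `00 00 03 04` is left unchanged because 0x04 is not below 4.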


50170 commits