ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-14 11:30:39 +00:00

Author	SHA1	Message	Date
Mark Thompson	c3665ee60f	av1dec: Add force_integer_mv derived field for decoder use This is not the same as the syntax element value in the frame header because the specification parsing tables override the value on intra frames. (cherry picked from commit `6f56e0e7e5`)	2024-05-06 21:33:03 +01:00
Mark Thompson	9963b9e3c9	av1dec: Fix RefFrameSignBias calculation (cherry picked from commit `ba6b08c75b`)	2024-04-24 17:36:01 +02:00
James Almer	506fbe681c	avcodec/codec_par: always clear extradata_size in avcodec_parameters_to_context() Missed in `d383ae43c2`. Signed-off-by: James Almer <jamrial@gmail.com> (cherry picked from commit `c4e3d6cdb0`)	2024-04-24 00:17:16 -03:00
Zhao Zhili	13e93ffbfd	avcodec/mediacodecenc: Fix return empty packet when bsf is used Signed-off-by: Zhao Zhili <zhilizhao@tencent.com> (cherry picked from commit `a5a3788f56`)	2024-04-23 16:10:28 +08:00
Andreas Rheinhardt	2d3ee7c069	avcodec/hevcdec: Fix precedence, bogus film grain warning Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `bba996d6cd`)	2024-04-22 23:43:03 +03:00
Niklas Haas	30002d58fa	avcodec/hevcdec: fix segfault on invalid film grain metadata Invalid input files may contain film grain metadata which survives ff_h274_film_grain_params_supported() but does not pass av_film_grain_params_select(), leading to a SIGSEGV on hevc_frame_end(). Fix this by duplicating the av_film_grain_params_select() check at frame init time. An alternative solution here would be to defer the incompatibility check to hevc_frame_end(), but this has the downside of allocating a film grain buffer even when we already know we can't apply film grain. Fixes: https://trac.ffmpeg.org/ticket/10951 (cherry picked from commit `459648761f`)	2024-04-22 23:43:03 +03:00
Frank Plowman	cbd98447bc	lavc/vvc: Skip enhancement layer NAL units The native VVC decoder does not yet support quality/spatial/multiview scalability. Bitstreams requiring this feature could cause crashes. Patch fixes this by skipping NAL units which are not in the base layer, warning the user while doing so. Signed-off-by: Frank Plowman <post@frankplowman.com> Signed-off-by: James Almer <jamrial@gmail.com> (cherry picked from commit `bb9e4ff355`)	2024-04-18 22:29:31 -03:00
Lynne	8dfafe5366	vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS The first release of the CTS for AV1 decoding had incorrect offsets for the OrderHints values. The CTS will be fixed, and eventually, the drivers will be updated to the proper spec-conforming behaviour, but we still need to add a workaround as this will take months. Only NVIDIA use these values at all, so limit the workaround to only NVIDIA. Also, other vendors don't tend to provide accurate CTS information. (cherry picked from commit `db09f1a5d8`)	2024-04-16 18:14:32 +02:00
Mark Thompson	48721a415a	lavc/vulkan_av1: Use av1dec reference order hint information (cherry picked from commit `3cca8dfbd8`)	2024-04-16 18:14:32 +02:00
Mark Thompson	0d851a82dd	lavc/av1: Record reference ordering information for each frame This is needed by Vulkan. Constructing this can't be delegated to CBS because packets might contain multiple frames (when non-shown frames are present) but we need separate snapshots immediately before each frame for the decoder. (cherry picked from commit `22ced1edc6`)	2024-04-16 18:14:32 +02:00
Andreas Rheinhardt	265de29acb	avcodec/wavpack: Remove always-false check Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `d307aca184`)	2024-04-05 17:42:01 +02:00
Andreas Rheinhardt	607fca80b7	avcodec/wavpack: Fix leak and segfault on reallocation error av_realloc_f() frees the buffer it is given on allocation failure. But in this case, the buffer is an array of ownership pointers, causing leaks on error. Furthermore, the count of pointers is unchanged on error and the codec's close function uses it to free said ownership pointers, causing a NPD. This is a regression since `46412a8935`. Fix this by switching to av_realloc_array(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `2f59648aed`)	2024-04-05 01:49:15 +02:00
Andreas Rheinhardt	82aa188281	avcodec/lossless_videoencdsp: Don't presume alignment in diff_bytes The alignment of all the parameters in diff_bytes can be anything, despite the documentation claiming otherwise. `8ecd383122` was based around said documentation and is therefore insufficient to fix e.g. the misaligned loads that happen in the huffyuvbgra and huffyuvbgr24 vsynth FATE-tests. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `a4800643bb`)	2024-04-05 01:49:05 +02:00
Andreas Rheinhardt	0e3a46720a	avcodec/ppc/h264dsp: Fix left shifts of negative numbers PPC equivalent of `c756b3fca2`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `e54696bcaa`)	2024-04-05 01:48:56 +02:00
Leo Izen	9a4c7b937f	avcodec, avformat/ffjni: fix duplicate JNI symbols Use SHLIBOBJS and STLIBOBJS in the Makefiles for avcodec and avformat, and add a stub ffjni.c to libavformat, which allows the symbols to be duplicated for shared builds but not static builds. Signed-off-by: Leo Izen <leo.izen@gmail.com> Signed-off-by: Matthieu Bouron <matthieu.bouron@gmail.com>	2024-04-04 21:54:22 +02:00
Michael Niedermayer	1ef084f910	avcodec/wavarc: fix signed integer overflow in block type 6/19 Fixes: signed integer overflow: -2088796289 + -91276551 cannot be represented in type 'int' Fixes: 67772/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6533568953122816 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `28c7094b25`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-04 21:12:15 +02:00
Michael Niedermayer	87e5bc918a	avcodec/exr: Dont use 64bits to hold 6bits Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `e3984de6ff`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:13 +02:00
Michael Niedermayer	8146cab801	avcodec/exr: Check for remaining bits in huf_unpack_enc_table() Fixes: Timeout Fixes: 67645/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-6308760977997824 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `589fa8a027`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:12 +02:00
Michael Niedermayer	5469ba6d74	avcodec/apedec: Use NABS to avoid undefined negation Fixes: negation of -2147483648 cannot be represented in type 'int32_t' (aka 'int'); cast to an unsigned type to negate this value to itself Fixes: 67738/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5444313212321792 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `1887ff250c`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:12 +02:00
Michael Niedermayer	e37d66a72e	avcodec/vvc/vvcdec: Do not submit frames without VVCFrameThread Such frames will crash when pthread functions are called on the NULL pointer Fixes: member access within null pointer of type 'VVCFrameThread' (aka 'struct VVCFrameThread') Fixes: 65160/clusterfuzz-testcase-minimized-ffmpeg_BSF_VVC_METADATA_fuzzer-4665241535119360 (partly) Fixes: 65636/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-5394745824182272 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `84ce5ced31`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:11 +02:00
Michael Niedermayer	7e899776ec	avcodec/jpeg2000htdec: warn about non zero roi shift Suggested-by: Tomas Härdin <git@haerdin.se> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `7b7eea8e63`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:09 +02:00
Michael Niedermayer	cc9d291fb0	avcodec/jpeg2000htdec: Check magp before using it in a shift Fixes: shift exponent -1 is negative Fixes: 65378/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5457678193197056 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> (cherry picked from commit `19ad05e9e0`) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-03 14:42:09 +02:00
Haihao Xiang	74e4e900bb	lavc/vaapi_encode: convert from lambda to qp When AV_CODEC_FLAG_QSCALE is set, the value of avctx->global_quality is lambda. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com> (cherry picked from commit `1590a96adc`)	2024-04-03 10:35:26 +08:00
Fei Wang	2d18c4906f	lavc/vaapi_encode: Add VAAPI version check for BLBRC Fix build failure when the VAAPI version is less than 0.39.2. Signed-off-by: Fei Wang <fei.w.wang@intel.com> (cherry picked from commit `09377887df`)	2024-04-03 10:32:59 +08:00
James Almer	112fdae9f9	avcodec/vvc_refs: don't ask for a "Inter layer ref" sample The FATE suite has two already. Signed-off-by: James Almer <jamrial@gmail.com> (cherry picked from commit `45b56455ad`)	2024-04-02 11:56:14 -03:00
Andreas Rheinhardt	dcbc1fdb3b	avcodec/vlc, bitstream: Fix multi VLC with uint8_t syms on BE VLC_MULTI_ELEM contains a uint8_t array that is supposed to be treated as an array of uint16_t when the used symbols have a size of two; otherwise it should be treated as just an array of uint8_t, but it was not always treated that way: vlc_multi_gen() initialized the first entry of the array by writing the symbol via AV_WN16; on big endian systems, the intended value was instead written into the second entry of the array (where it would likely be overwritten later on during initialization). read_vlc_multi() also treated this case incorrectly: In case the code is so long that it needs a classical multi-stage lookup, the symbol has been written to the destination as if via AV_WN16. On little endian systems, this sets the correct first symbol and clobbers (zeroes) the next one, but the next one will be overwritten later on anyway, so it won't be recognized. But on big-endian systems, the first symbol will be set to zero and the actually read symbol will be put into the slot for the next one (where it will be overwritten later on). This commit fixes this; this fixes the magicyuv and utvideo FATE-tests on big endian arches. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> (cherry picked from commit `4ab82d2fb6`)	2024-04-02 14:32:00 +02:00
Timo Rothenpieler	4c5a809388	avcodec/nvenc: support SDK 12.2 bit depth API	2024-04-01 01:00:47 +02:00
Timo Rothenpieler	5ff5a431c7	avcodec/nvenc: stop using long deprecated format specifiers	2024-04-01 01:00:41 +02:00
Timo Rothenpieler	515949a15a	avcodec/nvdec: reset bitstream_len/nb_slices when resetting bitstream pointer	2024-03-30 00:16:21 +01:00
Tong Wu	7fa569e34d	avcodec/hevc_ps: fix the problem of memcmp losing effectiveness HEVCHdrParams* receives a pointer which points to a dynamically allocated memory block. This causes the memcmp to always return 1. Add a function to do the comparison. A condition is also added to avoid malloc(0). Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Tong Wu <tong1.wu@intel.com> Signed-off-by: James Almer <jamrial@gmail.com> (cherry picked from commit `6bf17136a2`)	2024-03-29 14:52:48 -03:00
Zhao Zhili	304208d40c	avcodec/h264_mp4toannexb: Fix heap buffer overflow Fixes: out of array write Fixes: 64407/clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_MP4TOANNEXB_fuzzer-4966763443650560 mp4toannexb_filter counts the number of bytes needed in the first pass and allocates the memory, then does the memcpy in the second pass. Updating the sps/pps size in the loop makes the count invalid when SPS/PPS occur after an IDR slice. This patch processes in-band SPS/PPS before the two-pass loops. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com> (cherry picked from commit `89e9486bc3`)	2024-03-27 20:11:57 +08:00
Michael Niedermayer	872980ace6	Bump prior release/7.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:53 +01:00
Michael Niedermayer	1eb8cbd09c	avcodec/wavarc: avoid signed integer overflow in AC code Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697 Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	6009dd07bd	avcodec/wavarc: Avoid signed integer overflow in sample Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	ebdcf98499	avcodec/truemotion1: Height not being a multiple of 4 is unsupported mb_change_bits is given space based on height >> 2, while more data is read Fixes: out of array access Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	d188a86730	avcodec/rtv1: fix undefined FFALIGN Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	7eabe56436	avcodec/qoadec: Fix undefined overflow in lms_predict Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	48eeb198a5	avcodec/hcadec: do not allow code to continue after failed init Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	addb85ea39	avcodec/hcadec: do not set hfr_group_count to invalid values Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Dai, Jianhui J	61afe4d98c	avcodec/cbs_vp8: Improve the bitstream position check The VP8 compressed header may not be byte-aligned due to boolean coding. Round up byte count for accurate data positioning. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:05:04 -04:00
Dai, Jianhui J	63dea3c1e1	avcodec/cbs_vp8: Use little endian in fixed() This commit adds value range checks to cbs_vp8_read_unsigned_le, migrates fixed() to use it, and enforces little-endian consistency for all read methods. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:04:44 -04:00
Martin Storsjö	f872b19714	aarch64: hevc: Produce plain neon versions of qpel_bi_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_bi_hv4_8_c: 385.7 put_hevc_qpel_bi_hv4_8_neon: 131.0 put_hevc_qpel_bi_hv4_8_i8mm: 92.2 put_hevc_qpel_bi_hv6_8_c: 701.0 put_hevc_qpel_bi_hv6_8_neon: 239.5 put_hevc_qpel_bi_hv6_8_i8mm: 191.0 put_hevc_qpel_bi_hv8_8_c: 1162.0 put_hevc_qpel_bi_hv8_8_neon: 228.0 put_hevc_qpel_bi_hv8_8_i8mm: 225.2 put_hevc_qpel_bi_hv12_8_c: 2305.0 put_hevc_qpel_bi_hv12_8_neon: 558.0 put_hevc_qpel_bi_hv12_8_i8mm: 483.2 put_hevc_qpel_bi_hv16_8_c: 3965.2 put_hevc_qpel_bi_hv16_8_neon: 732.7 put_hevc_qpel_bi_hv16_8_i8mm: 656.5 put_hevc_qpel_bi_hv24_8_c: 8709.7 put_hevc_qpel_bi_hv24_8_neon: 1555.2 put_hevc_qpel_bi_hv24_8_i8mm: 1448.7 put_hevc_qpel_bi_hv32_8_c: 14818.0 put_hevc_qpel_bi_hv32_8_neon: 2763.7 put_hevc_qpel_bi_hv32_8_i8mm: 2468.0 put_hevc_qpel_bi_hv48_8_c: 32855.5 put_hevc_qpel_bi_hv48_8_neon: 6107.2 put_hevc_qpel_bi_hv48_8_i8mm: 5452.7 put_hevc_qpel_bi_hv64_8_c: 57591.5 put_hevc_qpel_bi_hv64_8_neon: 10660.2 put_hevc_qpel_bi_hv64_8_i8mm: 9580.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	d21b9a0411	aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. AWS Graviton 3: put_hevc_qpel_uni_w_hv4_8_c: 422.2 put_hevc_qpel_uni_w_hv4_8_neon: 140.7 put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7 put_hevc_qpel_uni_w_hv8_8_c: 1208.0 put_hevc_qpel_uni_w_hv8_8_neon: 268.2 put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5 put_hevc_qpel_uni_w_hv16_8_c: 4297.2 put_hevc_qpel_uni_w_hv16_8_neon: 802.2 put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2 put_hevc_qpel_uni_w_hv32_8_c: 15518.5 put_hevc_qpel_uni_w_hv32_8_neon: 3085.2 put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2 put_hevc_qpel_uni_w_hv64_8_c: 57254.5 put_hevc_qpel_uni_w_hv64_8_neon: 11787.5 put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5ab138673b	aarch64: hevc: Produce plain neon versions of qpel_uni_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_uni_hv4_8_c: 384.2 put_hevc_qpel_uni_hv4_8_neon: 127.5 put_hevc_qpel_uni_hv4_8_i8mm: 85.5 put_hevc_qpel_uni_hv6_8_c: 705.5 put_hevc_qpel_uni_hv6_8_neon: 224.5 put_hevc_qpel_uni_hv6_8_i8mm: 176.2 put_hevc_qpel_uni_hv8_8_c: 1136.5 put_hevc_qpel_uni_hv8_8_neon: 216.5 put_hevc_qpel_uni_hv8_8_i8mm: 214.0 put_hevc_qpel_uni_hv12_8_c: 2259.5 put_hevc_qpel_uni_hv12_8_neon: 498.5 put_hevc_qpel_uni_hv12_8_i8mm: 410.7 put_hevc_qpel_uni_hv16_8_c: 3824.7 put_hevc_qpel_uni_hv16_8_neon: 670.0 put_hevc_qpel_uni_hv16_8_i8mm: 603.7 put_hevc_qpel_uni_hv24_8_c: 8113.5 put_hevc_qpel_uni_hv24_8_neon: 1474.7 put_hevc_qpel_uni_hv24_8_i8mm: 1351.5 put_hevc_qpel_uni_hv32_8_c: 14744.5 put_hevc_qpel_uni_hv32_8_neon: 2599.7 put_hevc_qpel_uni_hv32_8_i8mm: 2266.0 put_hevc_qpel_uni_hv48_8_c: 32800.0 put_hevc_qpel_uni_hv48_8_neon: 5650.0 put_hevc_qpel_uni_hv48_8_i8mm: 5011.7 put_hevc_qpel_uni_hv64_8_c: 57856.2 put_hevc_qpel_uni_hv64_8_neon: 9863.5 put_hevc_qpel_uni_hv64_8_i8mm: 8767.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5cbeefc79e	aarch64: hevc: Produce plain neon versions of qpel_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_hv4_8_c: 386.0 put_hevc_qpel_hv4_8_neon: 125.7 put_hevc_qpel_hv4_8_i8mm: 83.2 put_hevc_qpel_hv6_8_c: 749.0 put_hevc_qpel_hv6_8_neon: 207.0 put_hevc_qpel_hv6_8_i8mm: 166.0 put_hevc_qpel_hv8_8_c: 1305.2 put_hevc_qpel_hv8_8_neon: 216.5 put_hevc_qpel_hv8_8_i8mm: 213.0 put_hevc_qpel_hv12_8_c: 2570.5 put_hevc_qpel_hv12_8_neon: 480.0 put_hevc_qpel_hv12_8_i8mm: 398.2 put_hevc_qpel_hv16_8_c: 4158.7 put_hevc_qpel_hv16_8_neon: 659.7 put_hevc_qpel_hv16_8_i8mm: 593.5 put_hevc_qpel_hv24_8_c: 8626.7 put_hevc_qpel_hv24_8_neon: 1653.5 put_hevc_qpel_hv24_8_i8mm: 1398.7 put_hevc_qpel_hv32_8_c: 14646.0 put_hevc_qpel_hv32_8_neon: 2566.2 put_hevc_qpel_hv32_8_i8mm: 2287.5 put_hevc_qpel_hv48_8_c: 31072.5 put_hevc_qpel_hv48_8_neon: 6228.5 put_hevc_qpel_hv48_8_i8mm: 5291.0 put_hevc_qpel_hv64_8_c: 53847.2 put_hevc_qpel_hv64_8_neon: 9856.7 put_hevc_qpel_hv64_8_i8mm: 8831.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	20c38f4b8d	aarch64: hevc: Reorder qpel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:50 +02:00
Martin Storsjö	4f71e4ebf2	aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same function for hv16 as well. This removes over 200 lines of duplicated assembly, and over 4 KB of binary size. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:40 +02:00
Martin Storsjö	4063e50eec	aarch64: hevc: Split the qpel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:29 +02:00
Martin Storsjö	ad01d06f91	aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8 AWS Graviton 3: put_hevc_qpel_uni_w_h4_8_c: 159.0 put_hevc_qpel_uni_w_h4_8_neon: 64.2 put_hevc_qpel_uni_w_h4_8_i8mm: 40.0 put_hevc_qpel_uni_w_h6_8_c: 344.7 put_hevc_qpel_uni_w_h6_8_neon: 114.5 put_hevc_qpel_uni_w_h6_8_i8mm: 82.0 put_hevc_qpel_uni_w_h8_8_c: 596.2 put_hevc_qpel_uni_w_h8_8_neon: 132.2 put_hevc_qpel_uni_w_h8_8_i8mm: 106.0 put_hevc_qpel_uni_w_h12_8_c: 1325.0 put_hevc_qpel_uni_w_h12_8_neon: 299.0 put_hevc_qpel_uni_w_h12_8_i8mm: 211.5 put_hevc_qpel_uni_w_h16_8_c: 2300.0 put_hevc_qpel_uni_w_h16_8_neon: 422.0 put_hevc_qpel_uni_w_h16_8_i8mm: 286.2 put_hevc_qpel_uni_w_h24_8_c: 5059.0 put_hevc_qpel_uni_w_h24_8_neon: 912.2 put_hevc_qpel_uni_w_h24_8_i8mm: 664.2 put_hevc_qpel_uni_w_h32_8_c: 9198.2 put_hevc_qpel_uni_w_h32_8_neon: 1638.2 put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7 put_hevc_qpel_uni_w_h48_8_c: 20754.7 put_hevc_qpel_uni_w_h48_8_neon: 3633.7 put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7 put_hevc_qpel_uni_w_h64_8_c: 36854.7 put_hevc_qpel_uni_w_h64_8_neon: 6435.7 put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:18 +02:00
Martin Storsjö	de23b384fd	aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm In addition to just templating, this contains one change to ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register which ff_hevc_put_hevc_epel_h32_8_neon requires. AWS Graviton 3: put_hevc_epel_bi_hv4_8_c: 176.5 put_hevc_epel_bi_hv4_8_neon: 62.0 put_hevc_epel_bi_hv4_8_i8mm: 58.0 put_hevc_epel_bi_hv6_8_c: 343.7 put_hevc_epel_bi_hv6_8_neon: 109.7 put_hevc_epel_bi_hv6_8_i8mm: 105.7 put_hevc_epel_bi_hv8_8_c: 536.0 put_hevc_epel_bi_hv8_8_neon: 112.7 put_hevc_epel_bi_hv8_8_i8mm: 111.7 put_hevc_epel_bi_hv12_8_c: 1107.7 put_hevc_epel_bi_hv12_8_neon: 254.7 put_hevc_epel_bi_hv12_8_i8mm: 239.0 put_hevc_epel_bi_hv16_8_c: 1927.7 put_hevc_epel_bi_hv16_8_neon: 356.2 put_hevc_epel_bi_hv16_8_i8mm: 334.2 put_hevc_epel_bi_hv24_8_c: 4195.2 put_hevc_epel_bi_hv24_8_neon: 736.7 put_hevc_epel_bi_hv24_8_i8mm: 715.5 put_hevc_epel_bi_hv32_8_c: 7280.5 put_hevc_epel_bi_hv32_8_neon: 1287.7 put_hevc_epel_bi_hv32_8_i8mm: 1162.2 put_hevc_epel_bi_hv48_8_c: 16857.7 put_hevc_epel_bi_hv48_8_neon: 2836.2 put_hevc_epel_bi_hv48_8_i8mm: 2908.5 put_hevc_epel_bi_hv64_8_c: 29248.2 put_hevc_epel_bi_hv64_8_neon: 5051.7 put_hevc_epel_bi_hv64_8_i8mm: 4491.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:16 +02:00
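
The big-endian failure described in the dcbc1fdb3b entry above can be reproduced in miniature. The following Python sketch is an illustration only and is not FFmpeg code: `write_native16` is a hypothetical stand-in for a native-endian 16-bit store of the kind the commit message attributes to AV_WN16, and the two-byte `bytearray` stands in for two adjacent uint8_t symbol slots.

```python
import struct


def write_native16(buf, offset, value, big_endian):
    """Hypothetical helper: store `value` as 16 bits in the chosen byte order.

    This only models what a native-endian 16-bit write does on each kind
    of machine; it is not the FFmpeg macro itself.
    """
    fmt = ">H" if big_endian else "<H"
    struct.pack_into(fmt, buf, offset, value)


def slots_after_store(symbol, big_endian):
    """Return the two 8-bit symbol slots after one 16-bit store at slot 0."""
    slots = bytearray(2)  # two uint8_t slots, both initially zero
    write_native16(slots, 0, symbol, big_endian)
    return list(slots)


little = slots_after_store(0x2A, big_endian=False)
big = slots_after_store(0x2A, big_endian=True)

print("little endian slots:", little)
print("big endian slots:   ", big)
```

With a little-endian layout the symbol lands in slot 0 and slot 1 is zeroed; with a big-endian layout slot 0 is zeroed and the symbol lands in slot 1, which matches the behaviour the commit message describes for 8-bit symbols.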

49729 commits