Commit graph

49729 commits

Author SHA1 Message Date
Mark Thompson
c3665ee60f av1dec: Add force_integer_mv derived field for decoder use
This is not the same as the syntax element value in the frame header
because the specification parsing tables override the value on intra
frames.

(cherry picked from commit 6f56e0e7e5)
2024-05-06 21:33:03 +01:00
Mark Thompson
9963b9e3c9 av1dec: Fix RefFrameSignBias calculation
(cherry picked from commit ba6b08c75b)
2024-04-24 17:36:01 +02:00
James Almer
506fbe681c avcodec/codec_par: always clear extradata_size in avcodec_parameters_to_context()
Missed in d383ae43c2.

Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit c4e3d6cdb0)
2024-04-24 00:17:16 -03:00
Zhao Zhili
13e93ffbfd avcodec/mediacodecenc: Fix return empty packet when bsf is used
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
(cherry picked from commit a5a3788f56)
2024-04-23 16:10:28 +08:00
Andreas Rheinhardt
2d3ee7c069 avcodec/hevcdec: Fix precedence, bogus film grain warning
Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit bba996d6cd)
2024-04-22 23:43:03 +03:00
Niklas Haas
30002d58fa avcodec/hevcdec: fix segfault on invalid film grain metadata
Invalid input files may contain film grain metadata which survives
ff_h274_film_grain_params_supported() but does not pass
av_film_grain_params_select(), leading to a SIGSEGV on hevc_frame_end().

Fix this by duplicating the av_film_grain_params_select() check at frame
init time.

An alternative solution here would be to defer the incompatibility check
to hevc_frame_end(), but this has the downside of allocating a film
grain buffer even when we already know we can't apply film grain.

Fixes: https://trac.ffmpeg.org/ticket/10951
(cherry picked from commit 459648761f)
2024-04-22 23:43:03 +03:00
Frank Plowman
cbd98447bc lavc/vvc: Skip enhancement layer NAL units
The native VVC decoder does not yet support quality/spatial/multiview
scalability.  Bitstreams requiring this feature could cause crashes.
Patch fixes this by skipping NAL units which are not in the base layer,
warning the user while doing so.

Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit bb9e4ff355)
2024-04-18 22:29:31 -03:00
Lynne
8dfafe5366 vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS
The first release of the CTS for AV1 decoding had incorrect
offsets for the OrderHints values.
The CTS will be fixed, and eventually, the drivers will be
updated to the proper spec-conforming behaviour, but we still
need to add a workaround as this will take months.

Only NVIDIA use these values at all, so limit the workaround
to only NVIDIA. Also, other vendors don't tend to provide accurate
CTS information.

(cherry picked from commit db09f1a5d8)
2024-04-16 18:14:32 +02:00
Mark Thompson
48721a415a lavc/vulkan_av1: Use av1dec reference order hint information
(cherry picked from commit 3cca8dfbd8)
2024-04-16 18:14:32 +02:00
Mark Thompson
0d851a82dd lavc/av1: Record reference ordering information for each frame
This is needed by Vulkan.  Constructing this can't be delegated to CBS
because packets might contain multiple frames (when non-shown frames are
present) but we need separate snapshots immediately before each frame
for the decoder.

(cherry picked from commit 22ced1edc6)
2024-04-16 18:14:32 +02:00
Andreas Rheinhardt
265de29acb avcodec/wavpack: Remove always-false check
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit d307aca184)
2024-04-05 17:42:01 +02:00
Andreas Rheinhardt
607fca80b7 avcodec/wavpack: Fix leak and segfault on reallocation error
av_realloc_f() frees the buffer it is given on allocation
failure. But in this case, the buffer is an array of
ownership pointers, causing leaks on error. Furthermore,
the count of pointers is unchanged on error and the codec's
close function uses it to free said ownership pointers,
causing a NPD.
This is a regression since 46412a8935.

Fix this by switching to av_realloc_array().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit 2f59648aed)
2024-04-05 01:49:15 +02:00
Andreas Rheinhardt
82aa188281 avcodec/lossless_videoencdsp: Don't presume alignment in diff_bytes
The alignment of all the parameters in diff_bytes can be
anything the despite the documentation claiming otherwise.
8ecd383122 was based around
said documentation and is therefore insufficient to fix
e.g. the misaligned loads that happen in the huffyuvbgra
and huffyuvbgr24 vsynth FATE-tests.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit a4800643bb)
2024-04-05 01:49:05 +02:00
Andreas Rheinhardt
0e3a46720a avcodec/ppc/h264dsp: Fix left shifts of negative numbers
PPC equivalent of c756b3fca2.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit e54696bcaa)
2024-04-05 01:48:56 +02:00
Leo Izen
9a4c7b937f avcodec, avformat/ffjni: fix duplicate JNI symbols
Use SHLIBOBJS and STLIBOBJS in the Makefiles for avcodec and avformat,
and add a stub ffjni.c to libavformat, which allows the symbols to be
duplicated for shared builds but not static builds.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
Signed-off-by: Matthieu Bouron <matthieu.bouron@gmail.com>
2024-04-04 21:54:22 +02:00
Michael Niedermayer
1ef084f910
avcodec/wavarc: fix signed integer overflow in block type 6/19
Fixes: signed integer overflow: -2088796289 + -91276551 cannot be represented in type 'int'
Fixes: 67772/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6533568953122816

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 28c7094b25)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-04 21:12:15 +02:00
Michael Niedermayer
87e5bc918a
avcodec/exr: Dont use 64bits to hold 6bits
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit e3984de6ff)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:13 +02:00
Michael Niedermayer
8146cab801
avcodec/exr: Check for remaining bits in huf_unpack_enc_table()
Fixes: Timeout
Fixes: 67645/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-6308760977997824

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 589fa8a027)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:12 +02:00
Michael Niedermayer
5469ba6d74
avcodec/apedec: Use NABS to avoid undefined negation
Fixes: negation of -2147483648 cannot be represented in type 'int32_t' (aka 'int'); cast to an unsigned type to negate this value to itself
Fixes: 67738/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5444313212321792

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 1887ff250c)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:12 +02:00
Michael Niedermayer
e37d66a72e
avcodec/vvc/vvcdec: Do not submit frames without VVCFrameThread
Such frames will crash when pthread functions are called on the NULL pointer

Fixes: member access within null pointer of type 'VVCFrameThread' (aka 'struct VVCFrameThread')
Fixes: 65160/clusterfuzz-testcase-minimized-ffmpeg_BSF_VVC_METADATA_fuzzer-4665241535119360 (partly)
Fixes: 65636/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-5394745824182272

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 84ce5ced31)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:11 +02:00
Michael Niedermayer
7e899776ec
avcodec/jpeg2000htdec: warn about non zero roi shift
Suggested-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 7b7eea8e63)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:09 +02:00
Michael Niedermayer
cc9d291fb0
avcodec/jpeg2000htdec: Check magp before using it in a shift
Fixes: shift exponent -1 is negative
Fixes: 65378/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5457678193197056

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 19ad05e9e0)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-03 14:42:09 +02:00
Haihao Xiang
74e4e900bb lavc/vaapi_encode: convert from lambda to qp
When AV_CODEC_FLAG_QSCALE is set, the value of avctx->global_quality is
lambda.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
(cherry picked from commit 1590a96adc)
2024-04-03 10:35:26 +08:00
Fei Wang
2d18c4906f lavc/vaapi_encode: Add VAAPI version check for BLBRC
Fix build fail when VAAPI version less than 0.39.2.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
(cherry picked from commit 09377887df)
2024-04-03 10:32:59 +08:00
James Almer
112fdae9f9 avcodec/vvc_refs: don't ask for a "Inter layer ref" sample
The FATE suite has two already.

Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit 45b56455ad)
2024-04-02 11:56:14 -03:00
Andreas Rheinhardt
dcbc1fdb3b avcodec/vlc, bitstream: Fix multi VLC with uint8_t syms on BE
VLC_MULTI_ELEM contains an uint8_t array that is supposed
to be treated as an array of uint16_t when the used symbols
have a size of two; otherwise it should be treated as just
an array of uint8_t, but it was not always treated that way:

vlc_multi_gen() initialized the first entry of the array
by writing the symbol via AV_WN16; on big endian systems,
the intended value was instead written into the second entry
of the array (where it would likely be overwritten lateron
during initialization).

read_vlc_multi() also treated this case incorrectly: In case
the code is so long that it needs a classical multi-stage lookup,
the symbol has been written to the destination as if via AV_WN16.
On little endian systems, this sets the correct first symbol and
clobbers (zeroes) the next one, but the next one will be overwritten
lateron anyway, so it won't be recognized. But on big-endian systems,
the first symbol will be set to zero and the actually read symbol
will be put into the slot for the next one (where it will be overwritten
lateron).

This commit fixes this; this fixes the magicyuv and utvideo FATE-tests
on big endian arches.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(cherry picked from commit 4ab82d2fb6)
2024-04-02 14:32:00 +02:00
Timo Rothenpieler
4c5a809388 avcodec/nvenc: support SDK 12.2 bit depth API 2024-04-01 01:00:47 +02:00
Timo Rothenpieler
5ff5a431c7 avcodec/nvenc: stop using long deprecated format specifiers 2024-04-01 01:00:41 +02:00
Timo Rothenpieler
515949a15a avcodec/nvdec: reset bitstream_len/nb_slices when resetting bitstream pointer 2024-03-30 00:16:21 +01:00
Tong Wu
7fa569e34d avcodec/hevc_ps: fix the problem of memcmp losing effectiveness
HEVCHdrParams* receives a pointer which points to a dynamically
allocated memory block. It causes the memcmp always returning 1.
Add a function to do the comparision. A condition is also added to
avoid malloc(0).

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit 6bf17136a2)
2024-03-29 14:52:48 -03:00
Zhao Zhili
304208d40c avcodec/h264_mp4toannexb: Fix heap buffer overflow
Fixes: out of array write
Fixes: 64407/clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_MP4TOANNEXB_fuzzer-4966763443650560

mp4toannexb_filter counts the number of bytes needed in the first
pass and allocate the memory, then do memcpy in the second pass.
Update sps/pps size in the loop makes the count invalid in the
case of SPS/PPS occur after IDR slice. This patch process in-band
SPS/PPS before the two pass loops.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
(cherry picked from commit 89e9486bc3)
2024-03-27 20:11:57 +08:00
Michael Niedermayer
872980ace6
Bump prior release/7.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:53 +01:00
Michael Niedermayer
1eb8cbd09c
avcodec/wavarc: avoid signed integer overflow in AC code
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697
Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int'

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
6009dd07bd
avcodec/wavarc: Avoid signed integer overflow in sample
Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
ebdcf98499
avcodec/truemotion1: Height not being a multiple of 4 is unsupported
mb_change_bits is given space based on height >> 2, while more data is read

Fixes: out of array access
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
d188a86730
avcodec/rtv1: fix undefined FFALIGN
Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
7eabe56436
avcodec/qoadec: Fix undefined overflow in lms_predict
Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
48eeb198a5
avcodec/hcadec: do not allow code to continue after failed init
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
addb85ea39
avcodec/hcadec: do not set hfr_group_count to invalid values
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Dai, Jianhui J
61afe4d98c avcodec/cbs_vp8: Improve the bitstream position check
The VP8 compressed header may not be byte-aligned due to boolean
coding. Round up byte count for accurate data positioning.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:05:04 -04:00
Dai, Jianhui J
63dea3c1e1 avcodec/cbs_vp8: Use little endian in fixed()
This commit adds value range checks to cbs_vp8_read_unsigned_le,
migrates fixed() to use it, and enforces little-endian consistency for
all read methods.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:04:44 -04:00
Martin Storsjö
f872b19714 aarch64: hevc: Produce plain neon versions of qpel_bi_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_bi_hv4_8_c: 385.7
put_hevc_qpel_bi_hv4_8_neon: 131.0
put_hevc_qpel_bi_hv4_8_i8mm: 92.2
put_hevc_qpel_bi_hv6_8_c: 701.0
put_hevc_qpel_bi_hv6_8_neon: 239.5
put_hevc_qpel_bi_hv6_8_i8mm: 191.0
put_hevc_qpel_bi_hv8_8_c: 1162.0
put_hevc_qpel_bi_hv8_8_neon: 228.0
put_hevc_qpel_bi_hv8_8_i8mm: 225.2
put_hevc_qpel_bi_hv12_8_c: 2305.0
put_hevc_qpel_bi_hv12_8_neon: 558.0
put_hevc_qpel_bi_hv12_8_i8mm: 483.2
put_hevc_qpel_bi_hv16_8_c: 3965.2
put_hevc_qpel_bi_hv16_8_neon: 732.7
put_hevc_qpel_bi_hv16_8_i8mm: 656.5
put_hevc_qpel_bi_hv24_8_c: 8709.7
put_hevc_qpel_bi_hv24_8_neon: 1555.2
put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
put_hevc_qpel_bi_hv32_8_c: 14818.0
put_hevc_qpel_bi_hv32_8_neon: 2763.7
put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
put_hevc_qpel_bi_hv48_8_c: 32855.5
put_hevc_qpel_bi_hv48_8_neon: 6107.2
put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
put_hevc_qpel_bi_hv64_8_c: 57591.5
put_hevc_qpel_bi_hv64_8_neon: 10660.2
put_hevc_qpel_bi_hv64_8_i8mm: 9580.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
d21b9a0411 aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_uni_w_hv8_8_neon: 268.2
put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5
put_hevc_qpel_uni_w_hv16_8_c: 4297.2
put_hevc_qpel_uni_w_hv16_8_neon: 802.2
put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2
put_hevc_qpel_uni_w_hv32_8_c: 15518.5
put_hevc_qpel_uni_w_hv32_8_neon: 3085.2
put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2
put_hevc_qpel_uni_w_hv64_8_c: 57254.5
put_hevc_qpel_uni_w_hv64_8_neon: 11787.5
put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5ab138673b aarch64: hevc: Produce plain neon versions of qpel_uni_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_uni_hv4_8_c: 384.2
put_hevc_qpel_uni_hv4_8_neon: 127.5
put_hevc_qpel_uni_hv4_8_i8mm: 85.5
put_hevc_qpel_uni_hv6_8_c: 705.5
put_hevc_qpel_uni_hv6_8_neon: 224.5
put_hevc_qpel_uni_hv6_8_i8mm: 176.2
put_hevc_qpel_uni_hv8_8_c: 1136.5
put_hevc_qpel_uni_hv8_8_neon: 216.5
put_hevc_qpel_uni_hv8_8_i8mm: 214.0
put_hevc_qpel_uni_hv12_8_c: 2259.5
put_hevc_qpel_uni_hv12_8_neon: 498.5
put_hevc_qpel_uni_hv12_8_i8mm: 410.7
put_hevc_qpel_uni_hv16_8_c: 3824.7
put_hevc_qpel_uni_hv16_8_neon: 670.0
put_hevc_qpel_uni_hv16_8_i8mm: 603.7
put_hevc_qpel_uni_hv24_8_c: 8113.5
put_hevc_qpel_uni_hv24_8_neon: 1474.7
put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
put_hevc_qpel_uni_hv32_8_c: 14744.5
put_hevc_qpel_uni_hv32_8_neon: 2599.7
put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
put_hevc_qpel_uni_hv48_8_c: 32800.0
put_hevc_qpel_uni_hv48_8_neon: 5650.0
put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
put_hevc_qpel_uni_hv64_8_c: 57856.2
put_hevc_qpel_uni_hv64_8_neon: 9863.5
put_hevc_qpel_uni_hv64_8_i8mm: 8767.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5cbeefc79e aarch64: hevc: Produce plain neon versions of qpel_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
20c38f4b8d aarch64: hevc: Reorder qpel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in
the individual functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:50 +02:00
Martin Storsjö
4f71e4ebf2 aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
The hv32 and hv64 functions were identical - both loop and
process 16 pixels at a time.

The hv16 function was near identical, except for the outer loop
(and using sp instead of a separate register).

Given the size of these functions, the extra cost of the outer
loop is negligible, so use the same function for hv16 as well.

This removes over 200 lines of duplicated assembly, and over 4 KB
of binary size.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:40 +02:00
Martin Storsjö
4063e50eec aarch64: hevc: Split the qpel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:29 +02:00
Martin Storsjö
ad01d06f91 aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
AWS Graviton 3:
put_hevc_qpel_uni_w_h4_8_c: 159.0
put_hevc_qpel_uni_w_h4_8_neon: 64.2
put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
put_hevc_qpel_uni_w_h6_8_c: 344.7
put_hevc_qpel_uni_w_h6_8_neon: 114.5
put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
put_hevc_qpel_uni_w_h8_8_c: 596.2
put_hevc_qpel_uni_w_h8_8_neon: 132.2
put_hevc_qpel_uni_w_h8_8_i8mm: 106.0
put_hevc_qpel_uni_w_h12_8_c: 1325.0
put_hevc_qpel_uni_w_h12_8_neon: 299.0
put_hevc_qpel_uni_w_h12_8_i8mm: 211.5
put_hevc_qpel_uni_w_h16_8_c: 2300.0
put_hevc_qpel_uni_w_h16_8_neon: 422.0
put_hevc_qpel_uni_w_h16_8_i8mm: 286.2
put_hevc_qpel_uni_w_h24_8_c: 5059.0
put_hevc_qpel_uni_w_h24_8_neon: 912.2
put_hevc_qpel_uni_w_h24_8_i8mm: 664.2
put_hevc_qpel_uni_w_h32_8_c: 9198.2
put_hevc_qpel_uni_w_h32_8_neon: 1638.2
put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7
put_hevc_qpel_uni_w_h48_8_c: 20754.7
put_hevc_qpel_uni_w_h48_8_neon: 3633.7
put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7
put_hevc_qpel_uni_w_h64_8_c: 36854.7
put_hevc_qpel_uni_w_h64_8_neon: 6435.7
put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:18 +02:00
Martin Storsjö
de23b384fd aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
In addition to just templating, this contains one change to
ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
which ff_hevc_put_hevc_epel_h32_8_neon requires.

AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel_bi_hv6_8_c: 343.7
put_hevc_epel_bi_hv6_8_neon: 109.7
put_hevc_epel_bi_hv6_8_i8mm: 105.7
put_hevc_epel_bi_hv8_8_c: 536.0
put_hevc_epel_bi_hv8_8_neon: 112.7
put_hevc_epel_bi_hv8_8_i8mm: 111.7
put_hevc_epel_bi_hv12_8_c: 1107.7
put_hevc_epel_bi_hv12_8_neon: 254.7
put_hevc_epel_bi_hv12_8_i8mm: 239.0
put_hevc_epel_bi_hv16_8_c: 1927.7
put_hevc_epel_bi_hv16_8_neon: 356.2
put_hevc_epel_bi_hv16_8_i8mm: 334.2
put_hevc_epel_bi_hv24_8_c: 4195.2
put_hevc_epel_bi_hv24_8_neon: 736.7
put_hevc_epel_bi_hv24_8_i8mm: 715.5
put_hevc_epel_bi_hv32_8_c: 7280.5
put_hevc_epel_bi_hv32_8_neon: 1287.7
put_hevc_epel_bi_hv32_8_i8mm: 1162.2
put_hevc_epel_bi_hv48_8_c: 16857.7
put_hevc_epel_bi_hv48_8_neon: 2836.2
put_hevc_epel_bi_hv48_8_i8mm: 2908.5
put_hevc_epel_bi_hv64_8_c: 29248.2
put_hevc_epel_bi_hv64_8_neon: 5051.7
put_hevc_epel_bi_hv64_8_i8mm: 4491.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:16 +02:00