Commit graph

49678 commits

Author SHA1 Message Date
Andreas Rheinhardt
e4e6377afc avcodec/arm/mpegvideo_arm: Use static_assert to check offsets
Also move AV_CHECK_OFFSET to its only user, namely
lavc/arm/mpegvideo_arm.c and rename it to CHECK_OFFSET.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Andreas Rheinhardt
b616be1649 lib*/version: Use static_assert for static asserts
Also update the checks that guard against inserting
a new enum entry in the middle of a range.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
c8549d480f avcodec/msmpeg4: Don't include x86-specific header unconditionally
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
a265e8ca92 avcodec, avfilter: Don't use "" for system headers
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
347a70f101 avcodec/pcm-bluray/dvd: Use correct pointer types on BE
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
cd63dab55c avcodec/mips/ac3dsp_mips: Add missing includes
Likely broken in d7a75d2163.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
dc7a60529c avcodec/ratecontrol: Use forward declaration for AVExpr
Avoids including eval.h everywhere where mpegvideo.h
is included.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-30 05:06:28 +01:00
Andreas Rheinhardt
348461e550 avcodec/h264_refs: Use smaller scope, don't use av_uninit
In particular, declare iterators with loop scope.
Also remove av_uninit while at it, because they
are now unnecessary due to the changes of the preceding
commit.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-30 05:06:28 +01:00
Andreas Rheinhardt
ac14d68277 avcodec/h264_refs: Rewrite code to make control flow clearer
While this change IMO makes the control flow clearer
for the human reader, it is especially important for
GCC: It erroneously believes that it is possible to
enter the SHORT2(UNUSED|LONG) cases without having
entered the preceding block that initializes pic,
frame_num, structure and j; it would emit -Wmaybe-uninitialized
warnings for these variables if they were not pseudo-
initialized with av_uninit(). This patch allows to remove
the latter.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-30 05:06:28 +01:00
Timo Rothenpieler
e99c273fec avcodec/nvdec: reset bitstream_len/nb_slices when resetting bitstream pointer 2024-03-30 00:12:23 +01:00
James Almer
547c920193 avcodec/hevc_ps: don't use a fixed sized buffer for parameter set raw data
Allocate it instead, and use it to compare sets instead of the parsed struct.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-03-29 15:34:21 -03:00
Tong Wu
6bf17136a2 avcodec/hevc_ps: fix the problem of memcmp losing effectiveness
HEVCHdrParams* receives a pointer which points to a dynamically
allocated memory block. It causes the memcmp always returning 1.
Add a function to do the comparision. A condition is also added to
avoid malloc(0).

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-03-29 12:35:54 -03:00
Anton Khirnov
c240ff98b3 lavc/packet: schedule AV_PKT_DATA_QUALITY_FACTOR for removal
It is unused internally and has been marked as deprecated a long time
ago.
2024-03-29 09:01:54 +01:00
Anton Khirnov
1d843ae6c7 lavc: rename avpacket.c to packet.c
For consistency with its API header packet.h.
2024-03-29 09:01:54 +01:00
Andreas Rheinhardt
8d1093a784 avcodec/libvpxenc: Remove obsolete av_unused
Forgotten in 753074721b.

Reviewed-by: James Zern <jzern@google.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-29 00:58:05 +01:00
Andreas Rheinhardt
1093b40218 avcodec/libvpxenc: Only search for side data when intending to use it
Also rewrite the code so that a variable that is only used
depending upon CONFIG_LIBVPX_VP9_ENCODER is not declared
outside of the #if block.
(The variable was declared with av_uninit, but it should have been
av_unused, as the former does not work for all compilers.)

Reviewed-by: James Zern <jzern@google.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-29 00:45:17 +01:00
Andreas Rheinhardt
e465cebfee avcodec/Makefile: Remove redundant dependencies on hevc_data.o
hevc_data.c only provides ff_hevc_diag_scan tables and
neither the QSV HEVC encoder nor the HEVC parser use these
directly and the indirect dependency is already accounted
for in the dependencies of the hevcparse subsystem since
b0c61209cd, so remove these
spurious dependencies.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-29 00:39:11 +01:00
Anton Khirnov
e0de84ad2e lavc/encode: map AVCodecContext.decoded_side_data to coded_side_data
This way it can be automagically propagated through the encoder to
muxing.
2024-03-28 08:40:11 +01:00
Anton Khirnov
a3f4670943 lavc/decode: move sd_global_map to avcodec
It will be shared with encoding code.
2024-03-28 08:40:01 +01:00
Anton Khirnov
e1f384adbf lavc/frame_thread_encoder: avoid assigning a whole AVCodecContext
It is highly unsafe, as AVCodecContext contains many allocated fields.
Almost everything needed by worker threads should be covered by
routing through AVCodecParameters and av_opt_copy(), except for a few
fields that are copied manually.

avcodec_free_context() can now be used for per-thread contexts.
2024-03-28 08:40:01 +01:00
Anton Khirnov
198a7788e7 lavc: avoid leaking AVCodecContext.chroma_intra_matrix 2024-03-28 08:40:01 +01:00
Andreas Rheinhardt
686d33a6b0 avcodec/profiles: Don't include avcodec.h
Forgotten in 8238bc0b5e.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:08:01 +01:00
Andreas Rheinhardt
33b1c7ebbf avcodec/magicyuvenc: Don't call functions twice due to macro
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:08:01 +01:00
Andreas Rheinhardt
8013574e9b avcodec/mjpegenc: Inline chroma subsampling
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:08:00 +01:00
Andreas Rheinhardt
0b212f3595 avcodec/bfi: Remove unused AVCodecContext* from context
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:06:13 +01:00
Andreas Rheinhardt
6edd83c0e2 avcodec/ratecontrol: Avoid function pointer casts
It is undefined behaviour to call a function with a different
signature for the call than the actual function signature;
there are no exceptions for void* and RateControlEntry*.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:06:13 +01:00
Andreas Rheinhardt
641850f67f avcodec/wmaprodec: Explicitly return 0 on success
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-28 03:06:13 +01:00
Zhao Zhili
89e9486bc3 avcodec/h264_mp4toannexb: Fix heap buffer overflow
Fixes: out of array write
Fixes: 64407/clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_MP4TOANNEXB_fuzzer-4966763443650560

mp4toannexb_filter counts the number of bytes needed in the first
pass and allocate the memory, then do memcpy in the second pass.
Update sps/pps size in the loop makes the count invalid in the
case of SPS/PPS occur after IDR slice. This patch process in-band
SPS/PPS before the two pass loops.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-03-27 20:04:40 +08:00
Michael Niedermayer
6b213175c9
Bump after 7.0 branch point
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:54 +01:00
Michael Niedermayer
872980ace6
Bump prior release/7.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:53 +01:00
Michael Niedermayer
1eb8cbd09c
avcodec/wavarc: avoid signed integer overflow in AC code
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697
Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int'

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
6009dd07bd
avcodec/wavarc: Avoid signed integer overflow in sample
Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
ebdcf98499
avcodec/truemotion1: Height not being a multiple of 4 is unsupported
mb_change_bits is given space based on height >> 2, while more data is read

Fixes: out of array access
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
d188a86730
avcodec/rtv1: fix undefined FFALIGN
Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
7eabe56436
avcodec/qoadec: Fix undefined overflow in lms_predict
Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
48eeb198a5
avcodec/hcadec: do not allow code to continue after failed init
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
addb85ea39
avcodec/hcadec: do not set hfr_group_count to invalid values
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Dai, Jianhui J
61afe4d98c avcodec/cbs_vp8: Improve the bitstream position check
The VP8 compressed header may not be byte-aligned due to boolean
coding. Round up byte count for accurate data positioning.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:05:04 -04:00
Dai, Jianhui J
63dea3c1e1 avcodec/cbs_vp8: Use little endian in fixed()
This commit adds value range checks to cbs_vp8_read_unsigned_le,
migrates fixed() to use it, and enforces little-endian consistency for
all read methods.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:04:44 -04:00
Martin Storsjö
f872b19714 aarch64: hevc: Produce plain neon versions of qpel_bi_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_bi_hv4_8_c: 385.7
put_hevc_qpel_bi_hv4_8_neon: 131.0
put_hevc_qpel_bi_hv4_8_i8mm: 92.2
put_hevc_qpel_bi_hv6_8_c: 701.0
put_hevc_qpel_bi_hv6_8_neon: 239.5
put_hevc_qpel_bi_hv6_8_i8mm: 191.0
put_hevc_qpel_bi_hv8_8_c: 1162.0
put_hevc_qpel_bi_hv8_8_neon: 228.0
put_hevc_qpel_bi_hv8_8_i8mm: 225.2
put_hevc_qpel_bi_hv12_8_c: 2305.0
put_hevc_qpel_bi_hv12_8_neon: 558.0
put_hevc_qpel_bi_hv12_8_i8mm: 483.2
put_hevc_qpel_bi_hv16_8_c: 3965.2
put_hevc_qpel_bi_hv16_8_neon: 732.7
put_hevc_qpel_bi_hv16_8_i8mm: 656.5
put_hevc_qpel_bi_hv24_8_c: 8709.7
put_hevc_qpel_bi_hv24_8_neon: 1555.2
put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
put_hevc_qpel_bi_hv32_8_c: 14818.0
put_hevc_qpel_bi_hv32_8_neon: 2763.7
put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
put_hevc_qpel_bi_hv48_8_c: 32855.5
put_hevc_qpel_bi_hv48_8_neon: 6107.2
put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
put_hevc_qpel_bi_hv64_8_c: 57591.5
put_hevc_qpel_bi_hv64_8_neon: 10660.2
put_hevc_qpel_bi_hv64_8_i8mm: 9580.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
d21b9a0411 aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_uni_w_hv8_8_neon: 268.2
put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5
put_hevc_qpel_uni_w_hv16_8_c: 4297.2
put_hevc_qpel_uni_w_hv16_8_neon: 802.2
put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2
put_hevc_qpel_uni_w_hv32_8_c: 15518.5
put_hevc_qpel_uni_w_hv32_8_neon: 3085.2
put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2
put_hevc_qpel_uni_w_hv64_8_c: 57254.5
put_hevc_qpel_uni_w_hv64_8_neon: 11787.5
put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5ab138673b aarch64: hevc: Produce plain neon versions of qpel_uni_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_uni_hv4_8_c: 384.2
put_hevc_qpel_uni_hv4_8_neon: 127.5
put_hevc_qpel_uni_hv4_8_i8mm: 85.5
put_hevc_qpel_uni_hv6_8_c: 705.5
put_hevc_qpel_uni_hv6_8_neon: 224.5
put_hevc_qpel_uni_hv6_8_i8mm: 176.2
put_hevc_qpel_uni_hv8_8_c: 1136.5
put_hevc_qpel_uni_hv8_8_neon: 216.5
put_hevc_qpel_uni_hv8_8_i8mm: 214.0
put_hevc_qpel_uni_hv12_8_c: 2259.5
put_hevc_qpel_uni_hv12_8_neon: 498.5
put_hevc_qpel_uni_hv12_8_i8mm: 410.7
put_hevc_qpel_uni_hv16_8_c: 3824.7
put_hevc_qpel_uni_hv16_8_neon: 670.0
put_hevc_qpel_uni_hv16_8_i8mm: 603.7
put_hevc_qpel_uni_hv24_8_c: 8113.5
put_hevc_qpel_uni_hv24_8_neon: 1474.7
put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
put_hevc_qpel_uni_hv32_8_c: 14744.5
put_hevc_qpel_uni_hv32_8_neon: 2599.7
put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
put_hevc_qpel_uni_hv48_8_c: 32800.0
put_hevc_qpel_uni_hv48_8_neon: 5650.0
put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
put_hevc_qpel_uni_hv64_8_c: 57856.2
put_hevc_qpel_uni_hv64_8_neon: 9863.5
put_hevc_qpel_uni_hv64_8_i8mm: 8767.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5cbeefc79e aarch64: hevc: Produce plain neon versions of qpel_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
20c38f4b8d aarch64: hevc: Reorder qpel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in
the individual functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:50 +02:00
Martin Storsjö
4f71e4ebf2 aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
The hv32 and hv64 functions were identical - both loop and
process 16 pixels at a time.

The hv16 function was near identical, except for the outer loop
(and using sp instead of a separate register).

Given the size of these functions, the extra cost of the outer
loop is negligible, so use the same function for hv16 as well.

This removes over 200 lines of duplicated assembly, and over 4 KB
of binary size.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:40 +02:00
Martin Storsjö
4063e50eec aarch64: hevc: Split the qpel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:29 +02:00
Martin Storsjö
ad01d06f91 aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
AWS Graviton 3:
put_hevc_qpel_uni_w_h4_8_c: 159.0
put_hevc_qpel_uni_w_h4_8_neon: 64.2
put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
put_hevc_qpel_uni_w_h6_8_c: 344.7
put_hevc_qpel_uni_w_h6_8_neon: 114.5
put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
put_hevc_qpel_uni_w_h8_8_c: 596.2
put_hevc_qpel_uni_w_h8_8_neon: 132.2
put_hevc_qpel_uni_w_h8_8_i8mm: 106.0
put_hevc_qpel_uni_w_h12_8_c: 1325.0
put_hevc_qpel_uni_w_h12_8_neon: 299.0
put_hevc_qpel_uni_w_h12_8_i8mm: 211.5
put_hevc_qpel_uni_w_h16_8_c: 2300.0
put_hevc_qpel_uni_w_h16_8_neon: 422.0
put_hevc_qpel_uni_w_h16_8_i8mm: 286.2
put_hevc_qpel_uni_w_h24_8_c: 5059.0
put_hevc_qpel_uni_w_h24_8_neon: 912.2
put_hevc_qpel_uni_w_h24_8_i8mm: 664.2
put_hevc_qpel_uni_w_h32_8_c: 9198.2
put_hevc_qpel_uni_w_h32_8_neon: 1638.2
put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7
put_hevc_qpel_uni_w_h48_8_c: 20754.7
put_hevc_qpel_uni_w_h48_8_neon: 3633.7
put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7
put_hevc_qpel_uni_w_h64_8_c: 36854.7
put_hevc_qpel_uni_w_h64_8_neon: 6435.7
put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:18 +02:00
Martin Storsjö
de23b384fd aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
In addition to just templating, this contains one change to
ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
which ff_hevc_put_hevc_epel_h32_8_neon requires.

AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel_bi_hv6_8_c: 343.7
put_hevc_epel_bi_hv6_8_neon: 109.7
put_hevc_epel_bi_hv6_8_i8mm: 105.7
put_hevc_epel_bi_hv8_8_c: 536.0
put_hevc_epel_bi_hv8_8_neon: 112.7
put_hevc_epel_bi_hv8_8_i8mm: 111.7
put_hevc_epel_bi_hv12_8_c: 1107.7
put_hevc_epel_bi_hv12_8_neon: 254.7
put_hevc_epel_bi_hv12_8_i8mm: 239.0
put_hevc_epel_bi_hv16_8_c: 1927.7
put_hevc_epel_bi_hv16_8_neon: 356.2
put_hevc_epel_bi_hv16_8_i8mm: 334.2
put_hevc_epel_bi_hv24_8_c: 4195.2
put_hevc_epel_bi_hv24_8_neon: 736.7
put_hevc_epel_bi_hv24_8_i8mm: 715.5
put_hevc_epel_bi_hv32_8_c: 7280.5
put_hevc_epel_bi_hv32_8_neon: 1287.7
put_hevc_epel_bi_hv32_8_i8mm: 1162.2
put_hevc_epel_bi_hv48_8_c: 16857.7
put_hevc_epel_bi_hv48_8_neon: 2836.2
put_hevc_epel_bi_hv48_8_i8mm: 2908.5
put_hevc_epel_bi_hv64_8_c: 29248.2
put_hevc_epel_bi_hv64_8_neon: 5051.7
put_hevc_epel_bi_hv64_8_i8mm: 4491.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:16 +02:00
Martin Storsjö
96e5adda9f aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
AWS Graviton 3:
put_hevc_epel_uni_w_hv4_8_c: 191.2
put_hevc_epel_uni_w_hv4_8_neon: 87.7
put_hevc_epel_uni_w_hv4_8_i8mm: 83.2
put_hevc_epel_uni_w_hv6_8_c: 349.5
put_hevc_epel_uni_w_hv6_8_neon: 153.0
put_hevc_epel_uni_w_hv6_8_i8mm: 148.5
put_hevc_epel_uni_w_hv8_8_c: 581.2
put_hevc_epel_uni_w_hv8_8_neon: 166.7
put_hevc_epel_uni_w_hv8_8_i8mm: 163.5
put_hevc_epel_uni_w_hv12_8_c: 1230.0
put_hevc_epel_uni_w_hv12_8_neon: 387.7
put_hevc_epel_uni_w_hv12_8_i8mm: 370.2
put_hevc_epel_uni_w_hv16_8_c: 2003.2
put_hevc_epel_uni_w_hv16_8_neon: 501.5
put_hevc_epel_uni_w_hv16_8_i8mm: 490.2
put_hevc_epel_uni_w_hv24_8_c: 4448.7
put_hevc_epel_uni_w_hv24_8_neon: 1092.2
put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7
put_hevc_epel_uni_w_hv32_8_c: 7817.2
put_hevc_epel_uni_w_hv32_8_neon: 1916.2
put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5
put_hevc_epel_uni_w_hv48_8_c: 16728.2
put_hevc_epel_uni_w_hv48_8_neon: 4263.7
put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7
put_hevc_epel_uni_w_hv64_8_c: 29563.2
put_hevc_epel_uni_w_hv64_8_neon: 7474.2
put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:58 +02:00