Commit graph

49645 commits

Author SHA1 Message Date
Michael Niedermayer
ebdcf98499
avcodec/truemotion1: Height not being a multiple of 4 is unsupported
mb_change_bits is given space based on height >> 2, while more data is read

Fixes: out of array access
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
d188a86730
avcodec/rtv1: fix undefined FFALIGN
Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
7eabe56436
avcodec/qoadec: Fix undefined overflow in lms_predict
Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
48eeb198a5
avcodec/hcadec: do not allow code to continue after failed init
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Michael Niedermayer
addb85ea39
avcodec/hcadec: do not set hfr_group_count to invalid values
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 23:19:49 +01:00
Dai, Jianhui J
61afe4d98c avcodec/cbs_vp8: Improve the bitstream position check
The VP8 compressed header may not be byte-aligned due to boolean
coding. Round up byte count for accurate data positioning.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:05:04 -04:00
Dai, Jianhui J
63dea3c1e1 avcodec/cbs_vp8: Use little endian in fixed()
This commit adds value range checks to cbs_vp8_read_unsigned_le,
migrates fixed() to use it, and enforces little-endian consistency for
all read methods.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-03-26 09:04:44 -04:00
Martin Storsjö
f872b19714 aarch64: hevc: Produce plain neon versions of qpel_bi_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_bi_hv4_8_c: 385.7
put_hevc_qpel_bi_hv4_8_neon: 131.0
put_hevc_qpel_bi_hv4_8_i8mm: 92.2
put_hevc_qpel_bi_hv6_8_c: 701.0
put_hevc_qpel_bi_hv6_8_neon: 239.5
put_hevc_qpel_bi_hv6_8_i8mm: 191.0
put_hevc_qpel_bi_hv8_8_c: 1162.0
put_hevc_qpel_bi_hv8_8_neon: 228.0
put_hevc_qpel_bi_hv8_8_i8mm: 225.2
put_hevc_qpel_bi_hv12_8_c: 2305.0
put_hevc_qpel_bi_hv12_8_neon: 558.0
put_hevc_qpel_bi_hv12_8_i8mm: 483.2
put_hevc_qpel_bi_hv16_8_c: 3965.2
put_hevc_qpel_bi_hv16_8_neon: 732.7
put_hevc_qpel_bi_hv16_8_i8mm: 656.5
put_hevc_qpel_bi_hv24_8_c: 8709.7
put_hevc_qpel_bi_hv24_8_neon: 1555.2
put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
put_hevc_qpel_bi_hv32_8_c: 14818.0
put_hevc_qpel_bi_hv32_8_neon: 2763.7
put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
put_hevc_qpel_bi_hv48_8_c: 32855.5
put_hevc_qpel_bi_hv48_8_neon: 6107.2
put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
put_hevc_qpel_bi_hv64_8_c: 57591.5
put_hevc_qpel_bi_hv64_8_neon: 10660.2
put_hevc_qpel_bi_hv64_8_i8mm: 9580.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
d21b9a0411 aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_uni_w_hv8_8_neon: 268.2
put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5
put_hevc_qpel_uni_w_hv16_8_c: 4297.2
put_hevc_qpel_uni_w_hv16_8_neon: 802.2
put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2
put_hevc_qpel_uni_w_hv32_8_c: 15518.5
put_hevc_qpel_uni_w_hv32_8_neon: 3085.2
put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2
put_hevc_qpel_uni_w_hv64_8_c: 57254.5
put_hevc_qpel_uni_w_hv64_8_neon: 11787.5
put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5ab138673b aarch64: hevc: Produce plain neon versions of qpel_uni_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_uni_hv4_8_c: 384.2
put_hevc_qpel_uni_hv4_8_neon: 127.5
put_hevc_qpel_uni_hv4_8_i8mm: 85.5
put_hevc_qpel_uni_hv6_8_c: 705.5
put_hevc_qpel_uni_hv6_8_neon: 224.5
put_hevc_qpel_uni_hv6_8_i8mm: 176.2
put_hevc_qpel_uni_hv8_8_c: 1136.5
put_hevc_qpel_uni_hv8_8_neon: 216.5
put_hevc_qpel_uni_hv8_8_i8mm: 214.0
put_hevc_qpel_uni_hv12_8_c: 2259.5
put_hevc_qpel_uni_hv12_8_neon: 498.5
put_hevc_qpel_uni_hv12_8_i8mm: 410.7
put_hevc_qpel_uni_hv16_8_c: 3824.7
put_hevc_qpel_uni_hv16_8_neon: 670.0
put_hevc_qpel_uni_hv16_8_i8mm: 603.7
put_hevc_qpel_uni_hv24_8_c: 8113.5
put_hevc_qpel_uni_hv24_8_neon: 1474.7
put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
put_hevc_qpel_uni_hv32_8_c: 14744.5
put_hevc_qpel_uni_hv32_8_neon: 2599.7
put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
put_hevc_qpel_uni_hv48_8_c: 32800.0
put_hevc_qpel_uni_hv48_8_neon: 5650.0
put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
put_hevc_qpel_uni_hv64_8_c: 57856.2
put_hevc_qpel_uni_hv64_8_neon: 9863.5
put_hevc_qpel_uni_hv64_8_i8mm: 8767.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
5cbeefc79e aarch64: hevc: Produce plain neon versions of qpel_hv
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.

AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:55 +02:00
Martin Storsjö
20c38f4b8d aarch64: hevc: Reorder qpel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in
the individual functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:50 +02:00
Martin Storsjö
4f71e4ebf2 aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
The hv32 and hv64 functions were identical - both loop and
process 16 pixels at a time.

The hv16 function was near identical, except for the outer loop
(and using sp instead of a separate register).

Given the size of these functions, the extra cost of the outer
loop is negligible, so use the same function for hv16 as well.

This removes over 200 lines of duplicated assembly, and over 4 KB
of binary size.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:40 +02:00
Martin Storsjö
4063e50eec aarch64: hevc: Split the qpel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:05:29 +02:00
Martin Storsjö
ad01d06f91 aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
AWS Graviton 3:
put_hevc_qpel_uni_w_h4_8_c: 159.0
put_hevc_qpel_uni_w_h4_8_neon: 64.2
put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
put_hevc_qpel_uni_w_h6_8_c: 344.7
put_hevc_qpel_uni_w_h6_8_neon: 114.5
put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
put_hevc_qpel_uni_w_h8_8_c: 596.2
put_hevc_qpel_uni_w_h8_8_neon: 132.2
put_hevc_qpel_uni_w_h8_8_i8mm: 106.0
put_hevc_qpel_uni_w_h12_8_c: 1325.0
put_hevc_qpel_uni_w_h12_8_neon: 299.0
put_hevc_qpel_uni_w_h12_8_i8mm: 211.5
put_hevc_qpel_uni_w_h16_8_c: 2300.0
put_hevc_qpel_uni_w_h16_8_neon: 422.0
put_hevc_qpel_uni_w_h16_8_i8mm: 286.2
put_hevc_qpel_uni_w_h24_8_c: 5059.0
put_hevc_qpel_uni_w_h24_8_neon: 912.2
put_hevc_qpel_uni_w_h24_8_i8mm: 664.2
put_hevc_qpel_uni_w_h32_8_c: 9198.2
put_hevc_qpel_uni_w_h32_8_neon: 1638.2
put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7
put_hevc_qpel_uni_w_h48_8_c: 20754.7
put_hevc_qpel_uni_w_h48_8_neon: 3633.7
put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7
put_hevc_qpel_uni_w_h64_8_c: 36854.7
put_hevc_qpel_uni_w_h64_8_neon: 6435.7
put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:18 +02:00
Martin Storsjö
de23b384fd aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
In addition to just templating, this contains one change to
ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
which ff_hevc_put_hevc_epel_h32_8_neon requires.

AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel_bi_hv6_8_c: 343.7
put_hevc_epel_bi_hv6_8_neon: 109.7
put_hevc_epel_bi_hv6_8_i8mm: 105.7
put_hevc_epel_bi_hv8_8_c: 536.0
put_hevc_epel_bi_hv8_8_neon: 112.7
put_hevc_epel_bi_hv8_8_i8mm: 111.7
put_hevc_epel_bi_hv12_8_c: 1107.7
put_hevc_epel_bi_hv12_8_neon: 254.7
put_hevc_epel_bi_hv12_8_i8mm: 239.0
put_hevc_epel_bi_hv16_8_c: 1927.7
put_hevc_epel_bi_hv16_8_neon: 356.2
put_hevc_epel_bi_hv16_8_i8mm: 334.2
put_hevc_epel_bi_hv24_8_c: 4195.2
put_hevc_epel_bi_hv24_8_neon: 736.7
put_hevc_epel_bi_hv24_8_i8mm: 715.5
put_hevc_epel_bi_hv32_8_c: 7280.5
put_hevc_epel_bi_hv32_8_neon: 1287.7
put_hevc_epel_bi_hv32_8_i8mm: 1162.2
put_hevc_epel_bi_hv48_8_c: 16857.7
put_hevc_epel_bi_hv48_8_neon: 2836.2
put_hevc_epel_bi_hv48_8_i8mm: 2908.5
put_hevc_epel_bi_hv64_8_c: 29248.2
put_hevc_epel_bi_hv64_8_neon: 5051.7
put_hevc_epel_bi_hv64_8_i8mm: 4491.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:16 +02:00
Martin Storsjö
96e5adda9f aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
AWS Graviton 3:
put_hevc_epel_uni_w_hv4_8_c: 191.2
put_hevc_epel_uni_w_hv4_8_neon: 87.7
put_hevc_epel_uni_w_hv4_8_i8mm: 83.2
put_hevc_epel_uni_w_hv6_8_c: 349.5
put_hevc_epel_uni_w_hv6_8_neon: 153.0
put_hevc_epel_uni_w_hv6_8_i8mm: 148.5
put_hevc_epel_uni_w_hv8_8_c: 581.2
put_hevc_epel_uni_w_hv8_8_neon: 166.7
put_hevc_epel_uni_w_hv8_8_i8mm: 163.5
put_hevc_epel_uni_w_hv12_8_c: 1230.0
put_hevc_epel_uni_w_hv12_8_neon: 387.7
put_hevc_epel_uni_w_hv12_8_i8mm: 370.2
put_hevc_epel_uni_w_hv16_8_c: 2003.2
put_hevc_epel_uni_w_hv16_8_neon: 501.5
put_hevc_epel_uni_w_hv16_8_i8mm: 490.2
put_hevc_epel_uni_w_hv24_8_c: 4448.7
put_hevc_epel_uni_w_hv24_8_neon: 1092.2
put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7
put_hevc_epel_uni_w_hv32_8_c: 7817.2
put_hevc_epel_uni_w_hv32_8_neon: 1916.2
put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5
put_hevc_epel_uni_w_hv48_8_c: 16728.2
put_hevc_epel_uni_w_hv48_8_neon: 4263.7
put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7
put_hevc_epel_uni_w_hv64_8_c: 29563.2
put_hevc_epel_uni_w_hv64_8_neon: 7474.2
put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:58 +02:00
Martin Storsjö
d7294199ab aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
AWS Graviton 3:
put_hevc_epel_uni_hv4_8_c: 163.5
put_hevc_epel_uni_hv4_8_neon: 59.7
put_hevc_epel_uni_hv4_8_i8mm: 57.5
put_hevc_epel_uni_hv6_8_c: 344.7
put_hevc_epel_uni_hv6_8_neon: 105.0
put_hevc_epel_uni_hv6_8_i8mm: 102.7
put_hevc_epel_uni_hv8_8_c: 552.2
put_hevc_epel_uni_hv8_8_neon: 111.2
put_hevc_epel_uni_hv8_8_i8mm: 104.0
put_hevc_epel_uni_hv12_8_c: 1195.0
put_hevc_epel_uni_hv12_8_neon: 248.7
put_hevc_epel_uni_hv12_8_i8mm: 229.5
put_hevc_epel_uni_hv16_8_c: 1910.2
put_hevc_epel_uni_hv16_8_neon: 339.5
put_hevc_epel_uni_hv16_8_i8mm: 323.2
put_hevc_epel_uni_hv24_8_c: 4048.2
put_hevc_epel_uni_hv24_8_neon: 737.7
put_hevc_epel_uni_hv24_8_i8mm: 713.7
put_hevc_epel_uni_hv32_8_c: 6865.7
put_hevc_epel_uni_hv32_8_neon: 1285.0
put_hevc_epel_uni_hv32_8_i8mm: 1206.0
put_hevc_epel_uni_hv48_8_c: 15830.5
put_hevc_epel_uni_hv48_8_neon: 2844.7
put_hevc_epel_uni_hv48_8_i8mm: 2914.0
put_hevc_epel_uni_hv64_8_c: 27912.7
put_hevc_epel_uni_hv64_8_neon: 4970.5
put_hevc_epel_uni_hv64_8_i8mm: 4653.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:28 +02:00
Martin Storsjö
7bf3d14769 aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
AWS Graviton 3:
put_hevc_epel_hv4_8_c: 163.7
put_hevc_epel_hv4_8_neon: 52.5
put_hevc_epel_hv4_8_i8mm: 49.5
put_hevc_epel_hv6_8_c: 292.2
put_hevc_epel_hv6_8_neon: 97.7
put_hevc_epel_hv6_8_i8mm: 101.2
put_hevc_epel_hv8_8_c: 471.0
put_hevc_epel_hv8_8_neon: 106.7
put_hevc_epel_hv8_8_i8mm: 102.5
put_hevc_epel_hv12_8_c: 1030.2
put_hevc_epel_hv12_8_neon: 240.5
put_hevc_epel_hv12_8_i8mm: 215.0
put_hevc_epel_hv16_8_c: 1711.5
put_hevc_epel_hv16_8_neon: 340.2
put_hevc_epel_hv16_8_i8mm: 319.2
put_hevc_epel_hv24_8_c: 3670.0
put_hevc_epel_hv24_8_neon: 702.0
put_hevc_epel_hv24_8_i8mm: 666.5
put_hevc_epel_hv32_8_c: 6785.5
put_hevc_epel_hv32_8_neon: 1247.0
put_hevc_epel_hv32_8_i8mm: 1169.0
put_hevc_epel_hv48_8_c: 14689.7
put_hevc_epel_hv48_8_neon: 2665.2
put_hevc_epel_hv48_8_i8mm: 2740.0
put_hevc_epel_hv64_8_c: 25899.2
put_hevc_epel_hv64_8_neon: 4801.2
put_hevc_epel_hv64_8_i8mm: 4487.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:19 +02:00
Martin Storsjö
5b5666e5ab aarch64: hevc: Reorder epel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in
the individual functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:07 +02:00
Martin Storsjö
e6d4c0e117 aarch64: hevc: Split the epel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:00 +02:00
Martin Storsjö
54af555bfa aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
AWS Graviton 3:
put_hevc_epel_uni_w_h4_8_c: 97.2
put_hevc_epel_uni_w_h4_8_neon: 41.2
put_hevc_epel_uni_w_h4_8_i8mm: 35.2
put_hevc_epel_uni_w_h6_8_c: 203.7
put_hevc_epel_uni_w_h6_8_neon: 84.7
put_hevc_epel_uni_w_h6_8_i8mm: 74.7
put_hevc_epel_uni_w_h8_8_c: 345.7
put_hevc_epel_uni_w_h8_8_neon: 94.0
put_hevc_epel_uni_w_h8_8_i8mm: 80.7
put_hevc_epel_uni_w_h12_8_c: 768.7
put_hevc_epel_uni_w_h12_8_neon: 196.7
put_hevc_epel_uni_w_h12_8_i8mm: 169.7
put_hevc_epel_uni_w_h16_8_c: 1313.0
put_hevc_epel_uni_w_h16_8_neon: 290.7
put_hevc_epel_uni_w_h16_8_i8mm: 238.0
put_hevc_epel_uni_w_h24_8_c: 2877.5
put_hevc_epel_uni_w_h24_8_neon: 650.0
put_hevc_epel_uni_w_h24_8_i8mm: 512.0
put_hevc_epel_uni_w_h32_8_c: 5113.5
put_hevc_epel_uni_w_h32_8_neon: 1129.5
put_hevc_epel_uni_w_h32_8_i8mm: 739.2
put_hevc_epel_uni_w_h48_8_c: 11757.0
put_hevc_epel_uni_w_h48_8_neon: 2518.7
put_hevc_epel_uni_w_h48_8_i8mm: 1688.5
put_hevc_epel_uni_w_h64_8_c: 20478.0
put_hevc_epel_uni_w_h64_8_neon: 4411.7
put_hevc_epel_uni_w_h64_8_i8mm: 2884.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:47 +02:00
Martin Storsjö
6d384298ec aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
AWS Graviton 3:
put_hevc_epel_h4_8_c: 64.7
put_hevc_epel_h4_8_neon: 25.0
put_hevc_epel_h4_8_i8mm: 21.2
put_hevc_epel_h6_8_c: 130.0
put_hevc_epel_h6_8_neon: 40.7
put_hevc_epel_h6_8_i8mm: 36.5
put_hevc_epel_h8_8_c: 209.0
put_hevc_epel_h8_8_neon: 45.2
put_hevc_epel_h8_8_i8mm: 41.2
put_hevc_epel_h12_8_c: 465.5
put_hevc_epel_h12_8_neon: 104.5
put_hevc_epel_h12_8_i8mm: 86.5
put_hevc_epel_h16_8_c: 830.7
put_hevc_epel_h16_8_neon: 134.2
put_hevc_epel_h16_8_i8mm: 114.0
put_hevc_epel_h24_8_c: 1844.7
put_hevc_epel_h24_8_neon: 282.2
put_hevc_epel_h24_8_i8mm: 277.2
put_hevc_epel_h32_8_c: 3227.5
put_hevc_epel_h32_8_neon: 501.5
put_hevc_epel_h32_8_i8mm: 396.0
put_hevc_epel_h48_8_c: 7229.2
put_hevc_epel_h48_8_neon: 1120.2
put_hevc_epel_h48_8_i8mm: 901.2
put_hevc_epel_h64_8_c: 12869.0
put_hevc_epel_h64_8_neon: 1999.2
put_hevc_epel_h64_8_i8mm: 1610.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:29 +02:00
Martin Storsjö
8f03c30a17 aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:20 +02:00
Martin Storsjö
717cc82d28 aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping
For widths of 32 pixels and more, loop first horizontally,
then vertically.

Previously, this function would process a 16 pixel wide slice
of the block, looping vertically. After processing the whole
height, it would backtrack and process the next 16 pixel wide
slice.

When doing 8tap filtering horizontally, the function must load
7 more pixels (in practice, 8) following the actual inputs, and
this was done for each slice.

By iterating first horizontally throughout each line, then
vertically, we access data in a more cache friendly order, and
we don't need to reload data unnecessarily.

Keep the original order in put_hevc_\type\()_h12_8_neon; the
only suboptimal case there is for width=24. But specializing
an optimal variant for that would require more code, which
might not be worth it.

For the h16 case, this implementation would give a slowdown,
as it now loads the first 8 pixels separately from the rest, but
for larger widths, it is a gain. Therefore, keep the h16 case
as it was (but remove the outer loop), and create a new specialized
version for horizontal looping with 16 pixels at a time.

Before:                  Cortex A53      A72      A73  Graviton 3
put_hevc_qpel_h16_8_neon:     710.5    667.7    692.5   211.0
put_hevc_qpel_h32_8_neon:    2791.5   2643.5   2732.0   883.5
put_hevc_qpel_h64_8_neon:   10954.0  10657.0  10874.2  3241.5
After:
put_hevc_qpel_h16_8_neon:     697.5    663.5    705.7   212.5
put_hevc_qpel_h32_8_neon:    2767.2   2684.5   2791.2   920.5
put_hevc_qpel_h64_8_neon:   10559.2  10471.5  10932.2  3051.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:11 +02:00
Martin Storsjö
e3a54cabde aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
This gets rid of a couple instructions, but the actual performance
is almost identical on Cortex A72/A73. On Cortex A53, it is a
handful of cycles faster.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:01 +02:00
Martin Storsjö
78db8405c0 aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
store temporary buffers on the stack. When consuming it,
many of these functions use the stack pointer as incremental pointer
for reading the data (instead of storing it in another register),
which is rather unusual.

Technically, this is fine as long as the pointer remains properly
aligned.

However in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm,
after incrementing sp when reading data (within each 16 pixel
wide stripe) it would then reset the stack pointer back to a lower
value, for reading the next 16 pixel wide stripe, expecting the
data to remain untouched.

This can't be assumed; data on the stack below the stack pointer
can be clobbered (e.g. by a signal handler). Some OS ABIs
allow for a little margin that won't be touched, aka a red zone,
but not all do. The ones that do, guarantee 16 or 128 bytes, not
9 KB.

Convert this function to use a separate pointer register to
iterate through the data, retaining the stack pointer to point
at the bottom of the data we require to remain untouched.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:57:55 +02:00
Martin Storsjö
e66858fbab aarch64: hevc: Reorder a misplaced function init line
Group the epel and qpel functions together.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:57:50 +02:00
Andreas Rheinhardt
a6189ba896 avcodec/mpegutils: Simplify indenting
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-26 06:30:45 +01:00
Andreas Rheinhardt
5eda98f382 avcodec/mpegutils: Avoid allocations when using AVBPrint
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-26 06:30:45 +01:00
Michael Niedermayer
b54c9a9c8f
avcodec/osq: avoid several signed integer overflows
Fixes: signed integer overflow: 178459578 + 2009763270 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-5013423686287360

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-26 01:19:17 +01:00
Mark Thompson
4743c9e87a lavc/get_buffer: Add a warning on failed allocation from a fixed pool
For hardware cases where we are forced to have a fixed pool of frames
allocated up-front (such as array textures on decoder output), suggest
a possible workaround to the user if an allocation fails because the
pool is exhausted.
2024-03-25 20:44:30 +00:00
Michael Niedermayer
70b26b693e
avcodec/vmixdec: Check shift before use
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: 65909/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VMIX_fuzzer-519459745831321

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-25 21:41:26 +01:00
Henrik Gramner
7c003b63c8 avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64
Broken in afa471d0ef. It just happened
to work before due to x86inc.asm previously performing XMM spills in
INIT_MMX mode which was more of a bug than an intentional feature.
2024-03-25 21:17:47 +01:00
Lynne
ecdc94b97f
vulkan_av1: port to the new stable API
Co-Authored-by: Dave Airlie <airlied@redhat.com>
2024-03-25 08:54:40 +01:00
Lynne
998aa66a10
av1dec: add AV1_REF_FRAME_NONE 2024-03-25 08:54:18 +01:00
Mark Thompson
cafb4c5548 lavc/cbs_av1: Save more frame ordering information
This is wanted by the Vulkan decoder.
2024-03-25 08:32:04 +01:00
Henrik Gramner
afa471d0ef x86: Update x86inc.asm
Make things up-to-date with upstream.

https://code.videolan.org/videolan/x86inc.asm
2024-03-24 14:53:57 +01:00
Henrik Gramner
782c4df28d x86: Avoid using 'd' as an argument name
x86inc.asm adds defines for <argument_name>{b,w,d,q} which clashes with
the nasm d{b,w,d,q} pseudo-instructions for writing initialized data.
2024-03-24 14:53:57 +01:00
Niklas Haas
5d7f234e7e avcodec/hevcdec: apply AOM film grain synthesis
Following the usual logic for H.274 film grain.
2024-03-23 18:55:21 +01:00
Niklas Haas
2e24c12aa1 avcodec/h2645_sei: decode AFGS1 T.35 SEI
I restricted this SEI to HEVC for now, until I see a H.264 sample.
2024-03-23 18:55:21 +01:00
Niklas Haas
f50382cba6 avcodec/aom_film_grain: implement AFGS1 parsing
Based on the AOMedia Film Grain Synthesis 1 (AFGS1) spec:
  https://aomediacodec.github.io/afgs1-spec/

The parsing has been changed substantially relative to the AV1 film
grain OBU. In particular:

1. There is the possibility of maintaining multiple independent film
   grain parameter sets, and decoders/players are recommended to pick
   the one most appropriate for the intended display resolution. This
   could also be used to e.g. switch between different grain profiles
   without having to re-signal the appropriate coefficients.

2. Supporting this, it's possible to *predict* the grain coefficients
   from previously signalled parameter sets, transmitting only the
   residual.

3. When not predicting, the parameter sets are now stored as a series of
   increments, rather than being directly transmitted.

4. There are several new AFGS1-exclusive fields.

I placed this parser in its own file, rather than h2645_sei.c, since
nothing in the generic AFGS1 film grain payload is specific to T.35, and
to compartmentalize the code base.
2024-03-23 18:55:21 +01:00
Niklas Haas
1535d33818 avcodec/aom_film_grain: add AOM film grain synthesis
Implementation copied wholesale from dav1d, sans SIMD, under permissive
license. This implementation was extensively verified to be bit-exact,
so it serves as a much better starting point than trying to re-engineer
this from scratch for no reason. (I also authored the original
implementation in dav1d, so any "clean room" implementation would end up
looking much the same, anyway)

The notable changes I had to make while adapting this from the dav1d
code-base to the FFmpeg codebase include:

- reordering variable declarations to avoid triggering warnings
- replacing several inline helpers by avutil equivalents
- changing code that accesses frame metadata
- replacing raw plane copying logic by av_image_copy_plane

Apart from this, the implementation is basically unmodified.
2024-03-23 18:55:21 +01:00
Niklas Haas
1539efaacb avcodec/libdavv1d: signal new AVFilmGrainParams members
Not directly signalled by AV1, but we should still set this accordingly
so that users will know what the original intended video characteristics
and chroma resolution were.
2024-03-23 18:54:36 +01:00
Niklas Haas
511f297680 avcodec/av1dec: signal new AVFilmGrainParams members
Not directly signalled by AV1, but we should still set this accordingly
so that users will know what the original intended video characteristics
and chroma resolution were.
2024-03-23 18:54:36 +01:00
Niklas Haas
ad7f059180 avcodec/h2645_sei: signal new AVFilmGrainParams members
H.274 specifies that film grain parameters are signalled as intended for
4:4:4 frames, so we always signal this, regardless of the frame's actual
subsampling.
2024-03-23 18:54:36 +01:00
Jun Zhao
bfbf0f4e82 lavc/vvc_parser: small cleanup for style
small cleanup for style, redundant semicolons, goto labels,
in FFmpeg, we put goto labels at brace level.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2024-03-23 22:49:29 +08:00
Michael Niedermayer
57f252b2d1 avcodec/cbs_h266_syntax_template: Check tile_y
Fixes: out of array access
Fixes: 67021/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-4883576579489792

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-23 22:33:21 +08:00
Matthieu Bouron
ad227a41d4 avcodec/mediacodec_wrapper: remove unnecessary NULL checks before calling Delete{Global,Local}Ref()
Delete{Global,Local}Ref already handle NULL.
2024-03-23 11:37:44 +01:00
Matthieu Bouron
b1a683a2fd avcodec/mediacodec_wrapper: use an OFFSET() macro where relevant
Reduces a bit the horizontal spacing.
2024-03-23 11:37:44 +01:00