ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-13 19:10:33 +00:00

Author	SHA1	Message	Date
Michael Niedermayer	ebdcf98499	avcodec/truemotion1: Height not being a multiple of 4 is unsupported mb_change_bits is given space based on height >> 2, while more data is read Fixes: out of array access Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	d188a86730	avcodec/rtv1: fix undefined FFALIGN Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	7eabe56436	avcodec/qoadec: Fix undefined overflow in lms_predict Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	48eeb198a5	avcodec/hcadec: do not allow code to continue after failed init Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	addb85ea39	avcodec/hcadec: do not set hfr_group_count to invalid values Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Dai, Jianhui J	61afe4d98c	avcodec/cbs_vp8: Improve the bitstream position check The VP8 compressed header may not be byte-aligned due to boolean coding. Round up byte count for accurate data positioning. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:05:04 -04:00
Dai, Jianhui J	63dea3c1e1	avcodec/cbs_vp8: Use little endian in fixed() This commit adds value range checks to cbs_vp8_read_unsigned_le, migrates fixed() to use it, and enforces little-endian consistency for all read methods. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:04:44 -04:00
Martin Storsjö	f872b19714	aarch64: hevc: Produce plain neon versions of qpel_bi_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_bi_hv4_8_c: 385.7 put_hevc_qpel_bi_hv4_8_neon: 131.0 put_hevc_qpel_bi_hv4_8_i8mm: 92.2 put_hevc_qpel_bi_hv6_8_c: 701.0 put_hevc_qpel_bi_hv6_8_neon: 239.5 put_hevc_qpel_bi_hv6_8_i8mm: 191.0 put_hevc_qpel_bi_hv8_8_c: 1162.0 put_hevc_qpel_bi_hv8_8_neon: 228.0 put_hevc_qpel_bi_hv8_8_i8mm: 225.2 put_hevc_qpel_bi_hv12_8_c: 2305.0 put_hevc_qpel_bi_hv12_8_neon: 558.0 put_hevc_qpel_bi_hv12_8_i8mm: 483.2 put_hevc_qpel_bi_hv16_8_c: 3965.2 put_hevc_qpel_bi_hv16_8_neon: 732.7 put_hevc_qpel_bi_hv16_8_i8mm: 656.5 put_hevc_qpel_bi_hv24_8_c: 8709.7 put_hevc_qpel_bi_hv24_8_neon: 1555.2 put_hevc_qpel_bi_hv24_8_i8mm: 1448.7 put_hevc_qpel_bi_hv32_8_c: 14818.0 put_hevc_qpel_bi_hv32_8_neon: 2763.7 put_hevc_qpel_bi_hv32_8_i8mm: 2468.0 put_hevc_qpel_bi_hv48_8_c: 32855.5 put_hevc_qpel_bi_hv48_8_neon: 6107.2 put_hevc_qpel_bi_hv48_8_i8mm: 5452.7 put_hevc_qpel_bi_hv64_8_c: 57591.5 put_hevc_qpel_bi_hv64_8_neon: 10660.2 put_hevc_qpel_bi_hv64_8_i8mm: 9580.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	d21b9a0411	aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. AWS Graviton 3: put_hevc_qpel_uni_w_hv4_8_c: 422.2 put_hevc_qpel_uni_w_hv4_8_neon: 140.7 put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7 put_hevc_qpel_uni_w_hv8_8_c: 1208.0 put_hevc_qpel_uni_w_hv8_8_neon: 268.2 put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5 put_hevc_qpel_uni_w_hv16_8_c: 4297.2 put_hevc_qpel_uni_w_hv16_8_neon: 802.2 put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2 put_hevc_qpel_uni_w_hv32_8_c: 15518.5 put_hevc_qpel_uni_w_hv32_8_neon: 3085.2 put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2 put_hevc_qpel_uni_w_hv64_8_c: 57254.5 put_hevc_qpel_uni_w_hv64_8_neon: 11787.5 put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5ab138673b	aarch64: hevc: Produce plain neon versions of qpel_uni_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_uni_hv4_8_c: 384.2 put_hevc_qpel_uni_hv4_8_neon: 127.5 put_hevc_qpel_uni_hv4_8_i8mm: 85.5 put_hevc_qpel_uni_hv6_8_c: 705.5 put_hevc_qpel_uni_hv6_8_neon: 224.5 put_hevc_qpel_uni_hv6_8_i8mm: 176.2 put_hevc_qpel_uni_hv8_8_c: 1136.5 put_hevc_qpel_uni_hv8_8_neon: 216.5 put_hevc_qpel_uni_hv8_8_i8mm: 214.0 put_hevc_qpel_uni_hv12_8_c: 2259.5 put_hevc_qpel_uni_hv12_8_neon: 498.5 put_hevc_qpel_uni_hv12_8_i8mm: 410.7 put_hevc_qpel_uni_hv16_8_c: 3824.7 put_hevc_qpel_uni_hv16_8_neon: 670.0 put_hevc_qpel_uni_hv16_8_i8mm: 603.7 put_hevc_qpel_uni_hv24_8_c: 8113.5 put_hevc_qpel_uni_hv24_8_neon: 1474.7 put_hevc_qpel_uni_hv24_8_i8mm: 1351.5 put_hevc_qpel_uni_hv32_8_c: 14744.5 put_hevc_qpel_uni_hv32_8_neon: 2599.7 put_hevc_qpel_uni_hv32_8_i8mm: 2266.0 put_hevc_qpel_uni_hv48_8_c: 32800.0 put_hevc_qpel_uni_hv48_8_neon: 5650.0 put_hevc_qpel_uni_hv48_8_i8mm: 5011.7 put_hevc_qpel_uni_hv64_8_c: 57856.2 put_hevc_qpel_uni_hv64_8_neon: 9863.5 put_hevc_qpel_uni_hv64_8_i8mm: 8767.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5cbeefc79e	aarch64: hevc: Produce plain neon versions of qpel_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_hv4_8_c: 386.0 put_hevc_qpel_hv4_8_neon: 125.7 put_hevc_qpel_hv4_8_i8mm: 83.2 put_hevc_qpel_hv6_8_c: 749.0 put_hevc_qpel_hv6_8_neon: 207.0 put_hevc_qpel_hv6_8_i8mm: 166.0 put_hevc_qpel_hv8_8_c: 1305.2 put_hevc_qpel_hv8_8_neon: 216.5 put_hevc_qpel_hv8_8_i8mm: 213.0 put_hevc_qpel_hv12_8_c: 2570.5 put_hevc_qpel_hv12_8_neon: 480.0 put_hevc_qpel_hv12_8_i8mm: 398.2 put_hevc_qpel_hv16_8_c: 4158.7 put_hevc_qpel_hv16_8_neon: 659.7 put_hevc_qpel_hv16_8_i8mm: 593.5 put_hevc_qpel_hv24_8_c: 8626.7 put_hevc_qpel_hv24_8_neon: 1653.5 put_hevc_qpel_hv24_8_i8mm: 1398.7 put_hevc_qpel_hv32_8_c: 14646.0 put_hevc_qpel_hv32_8_neon: 2566.2 put_hevc_qpel_hv32_8_i8mm: 2287.5 put_hevc_qpel_hv48_8_c: 31072.5 put_hevc_qpel_hv48_8_neon: 6228.5 put_hevc_qpel_hv48_8_i8mm: 5291.0 put_hevc_qpel_hv64_8_c: 53847.2 put_hevc_qpel_hv64_8_neon: 9856.7 put_hevc_qpel_hv64_8_i8mm: 8831.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	20c38f4b8d	aarch64: hevc: Reorder qpel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:50 +02:00
Martin Storsjö	4f71e4ebf2	aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same function for hv16 as well. This removes over 200 lines of duplicated assembly, and over 4 KB of binary size. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:40 +02:00
Martin Storsjö	4063e50eec	aarch64: hevc: Split the qpel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:29 +02:00
Martin Storsjö	ad01d06f91	aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8 AWS Graviton 3: put_hevc_qpel_uni_w_h4_8_c: 159.0 put_hevc_qpel_uni_w_h4_8_neon: 64.2 put_hevc_qpel_uni_w_h4_8_i8mm: 40.0 put_hevc_qpel_uni_w_h6_8_c: 344.7 put_hevc_qpel_uni_w_h6_8_neon: 114.5 put_hevc_qpel_uni_w_h6_8_i8mm: 82.0 put_hevc_qpel_uni_w_h8_8_c: 596.2 put_hevc_qpel_uni_w_h8_8_neon: 132.2 put_hevc_qpel_uni_w_h8_8_i8mm: 106.0 put_hevc_qpel_uni_w_h12_8_c: 1325.0 put_hevc_qpel_uni_w_h12_8_neon: 299.0 put_hevc_qpel_uni_w_h12_8_i8mm: 211.5 put_hevc_qpel_uni_w_h16_8_c: 2300.0 put_hevc_qpel_uni_w_h16_8_neon: 422.0 put_hevc_qpel_uni_w_h16_8_i8mm: 286.2 put_hevc_qpel_uni_w_h24_8_c: 5059.0 put_hevc_qpel_uni_w_h24_8_neon: 912.2 put_hevc_qpel_uni_w_h24_8_i8mm: 664.2 put_hevc_qpel_uni_w_h32_8_c: 9198.2 put_hevc_qpel_uni_w_h32_8_neon: 1638.2 put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7 put_hevc_qpel_uni_w_h48_8_c: 20754.7 put_hevc_qpel_uni_w_h48_8_neon: 3633.7 put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7 put_hevc_qpel_uni_w_h64_8_c: 36854.7 put_hevc_qpel_uni_w_h64_8_neon: 6435.7 put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:18 +02:00
Martin Storsjö	de23b384fd	aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm In addition to just templating, this contains one change to ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register which ff_hevc_put_hevc_epel_h32_8_neon requires. AWS Graviton 3: put_hevc_epel_bi_hv4_8_c: 176.5 put_hevc_epel_bi_hv4_8_neon: 62.0 put_hevc_epel_bi_hv4_8_i8mm: 58.0 put_hevc_epel_bi_hv6_8_c: 343.7 put_hevc_epel_bi_hv6_8_neon: 109.7 put_hevc_epel_bi_hv6_8_i8mm: 105.7 put_hevc_epel_bi_hv8_8_c: 536.0 put_hevc_epel_bi_hv8_8_neon: 112.7 put_hevc_epel_bi_hv8_8_i8mm: 111.7 put_hevc_epel_bi_hv12_8_c: 1107.7 put_hevc_epel_bi_hv12_8_neon: 254.7 put_hevc_epel_bi_hv12_8_i8mm: 239.0 put_hevc_epel_bi_hv16_8_c: 1927.7 put_hevc_epel_bi_hv16_8_neon: 356.2 put_hevc_epel_bi_hv16_8_i8mm: 334.2 put_hevc_epel_bi_hv24_8_c: 4195.2 put_hevc_epel_bi_hv24_8_neon: 736.7 put_hevc_epel_bi_hv24_8_i8mm: 715.5 put_hevc_epel_bi_hv32_8_c: 7280.5 put_hevc_epel_bi_hv32_8_neon: 1287.7 put_hevc_epel_bi_hv32_8_i8mm: 1162.2 put_hevc_epel_bi_hv48_8_c: 16857.7 put_hevc_epel_bi_hv48_8_neon: 2836.2 put_hevc_epel_bi_hv48_8_i8mm: 2908.5 put_hevc_epel_bi_hv64_8_c: 29248.2 put_hevc_epel_bi_hv64_8_neon: 5051.7 put_hevc_epel_bi_hv64_8_i8mm: 4491.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:16 +02:00
Martin Storsjö	96e5adda9f	aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm AWS Graviton 3: put_hevc_epel_uni_w_hv4_8_c: 191.2 put_hevc_epel_uni_w_hv4_8_neon: 87.7 put_hevc_epel_uni_w_hv4_8_i8mm: 83.2 put_hevc_epel_uni_w_hv6_8_c: 349.5 put_hevc_epel_uni_w_hv6_8_neon: 153.0 put_hevc_epel_uni_w_hv6_8_i8mm: 148.5 put_hevc_epel_uni_w_hv8_8_c: 581.2 put_hevc_epel_uni_w_hv8_8_neon: 166.7 put_hevc_epel_uni_w_hv8_8_i8mm: 163.5 put_hevc_epel_uni_w_hv12_8_c: 1230.0 put_hevc_epel_uni_w_hv12_8_neon: 387.7 put_hevc_epel_uni_w_hv12_8_i8mm: 370.2 put_hevc_epel_uni_w_hv16_8_c: 2003.2 put_hevc_epel_uni_w_hv16_8_neon: 501.5 put_hevc_epel_uni_w_hv16_8_i8mm: 490.2 put_hevc_epel_uni_w_hv24_8_c: 4448.7 put_hevc_epel_uni_w_hv24_8_neon: 1092.2 put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7 put_hevc_epel_uni_w_hv32_8_c: 7817.2 put_hevc_epel_uni_w_hv32_8_neon: 1916.2 put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5 put_hevc_epel_uni_w_hv48_8_c: 16728.2 put_hevc_epel_uni_w_hv48_8_neon: 4263.7 put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7 put_hevc_epel_uni_w_hv64_8_c: 29563.2 put_hevc_epel_uni_w_hv64_8_neon: 7474.2 put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:58 +02:00
Martin Storsjö	d7294199ab	aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm AWS Graviton 3: put_hevc_epel_uni_hv4_8_c: 163.5 put_hevc_epel_uni_hv4_8_neon: 59.7 put_hevc_epel_uni_hv4_8_i8mm: 57.5 put_hevc_epel_uni_hv6_8_c: 344.7 put_hevc_epel_uni_hv6_8_neon: 105.0 put_hevc_epel_uni_hv6_8_i8mm: 102.7 put_hevc_epel_uni_hv8_8_c: 552.2 put_hevc_epel_uni_hv8_8_neon: 111.2 put_hevc_epel_uni_hv8_8_i8mm: 104.0 put_hevc_epel_uni_hv12_8_c: 1195.0 put_hevc_epel_uni_hv12_8_neon: 248.7 put_hevc_epel_uni_hv12_8_i8mm: 229.5 put_hevc_epel_uni_hv16_8_c: 1910.2 put_hevc_epel_uni_hv16_8_neon: 339.5 put_hevc_epel_uni_hv16_8_i8mm: 323.2 put_hevc_epel_uni_hv24_8_c: 4048.2 put_hevc_epel_uni_hv24_8_neon: 737.7 put_hevc_epel_uni_hv24_8_i8mm: 713.7 put_hevc_epel_uni_hv32_8_c: 6865.7 put_hevc_epel_uni_hv32_8_neon: 1285.0 put_hevc_epel_uni_hv32_8_i8mm: 1206.0 put_hevc_epel_uni_hv48_8_c: 15830.5 put_hevc_epel_uni_hv48_8_neon: 2844.7 put_hevc_epel_uni_hv48_8_i8mm: 2914.0 put_hevc_epel_uni_hv64_8_c: 27912.7 put_hevc_epel_uni_hv64_8_neon: 4970.5 put_hevc_epel_uni_hv64_8_i8mm: 4653.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:28 +02:00
Martin Storsjö	7bf3d14769	aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm AWS Graviton 3: put_hevc_epel_hv4_8_c: 163.7 put_hevc_epel_hv4_8_neon: 52.5 put_hevc_epel_hv4_8_i8mm: 49.5 put_hevc_epel_hv6_8_c: 292.2 put_hevc_epel_hv6_8_neon: 97.7 put_hevc_epel_hv6_8_i8mm: 101.2 put_hevc_epel_hv8_8_c: 471.0 put_hevc_epel_hv8_8_neon: 106.7 put_hevc_epel_hv8_8_i8mm: 102.5 put_hevc_epel_hv12_8_c: 1030.2 put_hevc_epel_hv12_8_neon: 240.5 put_hevc_epel_hv12_8_i8mm: 215.0 put_hevc_epel_hv16_8_c: 1711.5 put_hevc_epel_hv16_8_neon: 340.2 put_hevc_epel_hv16_8_i8mm: 319.2 put_hevc_epel_hv24_8_c: 3670.0 put_hevc_epel_hv24_8_neon: 702.0 put_hevc_epel_hv24_8_i8mm: 666.5 put_hevc_epel_hv32_8_c: 6785.5 put_hevc_epel_hv32_8_neon: 1247.0 put_hevc_epel_hv32_8_i8mm: 1169.0 put_hevc_epel_hv48_8_c: 14689.7 put_hevc_epel_hv48_8_neon: 2665.2 put_hevc_epel_hv48_8_i8mm: 2740.0 put_hevc_epel_hv64_8_c: 25899.2 put_hevc_epel_hv64_8_neon: 4801.2 put_hevc_epel_hv64_8_i8mm: 4487.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:19 +02:00
Martin Storsjö	5b5666e5ab	aarch64: hevc: Reorder epel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:07 +02:00
Martin Storsjö	e6d4c0e117	aarch64: hevc: Split the epel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:00 +02:00
Martin Storsjö	54af555bfa	aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8 AWS Graviton 3: put_hevc_epel_uni_w_h4_8_c: 97.2 put_hevc_epel_uni_w_h4_8_neon: 41.2 put_hevc_epel_uni_w_h4_8_i8mm: 35.2 put_hevc_epel_uni_w_h6_8_c: 203.7 put_hevc_epel_uni_w_h6_8_neon: 84.7 put_hevc_epel_uni_w_h6_8_i8mm: 74.7 put_hevc_epel_uni_w_h8_8_c: 345.7 put_hevc_epel_uni_w_h8_8_neon: 94.0 put_hevc_epel_uni_w_h8_8_i8mm: 80.7 put_hevc_epel_uni_w_h12_8_c: 768.7 put_hevc_epel_uni_w_h12_8_neon: 196.7 put_hevc_epel_uni_w_h12_8_i8mm: 169.7 put_hevc_epel_uni_w_h16_8_c: 1313.0 put_hevc_epel_uni_w_h16_8_neon: 290.7 put_hevc_epel_uni_w_h16_8_i8mm: 238.0 put_hevc_epel_uni_w_h24_8_c: 2877.5 put_hevc_epel_uni_w_h24_8_neon: 650.0 put_hevc_epel_uni_w_h24_8_i8mm: 512.0 put_hevc_epel_uni_w_h32_8_c: 5113.5 put_hevc_epel_uni_w_h32_8_neon: 1129.5 put_hevc_epel_uni_w_h32_8_i8mm: 739.2 put_hevc_epel_uni_w_h48_8_c: 11757.0 put_hevc_epel_uni_w_h48_8_neon: 2518.7 put_hevc_epel_uni_w_h48_8_i8mm: 1688.5 put_hevc_epel_uni_w_h64_8_c: 20478.0 put_hevc_epel_uni_w_h64_8_neon: 4411.7 put_hevc_epel_uni_w_h64_8_i8mm: 2884.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:47 +02:00
Martin Storsjö	6d384298ec	aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8 AWS Graviton 3: put_hevc_epel_h4_8_c: 64.7 put_hevc_epel_h4_8_neon: 25.0 put_hevc_epel_h4_8_i8mm: 21.2 put_hevc_epel_h6_8_c: 130.0 put_hevc_epel_h6_8_neon: 40.7 put_hevc_epel_h6_8_i8mm: 36.5 put_hevc_epel_h8_8_c: 209.0 put_hevc_epel_h8_8_neon: 45.2 put_hevc_epel_h8_8_i8mm: 41.2 put_hevc_epel_h12_8_c: 465.5 put_hevc_epel_h12_8_neon: 104.5 put_hevc_epel_h12_8_i8mm: 86.5 put_hevc_epel_h16_8_c: 830.7 put_hevc_epel_h16_8_neon: 134.2 put_hevc_epel_h16_8_i8mm: 114.0 put_hevc_epel_h24_8_c: 1844.7 put_hevc_epel_h24_8_neon: 282.2 put_hevc_epel_h24_8_i8mm: 277.2 put_hevc_epel_h32_8_c: 3227.5 put_hevc_epel_h32_8_neon: 501.5 put_hevc_epel_h32_8_i8mm: 396.0 put_hevc_epel_h48_8_c: 7229.2 put_hevc_epel_h48_8_neon: 1120.2 put_hevc_epel_h48_8_i8mm: 901.2 put_hevc_epel_h64_8_c: 12869.0 put_hevc_epel_h64_8_neon: 1999.2 put_hevc_epel_h64_8_i8mm: 1610.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:29 +02:00
Martin Storsjö	8f03c30a17	aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:20 +02:00
Martin Storsjö	717cc82d28	aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping For widths of 32 pixels and more, loop first horizontally, then vertically. Previously, this function would process a 16 pixel wide slice of the block, looping vertically. After processing the whole height, it would backtrack and process the next 16 pixel wide slice. When doing 8tap filtering horizontally, the function must load 7 more pixels (in practice, 8) following the actual inputs, and this was done for each slice. By iterating first horizontally throughout each line, then vertically, we access data in a more cache friendly order, and we don't need to reload data unnecessarily. Keep the original order in put_hevc_\type\()_h12_8_neon; the only suboptimal case there is for width=24. But specializing an optimal variant for that would require more code, which might not be worth it. For the h16 case, this implementation would give a slowdown, as it now loads the first 8 pixels separately from the rest, but for larger widths, it is a gain. Therefore, keep the h16 case as it was (but remove the outer loop), and create a new specialized version for horizontal looping with 16 pixels at a time. Before: Cortex A53 A72 A73 Graviton 3 put_hevc_qpel_h16_8_neon: 710.5 667.7 692.5 211.0 put_hevc_qpel_h32_8_neon: 2791.5 2643.5 2732.0 883.5 put_hevc_qpel_h64_8_neon: 10954.0 10657.0 10874.2 3241.5 After: put_hevc_qpel_h16_8_neon: 697.5 663.5 705.7 212.5 put_hevc_qpel_h32_8_neon: 2767.2 2684.5 2791.2 920.5 put_hevc_qpel_h64_8_neon: 10559.2 10471.5 10932.2 3051.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:11 +02:00
Martin Storsjö	e3a54cabde	aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon This gets rid of a couple instructions, but the actual performance is almost identical on Cortex A72/A73. On Cortex A53, it is a handful of cycles faster. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:01 +02:00
Martin Storsjö	78db8405c0	aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon store temporary buffers on the stack. When consuming it, many of these functions use the stack pointer as incremental pointer for reading the data (instead of storing it in another register), which is rather unusual. Technically, this is fine as long as the pointer remains properly aligned. However in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, after incrementing sp when reading data (within each 16 pixel wide stripe) it would then reset the stack pointer back to a lower value, for reading the next 16 pixel wide stripe, expecting the data to remain untouched. This can't be assumed; data on the stack below the stack pointer can be clobbered (e.g. by a signal handler). Some OS ABIs allow for a little margin that won't be touched, aka a red zone, but not all do. The ones that do, guarantee 16 or 128 bytes, not 9 KB. Convert this function to use a separate pointer register to iterate through the data, retaining the stack pointer to point at the bottom of the data we require to remain untouched. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:57:55 +02:00
Martin Storsjö	e66858fbab	aarch64: hevc: Reorder a misplaced function init line Group the epel and qpel functions together. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:57:50 +02:00
Andreas Rheinhardt	a6189ba896	avcodec/mpegutils: Simplify indenting Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-26 06:30:45 +01:00
Andreas Rheinhardt	5eda98f382	avcodec/mpegutils: Avoid allocations when using AVBPrint Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-26 06:30:45 +01:00
Michael Niedermayer	b54c9a9c8f	avcodec/osq: avoid several signed integer overflows Fixes: signed integer overflow: 178459578 + 2009763270 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-5013423686287360 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 01:19:17 +01:00
Mark Thompson	4743c9e87a	lavc/get_buffer: Add a warning on failed allocation from a fixed pool For hardware cases where we are forced to have a fixed pool of frames allocated up-front (such as array textures on decoder output), suggest a possible workaround to the user if an allocation fails because the pool is exhausted.	2024-03-25 20:44:30 +00:00
Michael Niedermayer	70b26b693e	avcodec/vmixdec: Check shift before use Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int' Fixes: 65909/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VMIX_fuzzer-519459745831321 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-25 21:41:26 +01:00
Henrik Gramner	7c003b63c8	avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64 Broken in `afa471d0ef`. It just happened to work before due to x86inc.asm previously performing XMM spills in INIT_MMX mode which was more of a bug than an intentional feature.	2024-03-25 21:17:47 +01:00
Lynne	ecdc94b97f	vulkan_av1: port to the new stable API Co-Authored-by: Dave Airlie <airlied@redhat.com>	2024-03-25 08:54:40 +01:00
Lynne	998aa66a10	av1dec: add AV1_REF_FRAME_NONE	2024-03-25 08:54:18 +01:00
Mark Thompson	cafb4c5548	lavc/cbs_av1: Save more frame ordering information This is wanted by the Vulkan decoder.	2024-03-25 08:32:04 +01:00
Henrik Gramner	afa471d0ef	x86: Update x86inc.asm Make things up-to-date with upstream. https://code.videolan.org/videolan/x86inc.asm	2024-03-24 14:53:57 +01:00
Henrik Gramner	782c4df28d	x86: Avoid using 'd' as an argument name x86inc.asm adds defines for <argument_name>{b,w,d,q} which clashes with the nasm d{b,w,d,q} pseudo-instructions for writing initialized data.	2024-03-24 14:53:57 +01:00
Niklas Haas	5d7f234e7e	avcodec/hevcdec: apply AOM film grain synthesis Following the usual logic for H.274 film grain.	2024-03-23 18:55:21 +01:00
Niklas Haas	2e24c12aa1	avcodec/h2645_sei: decode AFGS1 T.35 SEI I restricted this SEI to HEVC for now, until I see a H.264 sample.	2024-03-23 18:55:21 +01:00
Niklas Haas	f50382cba6	avcodec/aom_film_grain: implement AFGS1 parsing Based on the AOMedia Film Grain Synthesis 1 (AFGS1) spec: https://aomediacodec.github.io/afgs1-spec/ The parsing has been changed substantially relative to the AV1 film grain OBU. In particular: 1. There is the possibility of maintaining multiple independent film grain parameter sets, and decoders/players are recommended to pick the one most appropriate for the intended display resolution. This could also be used to e.g. switch between different grain profiles without having to re-signal the appropriate coefficients. 2. Supporting this, it's possible to predict the grain coefficients from previously signalled parameter sets, transmitting only the residual. 3. When not predicting, the parameter sets are now stored as a series of increments, rather than being directly transmitted. 4. There are several new AFGS1-exclusive fields. I placed this parser in its own file, rather than h2645_sei.c, since nothing in the generic AFGS1 film grain payload is specific to T.35, and to compartmentalize the code base.	2024-03-23 18:55:21 +01:00
Niklas Haas	1535d33818	avcodec/aom_film_grain: add AOM film grain synthesis Implementation copied wholesale from dav1d, sans SIMD, under permissive license. This implementation was extensively verified to be bit-exact, so it serves as a much better starting point than trying to re-engineer this from scratch for no reason. (I also authored the original implementation in dav1d, so any "clean room" implementation would end up looking much the same, anyway) The notable changes I had to make while adapting this from the dav1d code-base to the FFmpeg codebase include: - reordering variable declarations to avoid triggering warnings - replacing several inline helpers by avutil equivalents - changing code that accesses frame metadata - replacing raw plane copying logic by av_image_copy_plane Apart from this, the implementation is basically unmodified.	2024-03-23 18:55:21 +01:00
Niklas Haas	1539efaacb	avcodec/libdavv1d: signal new AVFilmGrainParams members Not directly signalled by AV1, but we should still set this accordingly so that users will know what the original intended video characteristics and chroma resolution were.	2024-03-23 18:54:36 +01:00
Niklas Haas	511f297680	avcodec/av1dec: signal new AVFilmGrainParams members Not directly signalled by AV1, but we should still set this accordingly so that users will know what the original intended video characteristics and chroma resolution were.	2024-03-23 18:54:36 +01:00
Niklas Haas	ad7f059180	avcodec/h2645_sei: signal new AVFilmGrainParams members H.274 specifies that film grain parameters are signalled as intended for 4:4:4 frames, so we always signal this, regardless of the frame's actual subsampling.	2024-03-23 18:54:36 +01:00
Jun Zhao	bfbf0f4e82	lavc/vvc_parser: small cleanup for style small cleanup for style, redundant semicolons, goto labels, in FFmpeg, we put goto labels at brace level. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2024-03-23 22:49:29 +08:00
Michael Niedermayer	57f252b2d1	avcodec/cbs_h266_syntax_template: Check tile_y Fixes: out of array access Fixes: 67021/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-4883576579489792 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-23 22:33:21 +08:00
Matthieu Bouron	ad227a41d4	avcodec/mediacodec_wrapper: remove unnecessary NULL checks before calling Delete{Global,Local}Ref() Delete{Global,Local}Ref already handle NULL.	2024-03-23 11:37:44 +01:00
Matthieu Bouron	b1a683a2fd	avcodec/mediacodec_wrapper: use an OFFSET() macro where relevant Reduces a bit the horizontal spacing.	2024-03-23 11:37:44 +01:00

1 2 3 4 5 ...

49645 commits