ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-06 01:44:53 +00:00

Author	SHA1	Message	Date
Michael Niedermayer	8f57b04fe5	avcodec/hevc/sei: Use get_bits64() in decode_nal_sei_3d_reference_displays_info() Fixes: Assertion n>=0 && n<=32 failed at ./libavcodec/get_bits.h:426 Fixes: 468435217/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4644127078940672 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 20:20:08 +00:00
Michael Niedermayer	af86f0ffcc	avcodec/dca_xll: Clear padding in ff_dca_xll_parse() Fixes: Use of uninitialized memory Fixes: 472020020/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6433045331902464 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 18:12:46 +01:00
Michael Niedermayer	189bc0aaf5	avcodec/dxv: Clear tex_data padding on reallocation dxv assumes that newly reallocated memory in tex_data is not uninitialized thus we have to do that too in case of reallocation in ff_lzf_uncompress() Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:29:08 +01:00
Michael Niedermayer	0f35146e27	avcodec/lzf: Remove size messing from ff_lzf_uncompress() size represents the output size; randomly changing it but not resetting it on errors leaks uninitialized memory. Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:29:08 +01:00
Michael Niedermayer	5db50e8775	avcodec/ffv1enc: refine end condition In the case where the last sorted value was -1u and we were on the first pass of run1, we failed to fill the last few values of bitmap. No real-world testcase is known. Fixes: use of uninitialized memory Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:07:13 +01:00
Michael Niedermayer	11a5afea31	avcodec/dca_xll: Check get_rice_array() Fixes: use of uninitialized memory Fixes: 451655450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6527248623796224 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 14:37:59 +01:00
Jun Zhao	27dd2f1c70	lavc/hevc: fix missing # in ldrsw immediate offset The ldrsw instruction requires immediate offset with # prefix. This fixes the syntax error introduced in commit `26752368f0` (aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the load_bi_w_pixels_param macro was added. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-02-05 09:13:22 +08:00
Zhao Zhili	e250854ecf	aarch64/h264pred: disable inefficient functions
These assembly optimizations have been identified as "performance regressions." Due to advancements in modern CPU micro-architectures and compiler optimization the C implementations now consistently outperform these handwritten routines.
Test Name                 A55-clang        M1               A76-gcc-14       A510-clang       A715-clang       X3-clang
--------------------------------------------------------------------------------------------------------------------
pred8x8_dc_8_neon          55.9 ( 0.79x)!    0.2 ( 0.31x)!   35.7 ( 0.63x)!   98.3 ( 0.37x)!   35.9 ( 0.45x)!   33.6 ( 0.38x)!
pred8x8_dc_10_neon         57.0 ( 1.04x)     0.3 ( 0.36x)!   35.9 ( 0.94x)!   98.2 ( 0.53x)!   35.8 ( 0.58x)!   33.2 ( 0.50x)!
pred8x8_dc_128_8_neon      26.0 ( 0.69x)!    0.1 ( 0.43x)!   15.3 ( 0.73x)!   46.4 ( 0.36x)!   10.6 ( 0.48x)!   10.3 ( 1.09x)
pred8x8_dc_128_10_neon     25.3 ( 0.99x)!    0.1 ( 0.42x)!   19.3 ( 0.48x)!   44.5 ( 0.42x)!   10.0 ( 0.61x)!   11.0 ( 1.00x)
pred8x8_left_dc_8_neon     46.9 ( 0.72x)!    0.2 ( 0.26x)!   30.2 ( 0.49x)!   71.4 ( 0.39x)!   29.8 ( 0.35x)!   26.5 ( 0.44x)!
pred8x8_left_dc_10_neon    45.4 ( 0.82x)!    0.2 ( 0.29x)!   28.1 ( 0.67x)!   70.2 ( 0.47x)!   30.0 ( 0.38x)!   26.5 ( 0.43x)!
pred16x16_dc_8_neon        74.4 ( 1.34x)     0.3 ( 0.62x)!   44.7 ( 0.89x)!  128.0 ( 0.79x)!   48.5 ( 0.67x)!   39.4 ( 0.71x)!
pred16x16_dc_128_8_neon    37.9 ( 0.79x)!    0.1 ( 0.60x)!   20.1 ( 0.80x)!   41.8 ( 0.46x)!   16.2 ( 0.81x)!   12.8 ( 0.95x)!
pred16x16_left_dc_8_neon   69.9 ( 1.19x)     0.3 ( 0.46x)!   49.6 ( 0.54x)!  116.8 ( 0.62x)!   52.8 ( 0.45x)!   44.2 ( 0.51x)!
pred8x8_hori_8_neon        30.6 ( 1.39x)     0.1 ( 0.45x)!   19.4 ( 0.81x)!   71.0 ( 0.50x)!   15.9 ( 0.55x)!   12.2 ( 0.94x)!
pred8x8_hori_10_neon*      29.3 ( 1.82x)     0.1 ( 0.59x)!   18.5 ( 1.56x)    68.9 ( 0.64x)!   15.8 ( 0.62x)!   11.8 ( 0.97x)!
pred8x8_top_dc_8_neon      35.8 ( 0.96x)!    0.1 ( 0.59x)!   16.8 ( 0.81x)!   58.9 ( 0.44x)!   11.3 ( 0.89x)!   11.4 ( 0.99x)!
pred8x8_top_dc_10_neon     37.4 ( 1.24x)     0.1 ( 0.92x)!   20.4 ( 0.81x)!   59.5 ( 0.69x)!   10.5 ( 1.48x)    11.8 ( 1.02x)
pred8x8_vertical_8_neon    18.3 ( 1.08x)     0.1 ( 0.54x)!   12.8 ( 0.89x)!   37.2 ( 0.40x)!    8.3 ( 0.77x)!   11.2 ( 1.00x)
pred8x8_vertical_10_neon   19.0 ( 1.24x)     0.1 ( 0.55x)!   15.3 ( 0.62x)!   39.7 ( 0.50x)!    8.2 ( 0.91x)!   11.1 ( 0.99x)!
- pred8x8_horizontal_10 also underperforms on new architectures, but useful on A55 and A76.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-02-04 09:06:37 +00:00
Zhao Zhili	f54841d375	avcodec/aarch64: add pngdsp
Test Name                    A55-gcc-11        M1-clang          A76-gcc-12        A510-clang        X3-clang
-------------------------------------------------------------------------------------------------------------------
add_bytes_l2_4096_neon        1807.2 ( 2.01x)      1.6 ( 1.94x)    333.0 ( 6.35x)   1058.2 ( 2.34x)    214.3 ( 1.99x)
add_paeth_prediction_3_neon  33036.1 ( 2.41x)    145.1 ( 1.66x)  20443.3 ( 1.97x)  35225.1 ( 1.23x)  19420.8 ( 1.05x)
add_paeth_prediction_4_neon  24368.6 ( 3.26x)    106.7 ( 2.01x)  15163.8 ( 2.77x)  26454.7 ( 1.62x)  14319.0 ( 1.35x)
add_paeth_prediction_6_neon  17900.6 ( 4.44x)     72.0 ( 2.70x)  10214.3 ( 4.20x)  18296.9 ( 2.27x)   9693.1 ( 1.97x)
add_paeth_prediction_8_neon  12615.4 ( 6.31x)     54.1 ( 2.58x)   7706.0 ( 5.45x)  13733.3 ( 2.94x)   7272.6 ( 2.63x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-02-04 12:05:35 +08:00
Oliver Chang	a795ca89fa	avcodec/qdm2: fix heap-use-after-free in qdm2_decode_frame The `sub_packet` index in `QDM2Context` was not reset to 0 when `qdm2_decode_frame` started processing a new packet. If an error occurred during the decoding of a previous packet, `sub_packet` would retain a non-zero value. In subsequent calls to `qdm2_decode_frame` with a new packet, this non-zero `sub_packet` value caused `qdm2_decode` to skip `qdm2_decode_super_block`. This function is responsible for initializing packet lists with pointers to the current packet's data. Skipping it led to the use of stale pointers from the previous (freed) packet, resulting in a heap-use-after-free vulnerability. This patch explicitly resets `s->sub_packet = 0` at the beginning of `qdm2_decode_frame`, ensuring correct initialization for each new packet. Fixes: OSS-Fuzz issue 476179569 (https://issues.oss-fuzz.com/issues/476179569).	2026-02-03 18:17:32 +00:00
Michael Niedermayer	2df0ef601a	avcodec/jpeg2000dec: allow bpno of -1 Fixes: tickets/4663/levels30.jp2 The file decodes without error messages and no integer overflows The file before the broader M_b check did decode with error messages and integer overflows but also no visual artifacts Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	e1472a4e0c	avcodec/jpeg2000dec: allow M_b == 31 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	8a3c7c9c32	avcodec/jpeg2000dec: Print bpno level when erroring out Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	2efffa9ecd	avcodec/jpeg2000dec: Print M_b value when asking for a sample Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:31 +01:00
Frank Plowman	364d5dda91	lavc/vvc: Fix unchecked error codes from add_reconstructed_area	2026-01-31 13:46:13 +00:00
Frank Plowman	f9740eb969	lavc/vvc: Fix unchecked error codes from set_qp_y Fixes: clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4957602162475008	2026-01-31 13:46:13 +00:00
Martin Storsjö	f74c551eaa	aarch64: Fix indentation of a few instructions This file is exempt from the indent checker script, as there are a few other bits in it that the script wants to reformat into slightly worse form, or which might not warrant being reformatted. But these instructions should indeed be indented this way.	2026-01-30 05:21:27 +00:00
James Almer	041d108958	avcodec/opus/enc: don't remove more samples than needed from the last packet The hardcoded extra 120 samples results in the side data reporting the need to discard the entire packet rather than the padding samples. This is in line with the behavior of the libopus encoder. Signed-off-by: James Almer <jamrial@gmail.com>	2026-01-29 21:09:02 -03:00
James Almer	c3aea7628c	avcodec/opus/enc: set avctx->frame_size to a better guess based on encoder configuration Signed-off-by: James Almer <jamrial@gmail.com>	2026-01-29 21:09:02 -03:00
Andreas Rheinhardt	ca5504fb5c	avcodec/liblc3dec: Simplify sample fmt selection Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 14:08:15 +01:00
Andreas Rheinhardt	ba1aea762b	avcodec/liblc3{dec,enc}: Simplify sample_size, is_planar check Sample size is always sizeof(float), is planar is a simple if given that these codecs only support float and planar float. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 14:08:15 +01:00
Andreas Rheinhardt	436b74b725	avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function hevc_dequant_4x4_8_c (GCC): 20.2 ( 1.00x) hevc_dequant_4x4_8_c (Clang): 21.7 ( 1.00x) hevc_dequant_4x4_8_ssse3: 5.8 ( 3.51x) hevc_dequant_8x8_8_c (GCC): 32.9 ( 1.00x) hevc_dequant_8x8_8_c (Clang): 78.7 ( 1.00x) hevc_dequant_8x8_8_ssse3: 6.8 ( 4.83x) hevc_dequant_16x16_8_c (GCC): 105.1 ( 1.00x) hevc_dequant_16x16_8_c (Clang): 151.1 ( 1.00x) hevc_dequant_16x16_8_ssse3: 19.3 ( 5.45x) hevc_dequant_32x32_8_c (GCC): 415.7 ( 1.00x) hevc_dequant_32x32_8_c (Clang): 602.3 ( 1.00x) hevc_dequant_32x32_8_ssse3: 78.2 ( 5.32x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	cf359a7907	avcodec/hevc/dsp: Add alignment for dequant Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	0c7f87b136	avcodec/hevc/dsp_template: Optimize impossible branches away Saves 1856B of .text here. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	2729c52988	avcodec/x86/hevc/deblock: Reduce usage of GPRs Don't use two GPRs to store two words from xmm registers; shuffle these words so that they are fit into one GPR. This reduces the amount of GPRs used and leads to tiny speedups here. Also avoid rex prefixes whenever possible (for lines that needed to be modified anyway). Old benchmarks: hevc_h_loop_filter_luma8_skip_c: 23.8 ( 1.00x) hevc_h_loop_filter_luma8_skip_sse2: 8.5 ( 2.80x) hevc_h_loop_filter_luma8_skip_ssse3: 7.2 ( 3.29x) hevc_h_loop_filter_luma8_skip_avx: 6.4 ( 3.71x) hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x) hevc_h_loop_filter_luma8_strong_sse2: 34.4 ( 4.37x) hevc_h_loop_filter_luma8_strong_ssse3: 34.5 ( 4.36x) hevc_h_loop_filter_luma8_strong_avx: 32.3 ( 4.65x) hevc_h_loop_filter_luma8_weak_c: 103.2 ( 1.00x) hevc_h_loop_filter_luma8_weak_sse2: 34.5 ( 2.99x) hevc_h_loop_filter_luma8_weak_ssse3: 7.3 (14.22x) hevc_h_loop_filter_luma8_weak_avx: 32.4 ( 3.18x) hevc_h_loop_filter_luma10_skip_c: 23.5 ( 1.00x) hevc_h_loop_filter_luma10_skip_sse2: 6.6 ( 3.58x) hevc_h_loop_filter_luma10_skip_ssse3: 6.1 ( 3.86x) hevc_h_loop_filter_luma10_skip_avx: 5.4 ( 4.34x) hevc_h_loop_filter_luma10_strong_c: 161.8 ( 1.00x) hevc_h_loop_filter_luma10_strong_sse2: 32.2 ( 5.03x) hevc_h_loop_filter_luma10_strong_ssse3: 30.4 ( 5.33x) hevc_h_loop_filter_luma10_strong_avx: 30.3 ( 5.33x) hevc_h_loop_filter_luma10_weak_c: 23.5 ( 1.00x) hevc_h_loop_filter_luma10_weak_sse2: 6.6 ( 3.58x) hevc_h_loop_filter_luma10_weak_ssse3: 6.1 ( 3.85x) hevc_h_loop_filter_luma10_weak_avx: 5.4 ( 4.35x) hevc_h_loop_filter_luma12_skip_c: 18.8 ( 1.00x) hevc_h_loop_filter_luma12_skip_sse2: 6.6 ( 2.87x) hevc_h_loop_filter_luma12_skip_ssse3: 6.1 ( 3.08x) hevc_h_loop_filter_luma12_skip_avx: 6.2 ( 3.06x) hevc_h_loop_filter_luma12_strong_c: 159.0 ( 1.00x) hevc_h_loop_filter_luma12_strong_sse2: 36.3 ( 4.38x) hevc_h_loop_filter_luma12_strong_ssse3: 36.1 ( 4.40x) hevc_h_loop_filter_luma12_strong_avx: 33.5 ( 4.75x) 
hevc_h_loop_filter_luma12_weak_c: 40.1 ( 1.00x) hevc_h_loop_filter_luma12_weak_sse2: 35.5 ( 1.13x) hevc_h_loop_filter_luma12_weak_ssse3: 36.1 ( 1.11x) hevc_h_loop_filter_luma12_weak_avx: 6.2 ( 6.52x) hevc_v_loop_filter_luma8_skip_c: 25.5 ( 1.00x) hevc_v_loop_filter_luma8_skip_sse2: 10.6 ( 2.40x) hevc_v_loop_filter_luma8_skip_ssse3: 11.4 ( 2.24x) hevc_v_loop_filter_luma8_skip_avx: 8.3 ( 3.07x) hevc_v_loop_filter_luma8_strong_c: 146.8 ( 1.00x) hevc_v_loop_filter_luma8_strong_sse2: 43.9 ( 3.35x) hevc_v_loop_filter_luma8_strong_ssse3: 43.7 ( 3.36x) hevc_v_loop_filter_luma8_strong_avx: 42.3 ( 3.47x) hevc_v_loop_filter_luma8_weak_c: 25.5 ( 1.00x) hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.40x) hevc_v_loop_filter_luma8_weak_ssse3: 44.0 ( 0.58x) hevc_v_loop_filter_luma8_weak_avx: 8.3 ( 3.09x) hevc_v_loop_filter_luma10_skip_c: 20.0 ( 1.00x) hevc_v_loop_filter_luma10_skip_sse2: 11.3 ( 1.77x) hevc_v_loop_filter_luma10_skip_ssse3: 11.0 ( 1.82x) hevc_v_loop_filter_luma10_skip_avx: 9.3 ( 2.15x) hevc_v_loop_filter_luma10_strong_c: 193.5 ( 1.00x) hevc_v_loop_filter_luma10_strong_sse2: 46.1 ( 4.19x) hevc_v_loop_filter_luma10_strong_ssse3: 44.2 ( 4.38x) hevc_v_loop_filter_luma10_strong_avx: 44.4 ( 4.35x) hevc_v_loop_filter_luma10_weak_c: 90.3 ( 1.00x) hevc_v_loop_filter_luma10_weak_sse2: 46.3 ( 1.95x) hevc_v_loop_filter_luma10_weak_ssse3: 10.8 ( 8.37x) hevc_v_loop_filter_luma10_weak_avx: 44.4 ( 2.03x) hevc_v_loop_filter_luma12_skip_c: 16.8 ( 1.00x) hevc_v_loop_filter_luma12_skip_sse2: 11.8 ( 1.42x) hevc_v_loop_filter_luma12_skip_ssse3: 11.7 ( 1.43x) hevc_v_loop_filter_luma12_skip_avx: 8.7 ( 1.93x) hevc_v_loop_filter_luma12_strong_c: 159.3 ( 1.00x) hevc_v_loop_filter_luma12_strong_sse2: 45.3 ( 3.52x) hevc_v_loop_filter_luma12_strong_ssse3: 60.3 ( 2.64x) hevc_v_loop_filter_luma12_strong_avx: 44.1 ( 3.61x) hevc_v_loop_filter_luma12_weak_c: 63.6 ( 1.00x) hevc_v_loop_filter_luma12_weak_sse2: 45.3 ( 1.40x) hevc_v_loop_filter_luma12_weak_ssse3: 11.7 ( 5.41x) 
hevc_v_loop_filter_luma12_weak_avx: 43.9 ( 1.45x) New benchmarks: hevc_h_loop_filter_luma8_skip_c: 24.2 ( 1.00x) hevc_h_loop_filter_luma8_skip_sse2: 8.6 ( 2.82x) hevc_h_loop_filter_luma8_skip_ssse3: 7.0 ( 3.46x) hevc_h_loop_filter_luma8_skip_avx: 6.8 ( 3.54x) hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x) hevc_h_loop_filter_luma8_strong_sse2: 33.3 ( 4.52x) hevc_h_loop_filter_luma8_strong_ssse3: 32.7 ( 4.61x) hevc_h_loop_filter_luma8_strong_avx: 32.7 ( 4.60x) hevc_h_loop_filter_luma8_weak_c: 104.0 ( 1.00x) hevc_h_loop_filter_luma8_weak_sse2: 33.2 ( 3.13x) hevc_h_loop_filter_luma8_weak_ssse3: 7.0 (14.91x) hevc_h_loop_filter_luma8_weak_avx: 31.3 ( 3.32x) hevc_h_loop_filter_luma10_skip_c: 19.2 ( 1.00x) hevc_h_loop_filter_luma10_skip_sse2: 6.2 ( 3.08x) hevc_h_loop_filter_luma10_skip_ssse3: 6.2 ( 3.08x) hevc_h_loop_filter_luma10_skip_avx: 5.0 ( 3.85x) hevc_h_loop_filter_luma10_strong_c: 159.8 ( 1.00x) hevc_h_loop_filter_luma10_strong_sse2: 30.0 ( 5.32x) hevc_h_loop_filter_luma10_strong_ssse3: 29.2 ( 5.48x) hevc_h_loop_filter_luma10_strong_avx: 28.6 ( 5.58x) hevc_h_loop_filter_luma10_weak_c: 19.2 ( 1.00x) hevc_h_loop_filter_luma10_weak_sse2: 6.2 ( 3.09x) hevc_h_loop_filter_luma10_weak_ssse3: 6.2 ( 3.09x) hevc_h_loop_filter_luma10_weak_avx: 5.0 ( 3.88x) hevc_h_loop_filter_luma12_skip_c: 18.7 ( 1.00x) hevc_h_loop_filter_luma12_skip_sse2: 6.2 ( 3.00x) hevc_h_loop_filter_luma12_skip_ssse3: 5.7 ( 3.27x) hevc_h_loop_filter_luma12_skip_avx: 5.2 ( 3.61x) hevc_h_loop_filter_luma12_strong_c: 160.2 ( 1.00x) hevc_h_loop_filter_luma12_strong_sse2: 34.2 ( 4.68x) hevc_h_loop_filter_luma12_strong_ssse3: 29.3 ( 5.48x) hevc_h_loop_filter_luma12_strong_avx: 31.4 ( 5.10x) hevc_h_loop_filter_luma12_weak_c: 40.2 ( 1.00x) hevc_h_loop_filter_luma12_weak_sse2: 35.2 ( 1.14x) hevc_h_loop_filter_luma12_weak_ssse3: 29.3 ( 1.37x) hevc_h_loop_filter_luma12_weak_avx: 5.0 ( 8.09x) hevc_v_loop_filter_luma8_skip_c: 25.6 ( 1.00x) hevc_v_loop_filter_luma8_skip_sse2: 10.2 ( 2.52x) 
hevc_v_loop_filter_luma8_skip_ssse3: 10.5 ( 2.45x) hevc_v_loop_filter_luma8_skip_avx: 8.2 ( 3.11x) hevc_v_loop_filter_luma8_strong_c: 147.1 ( 1.00x) hevc_v_loop_filter_luma8_strong_sse2: 42.6 ( 3.45x) hevc_v_loop_filter_luma8_strong_ssse3: 42.4 ( 3.47x) hevc_v_loop_filter_luma8_strong_avx: 40.1 ( 3.67x) hevc_v_loop_filter_luma8_weak_c: 25.6 ( 1.00x) hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.42x) hevc_v_loop_filter_luma8_weak_ssse3: 42.7 ( 0.60x) hevc_v_loop_filter_luma8_weak_avx: 8.2 ( 3.11x) hevc_v_loop_filter_luma10_skip_c: 16.7 ( 1.00x) hevc_v_loop_filter_luma10_skip_sse2: 11.0 ( 1.52x) hevc_v_loop_filter_luma10_skip_ssse3: 10.5 ( 1.59x) hevc_v_loop_filter_luma10_skip_avx: 9.6 ( 1.74x) hevc_v_loop_filter_luma10_strong_c: 190.0 ( 1.00x) hevc_v_loop_filter_luma10_strong_sse2: 44.8 ( 4.24x) hevc_v_loop_filter_luma10_strong_ssse3: 42.3 ( 4.49x) hevc_v_loop_filter_luma10_strong_avx: 42.5 ( 4.47x) hevc_v_loop_filter_luma10_weak_c: 88.3 ( 1.00x) hevc_v_loop_filter_luma10_weak_sse2: 45.7 ( 1.93x) hevc_v_loop_filter_luma10_weak_ssse3: 10.5 ( 8.40x) hevc_v_loop_filter_luma10_weak_avx: 42.4 ( 2.09x) hevc_v_loop_filter_luma12_skip_c: 16.7 ( 1.00x) hevc_v_loop_filter_luma12_skip_sse2: 11.7 ( 1.42x) hevc_v_loop_filter_luma12_skip_ssse3: 10.5 ( 1.59x) hevc_v_loop_filter_luma12_skip_avx: 8.8 ( 1.90x) hevc_v_loop_filter_luma12_strong_c: 159.4 ( 1.00x) hevc_v_loop_filter_luma12_strong_sse2: 45.2 ( 3.53x) hevc_v_loop_filter_luma12_strong_ssse3: 59.3 ( 2.69x) hevc_v_loop_filter_luma12_strong_avx: 41.7 ( 3.82x) hevc_v_loop_filter_luma12_weak_c: 63.3 ( 1.00x) hevc_v_loop_filter_luma12_weak_sse2: 44.9 ( 1.41x) hevc_v_loop_filter_luma12_weak_ssse3: 10.5 ( 6.02x) hevc_v_loop_filter_luma12_weak_avx: 41.7 ( 1.52x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	0843252229	avcodec/x86/hevc/deblock: avoid unused GPR r12 is unused, so use it instead of r13 to reduce the amount of push/pops. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	0aad8b860a	avcodec/x86/hevc/deblock: Avoid vmovdqa (It would even be possible to avoid a clobbering m10 in MASKED_COPY and the mask register (%3) in MASKED_COPY2 when VEX encoding is in use.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	c940128fff	avcodec/x86/vp9lpf: Avoid vmovdqa Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	c898ddb8fe	avcodec/x86/cfhddsp: Reduce number of xmm registers used Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:40 +01:00
Andreas Rheinhardt	848c3ca772	avcodec/x86/cfhddsp: Avoid pmaddwd The result of using pmaddwd with the coefficients 1,-1,...,1,-1 is just the negative of using pmaddwd with the coefficients -1,1,...,-1,1, so avoid one pmaddwd. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:37 +01:00
Andreas Rheinhardt	6224445753	avcodec/x86/cfhdencdsp: Avoid += x, -= x Avoid incrementing lowq and highq inside the loop by using complex addressing modes, avoiding to undo said modification at the end of the horizontal loop. For inputq, modify istrideq outside of the loop so that it is only modified once at the end of the horizontal loop. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:34 +01:00
Andreas Rheinhardt	7dd6487800	avcodec/x86/cfhdencdsp: Don't load twice Sign extend the integer arguments directly from the stack instead of loading qwords, followed by sign-extending the lower half. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:30 +01:00
Andreas Rheinhardt	91c7710412	avcodec/x86/cfhdencdsp: Avoid unnecessary constants Up until now, cfhdencdsp used constants consisting of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words for use as constants in pmaddwd. But one can use the same constants if one shuffles the words in a dword the opposite order. Similarly for some other constants. This also allowed to avoid a register in chfdenc_vert_filter. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:23 +01:00
Andreas Rheinhardt	cd3d8116fb	avcodec/x86/cfhdencdsp: Avoid load of -1 It can be easily generated at runtime. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:32:57 +01:00
Kasidis Arunruangsirilert	e9e8a32b29	avcodec/nvenc: add 4-way multi nvenc split frame encoding support	2026-01-27 12:58:46 +00:00
Diego de Souza	499b5f5f92	avcodec/nvenc: add b_adapt option for HEVC encoder The b_adapt option allows users to control adaptive B-frame decision when lookahead is enabled in HEVC encoding. This feature was already available for H.264 and AV1 encoders, but was missing from HEVC. Signed-off-by: Diego de Souza <ddesouza@nvidia.com>	2026-01-27 12:58:08 +00:00
Andreas Rheinhardt	bf4d5037b4	avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names These names are a remnant of dsputil when all the DSP functions from all codecs were part of DSPcontext. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Sean McGovern <gseanmcg@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:25 +01:00
Andreas Rheinhardt	489aaf4e1c	avcodec/x86/h264_deblock: Don't sign-extend stride Unnecessary (and wrong) since `d5d699ab6e`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	db66e057eb	avcodec/x86/h264_deblock: Avoid reload Old benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x) h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x) New benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.4 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 62.0 ( 0.97x) h264_h_loop_filter_luma_8bpp_avx: 61.7 ( 0.98x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	8428a412bc	avcodec/x86/h264_deblock: Avoid MMX in deblock_h_luma_8 Old benchmarks: h264_h_loop_filter_luma_8bpp_c: 59.9 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 67.9 ( 0.88x) h264_h_loop_filter_luma_8bpp_avx: 67.4 ( 0.89x) New benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x) h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	9882973935	avcodec/x86/h264_deblock: Avoid reloading constant No change in benchmarks. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	eaaf45fd79	avcodec/x86/h264_deblock_10bit: Simplify r0+4*r1 Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	aab0946eae	avcodec/x86/h264_deblock_10bit: Remove mmxext functions Now that the SSE2/AVX functions are no longer restricted to those systems having an aligned stack, the MMXEXT functions are always overridden (except for ancient systems without SSE2), so remove them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	dbdf514c17	avcodec/x86/h264_deblock_10bit: Remove custom stack allocation code Allocate it via cglobal as usual. This makes the SSE2/AVX functions available when HAVE_ALIGNED_STACK is false; it also avoids modifying rsp unnecessarily in the deblock_h_luma_intra_10 functions on Win64. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	b1140d3c98	avcodec/x86/h264_deblock: Remove obsolete macro parameters They are a remnant of the MMX functions (which processed only eight pixels at a time, so that it was called twice via a wrapper; the actual MMX function had "v8" in its name instead of simply v) which have been removed in commit `4618f36a24`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	899475326b	avcodec/x86/h264_deblock: Simplify splatting Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	a22149ab3d	avcodec/x86/h264_deblock: Remove always-false branches These functions are always called with alpha and beta > 0. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	982244818b	avcodec/x86/h264_deblock: Remove unused macros Forgotten in `4618f36a24`. Also remove a PASS8ROWS wrapper that seems to have been always unused. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	6e65d1c945	avcodec/motion_est: Fix left shifts of negative numbers Fixes ticket #21486. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:46:39 +01:00
Jun Zhao	8966101fa6	lavc/hevc: add aarch64 neon for 12-bit dequant Implement NEON optimization for HEVC dequant at 12-bit depth. For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift is negative, we use shl (shift left) instead of srshr. Performance benchmark on Apple M4: ./tests/checkasm/checkasm --test=hevc_dequant --bench hevc_dequant_4x4_12_c: 9.9 ( 1.00x) hevc_dequant_4x4_12_neon: 5.7 ( 1.74x) hevc_dequant_8x8_12_c: 1.7 ( 1.00x) hevc_dequant_8x8_12_neon: 1.3 ( 1.30x) hevc_dequant_16x16_12_c: 131.1 ( 1.00x) hevc_dequant_16x16_12_neon: 7.9 (16.52x) hevc_dequant_32x32_12_c: 69.7 ( 1.00x) hevc_dequant_32x32_12_neon: 28.4 ( 2.46x) Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-01-25 06:55:26 +00:00
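
The shift rule quoted in 8966101fa6 (shift = 15 - bit depth - log2_size, with a left shift once the value goes negative) can be sketched in plain C. This is only an illustration under stated assumptions: `dequant_sketch` and its parameters are hypothetical names, not FFmpeg's API, and the scalar loop stands in for the NEON `srshr`/`shl` paths the commit mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg's API): scale one size x size block of
 * coefficients in place, where size = 1 << log2_size. */
static void dequant_sketch(int16_t *coeffs, int log2_size, int bit_depth)
{
    int shift = 15 - bit_depth - log2_size;   /* 12-bit: 3 - log2_size */
    int count = 1 << (2 * log2_size);         /* number of coefficients */

    assert(log2_size >= 2 && log2_size <= 5); /* 4x4 up to 32x32 */

    if (shift > 0) {
        /* Rounding right shift, the scalar counterpart of srshr.
         * Assumes >> on a negative int is arithmetic, as on common compilers. */
        int offset = 1 << (shift - 1);
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((coeffs[i] + offset) >> shift);
    } else {
        /* Zero or negative shift: shift left by -shift instead (shl).
         * Going through uint16_t avoids left-shifting a negative value. */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((uint16_t)coeffs[i] << -shift);
    }
}
```

With a bit depth of 12, a 4x4 block (log2_size 2) still takes the rounding right shift by 1, an 8x8 block is left unchanged (shift 0), and 16x16 and 32x32 blocks are shifted left by 1 and 2 respectively.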


53457 commits
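
The identity behind 848c3ca772 (a `pmaddwd` against the coefficients 1,-1 is the negative of a `pmaddwd` against -1,1) can be checked with a scalar model of a single 32-bit output lane. The helper name `pmaddwd_lane` is a hypothetical stand-in for illustration, not code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one pmaddwd output lane: multiply two pairs
 * of signed 16-bit words and add the two 32-bit products. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t c0, int16_t c1)
{
    return (int32_t)a0 * c0 + (int32_t)a1 * c1;
}

/* Returns 1 if the (1,-1) result equals the negated (-1,1) result. */
static int negation_holds(int16_t a0, int16_t a1)
{
    int32_t plus_minus = pmaddwd_lane(a0, a1, 1, -1);  /* a0 - a1 */
    int32_t minus_plus = pmaddwd_lane(a0, a1, -1, 1);  /* a1 - a0 */
    return plus_minus == -minus_plus;
}
```

Because the difference of two 16-bit values always fits in 32 bits, the negation is exact for every input pair, which is why one of the two multiplies can be replaced by a sign flip of the other result.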