Michael Niedermayer
8f57b04fe5
avcodec/hevc/sei: Use get_bits64() in decode_nal_sei_3d_reference_displays_info()
...
Fixes: Assertion n>=0 && n<=32 failed at ./libavcodec/get_bits.h:426
Fixes: 468435217/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4644127078940672
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 20:20:08 +00:00
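For 8f57b04fe5 above: the assertion shows that the 32-bit read path was asked for more than 32 bits. A minimal sketch of how a 64-bit read can be composed from two reads of at most 32 bits each follows. `BitReader`, `read_bits()` and `read_bits64()` are hypothetical names for illustration only; this is not the code in libavcodec/get_bits.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader, for illustration only. */
typedef struct BitReader {
    const uint8_t *buf;  /* input bytes */
    size_t size_bits;    /* number of readable bits */
    size_t pos;          /* current position in bits */
} BitReader;

/* Read up to 32 bits; the assert mirrors the kind of range check that fired. */
static uint32_t read_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    assert(n >= 0 && n <= 32);
    for (int i = 0; i < n; i++) {
        uint32_t bit = 0;
        if (br->pos < br->size_bits)  /* reads past the end yield zero bits */
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Read up to 64 bits by splitting the request into two narrower reads. */
static uint64_t read_bits64(BitReader *br, int n)
{
    assert(n >= 0 && n <= 64);
    if (n <= 32)
        return read_bits(br, n);
    uint64_t hi = read_bits(br, n - 32);  /* upper n - 32 bits come first */
    uint64_t lo = read_bits(br, 32);
    return (hi << 32) | lo;
}
```

With this split, a syntax element whose length is signalled in the bitstream and may exceed 32 bits never reaches the narrower reader with an out-of-range count.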
Michael Niedermayer
af86f0ffcc
avcodec/dca_xll: Clear padding in ff_dca_xll_parse()
...
Fixes: Use of uninitialized memory
Fixes: 472020020/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6433045331902464
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 18:12:46 +01:00
Michael Niedermayer
189bc0aaf5
avcodec/dxv: Clear tex_data padding on reallocation
...
dxv assumes that newly reallocated memory in tex_data is initialized,
thus we have to clear it too when ff_lzf_uncompress() reallocates.
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:29:08 +01:00
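For 189bc0aaf5 above: a minimal sketch of clearing the newly obtained part of a buffer, including its trailing padding, whenever it grows. `toy_grow()` and `TOY_PADDING` are hypothetical; the real change lives in the dxv/lzf code and may look different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_PADDING = 64 };  /* hypothetical padding size, an assumption */

/* Grow *buf to hold new_size payload bytes plus padding.
 * Everything past the old payload is zeroed so that later readers
 * never see uninitialized memory. Returns 0 on success, -1 on failure. */
static int toy_grow(uint8_t **buf, size_t *size, size_t new_size)
{
    assert(buf && size);
    if (new_size <= *size)
        return 0;                       /* nothing to do */
    uint8_t *p = realloc(*buf, new_size + TOY_PADDING);
    if (!p)
        return -1;                      /* old buffer is still valid */
    memset(p + *size, 0, new_size - *size + TOY_PADDING);
    *buf  = p;
    *size = new_size;
    return 0;
}
```

Zeroing from the old payload end rather than only the padding is a conservative choice in this sketch; it costs one memset per reallocation.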
Michael Niedermayer
0f35146e27
avcodec/lzf: Remove size messing from ff_lzf_uncompress()
...
size represents the output size.
Randomly changing it but not resetting it on errors leaks uninitialized memory.
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:29:08 +01:00
Michael Niedermayer
5db50e8775
avcodec/ffv1enc: refine end condition
...
In the case where the last sorted value was -1u and we were on the first
pass of run1, we failed to fill the last few values of bitmap.
No real-world test case is known.
Fixes: use of uninitialized memory
Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:07:13 +01:00
Michael Niedermayer
11a5afea31
avcodec/dca_xll: Check get_rice_array()
...
Fixes: use of uninitialized memory
Fixes: 451655450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6527248623796224
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 14:37:59 +01:00
Jun Zhao
27dd2f1c70
lavc/hevc: fix missing # in ldrsw immediate offset
...
The ldrsw instruction requires immediate offset with # prefix.
This fixes the syntax error introduced in commit 26752368f0
(aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the
load_bi_w_pixels_param macro was added.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-02-05 09:13:22 +08:00
Zhao Zhili
e250854ecf
aarch64/h264pred: disable inefficient functions
...
These assembly optimizations have been identified as "performance
regressions." Due to advancements in modern CPU micro-architectures
and compiler optimization, the C implementations now consistently
outperform these handwritten routines.
Test Name                 A55-clang        M1               A76-gcc-14       A510-clang       A715-clang       X3-clang
------------------------------------------------------------------------------------------------------------------------------
pred8x8_dc_8_neon          55.9 ( 0.79x)!    0.2 ( 0.31x)!   35.7 ( 0.63x)!   98.3 ( 0.37x)!   35.9 ( 0.45x)!   33.6 ( 0.38x)!
pred8x8_dc_10_neon         57.0 ( 1.04x)     0.3 ( 0.36x)!   35.9 ( 0.94x)!   98.2 ( 0.53x)!   35.8 ( 0.58x)!   33.2 ( 0.50x)!
pred8x8_dc_128_8_neon      26.0 ( 0.69x)!    0.1 ( 0.43x)!   15.3 ( 0.73x)!   46.4 ( 0.36x)!   10.6 ( 0.48x)!   10.3 ( 1.09x)
pred8x8_dc_128_10_neon     25.3 ( 0.99x)!    0.1 ( 0.42x)!   19.3 ( 0.48x)!   44.5 ( 0.42x)!   10.0 ( 0.61x)!   11.0 ( 1.00x)
pred8x8_left_dc_8_neon     46.9 ( 0.72x)!    0.2 ( 0.26x)!   30.2 ( 0.49x)!   71.4 ( 0.39x)!   29.8 ( 0.35x)!   26.5 ( 0.44x)!
pred8x8_left_dc_10_neon    45.4 ( 0.82x)!    0.2 ( 0.29x)!   28.1 ( 0.67x)!   70.2 ( 0.47x)!   30.0 ( 0.38x)!   26.5 ( 0.43x)!
pred16x16_dc_8_neon        74.4 ( 1.34x)     0.3 ( 0.62x)!   44.7 ( 0.89x)!  128.0 ( 0.79x)!   48.5 ( 0.67x)!   39.4 ( 0.71x)!
pred16x16_dc_128_8_neon    37.9 ( 0.79x)!    0.1 ( 0.60x)!   20.1 ( 0.80x)!   41.8 ( 0.46x)!   16.2 ( 0.81x)!   12.8 ( 0.95x)!
pred16x16_left_dc_8_neon   69.9 ( 1.19x)     0.3 ( 0.46x)!   49.6 ( 0.54x)!  116.8 ( 0.62x)!   52.8 ( 0.45x)!   44.2 ( 0.51x)!
pred8x8_hori_8_neon        30.6 ( 1.39x)     0.1 ( 0.45x)!   19.4 ( 0.81x)!   71.0 ( 0.50x)!   15.9 ( 0.55x)!   12.2 ( 0.94x)!
pred8x8_hori_10_neon*      29.3 ( 1.82x)     0.1 ( 0.59x)!   18.5 ( 1.56x)    68.9 ( 0.64x)!   15.8 ( 0.62x)!   11.8 ( 0.97x)!
pred8x8_top_dc_8_neon      35.8 ( 0.96x)!    0.1 ( 0.59x)!   16.8 ( 0.81x)!   58.9 ( 0.44x)!   11.3 ( 0.89x)!   11.4 ( 0.99x)!
pred8x8_top_dc_10_neon     37.4 ( 1.24x)     0.1 ( 0.92x)!   20.4 ( 0.81x)!   59.5 ( 0.69x)!   10.5 ( 1.48x)    11.8 ( 1.02x)
pred8x8_vertical_8_neon    18.3 ( 1.08x)     0.1 ( 0.54x)!   12.8 ( 0.89x)!   37.2 ( 0.40x)!    8.3 ( 0.77x)!   11.2 ( 1.00x)
pred8x8_vertical_10_neon   19.0 ( 1.24x)     0.1 ( 0.55x)!   15.3 ( 0.62x)!   39.7 ( 0.50x)!    8.2 ( 0.91x)!   11.1 ( 0.99x)!
* pred8x8_horizontal_10 also underperforms on new architectures, but is useful on A55 and A76.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-02-04 09:06:37 +00:00
Zhao Zhili
f54841d375
avcodec/aarch64: add pngdsp
...
Test Name                    A55-gcc-11        M1-clang          A76-gcc-12        A510-clang        X3-clang
---------------------------------------------------------------------------------------------------------------------
add_bytes_l2_4096_neon        1807.2 ( 2.01x)      1.6 ( 1.94x)    333.0 ( 6.35x)   1058.2 ( 2.34x)    214.3 ( 1.99x)
add_paeth_prediction_3_neon  33036.1 ( 2.41x)    145.1 ( 1.66x)  20443.3 ( 1.97x)  35225.1 ( 1.23x)  19420.8 ( 1.05x)
add_paeth_prediction_4_neon  24368.6 ( 3.26x)    106.7 ( 2.01x)  15163.8 ( 2.77x)  26454.7 ( 1.62x)  14319.0 ( 1.35x)
add_paeth_prediction_6_neon  17900.6 ( 4.44x)     72.0 ( 2.70x)  10214.3 ( 4.20x)  18296.9 ( 2.27x)   9693.1 ( 1.97x)
add_paeth_prediction_8_neon  12615.4 ( 6.31x)     54.1 ( 2.58x)   7706.0 ( 5.45x)  13733.3 ( 2.94x)   7272.6 ( 2.63x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-02-04 12:05:35 +08:00
Oliver Chang
a795ca89fa
avcodec/qdm2: fix heap-use-after-free in qdm2_decode_frame
...
The `sub_packet` index in `QDM2Context` was not reset to 0 when
`qdm2_decode_frame` started processing a new packet. If an error
occurred during the decoding of a previous packet, `sub_packet` would
retain a non-zero value.
In subsequent calls to `qdm2_decode_frame` with a new packet, this
non-zero `sub_packet` value caused `qdm2_decode` to skip
`qdm2_decode_super_block`. This function is responsible for initializing
packet lists with pointers to the current packet's data. Skipping it led
to the use of stale pointers from the previous (freed) packet, resulting
in a heap-use-after-free vulnerability.
This patch explicitly resets `s->sub_packet = 0` at the beginning of
`qdm2_decode_frame`, ensuring correct initialization for each new
packet.
Fixes: OSS-Fuzz issue 476179569
(https://issues.oss-fuzz.com/issues/476179569 ).
2026-02-03 18:17:32 +00:00
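For a795ca89fa above: a toy model of the failure mode and of the reset. All names (`ToyContext`, `toy_decode_sub_packet()`, `toy_decode_frame()`, `fail_at`) are hypothetical and only mimic the control flow described in the commit message; this is not the qdm2 decoder.

```c
#include <assert.h>
#include <stddef.h>

enum { SUB_PACKETS = 16 };  /* sub-packets per packet in this toy model */

typedef struct ToyContext {
    int sub_packet;            /* index of the next sub-packet to decode */
    const unsigned char *cur;  /* pointer into the packet being decoded */
} ToyContext;

/* Decode one sub-packet. Pointers are (re)bound only for sub_packet 0,
 * which stands in for the super-block parsing step. */
static int toy_decode_sub_packet(ToyContext *s, const unsigned char *pkt,
                                 int fail_at)
{
    if (s->sub_packet == 0)
        s->cur = pkt;
    if (s->sub_packet == fail_at)
        return -1;             /* error: sub_packet stays non-zero */
    s->sub_packet = (s->sub_packet + 1) % SUB_PACKETS;
    return 0;
}

/* Decode a whole packet. Resetting sub_packet first guarantees that
 * s->cur is rebound to the new packet even after an earlier error. */
static int toy_decode_frame(ToyContext *s, const unsigned char *pkt,
                            int fail_at)
{
    assert(pkt != NULL);
    s->sub_packet = 0;
    for (int i = 0; i < SUB_PACKETS; i++)
        if (toy_decode_sub_packet(s, pkt, fail_at) < 0)
            return -1;
    return 0;
}
```

Without the reset in `toy_decode_frame()`, a call following a failed one would start at a non-zero index and keep using `s->cur` from the previous packet.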
Michael Niedermayer
2df0ef601a
avcodec/jpeg2000dec: allow bpno of -1
...
Fixes: tickets/4663/levels30.jp2
The file now decodes without error messages or integer overflows.
Before the broader M_b check, the file decoded with error messages and integer overflows, but also without visual artifacts.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
e1472a4e0c
avcodec/jpeg2000dec: allow M_b == 31
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
8a3c7c9c32
avcodec/jpeg2000dec: Print bpno level when erroring out
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
2efffa9ecd
avcodec/jpeg2000dec: Print M_b value when asking for a sample
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:31 +01:00
Frank Plowman
364d5dda91
lavc/vvc: Fix unchecked error codes from add_reconstructed_area
2026-01-31 13:46:13 +00:00
Frank Plowman
f9740eb969
lavc/vvc: Fix unchecked error codes from set_qp_y
...
Fixes: clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4957602162475008
2026-01-31 13:46:13 +00:00
Martin Storsjö
f74c551eaa
aarch64: Fix indentation of a few instructions
...
This file is exempt from the indent checker script, as there
are a few other bits in it that the script wants to reformat
into slightly worse form, or which might not warrant being
reformatted.
But these instructions should indeed be indented this way.
2026-01-30 05:21:27 +00:00
James Almer
041d108958
avcodec/opus/enc: don't remove more samples than needed from the last packet
...
The hardcoded extra 120 samples result in the side data reporting the need to
discard the entire packet rather than the padding samples.
This is in line with the behavior of the libopus encoder.
Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-29 21:09:02 -03:00
James Almer
c3aea7628c
avcodec/opus/enc: set avctx->frame_size to a better guess based on encoder configuration
...
Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-29 21:09:02 -03:00
Andreas Rheinhardt
ca5504fb5c
avcodec/liblc3dec: Simplify sample fmt selection
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
ba1aea762b
avcodec/liblc3{dec,enc}: Simplify sample_size, is_planar check
...
Sample size is always sizeof(float), and the is_planar check is a simple if,
given that these codecs only support float and planar float.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
436b74b725
avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function
...
hevc_dequant_4x4_8_c (GCC): 20.2 ( 1.00x)
hevc_dequant_4x4_8_c (Clang): 21.7 ( 1.00x)
hevc_dequant_4x4_8_ssse3: 5.8 ( 3.51x)
hevc_dequant_8x8_8_c (GCC): 32.9 ( 1.00x)
hevc_dequant_8x8_8_c (Clang): 78.7 ( 1.00x)
hevc_dequant_8x8_8_ssse3: 6.8 ( 4.83x)
hevc_dequant_16x16_8_c (GCC): 105.1 ( 1.00x)
hevc_dequant_16x16_8_c (Clang): 151.1 ( 1.00x)
hevc_dequant_16x16_8_ssse3: 19.3 ( 5.45x)
hevc_dequant_32x32_8_c (GCC): 415.7 ( 1.00x)
hevc_dequant_32x32_8_c (Clang): 602.3 ( 1.00x)
hevc_dequant_32x32_8_ssse3: 78.2 ( 5.32x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
cf359a7907
avcodec/hevc/dsp: Add alignment for dequant
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
0c7f87b136
avcodec/hevc/dsp_template: Optimize impossible branches away
...
Saves 1856B of .text here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
2729c52988
avcodec/x86/hevc/deblock: Reduce usage of GPRs
...
Don't use two GPRs to store two words from xmm registers;
shuffle these words so that they fit into one GPR.
This reduces the number of GPRs used and leads to tiny speedups
here. Also avoid rex prefixes whenever possible (for lines
that needed to be modified anyway).
Old benchmarks:
hevc_h_loop_filter_luma8_skip_c: 23.8 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2: 8.5 ( 2.80x)
hevc_h_loop_filter_luma8_skip_ssse3: 7.2 ( 3.29x)
hevc_h_loop_filter_luma8_skip_avx: 6.4 ( 3.71x)
hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2: 34.4 ( 4.37x)
hevc_h_loop_filter_luma8_strong_ssse3: 34.5 ( 4.36x)
hevc_h_loop_filter_luma8_strong_avx: 32.3 ( 4.65x)
hevc_h_loop_filter_luma8_weak_c: 103.2 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2: 34.5 ( 2.99x)
hevc_h_loop_filter_luma8_weak_ssse3: 7.3 (14.22x)
hevc_h_loop_filter_luma8_weak_avx: 32.4 ( 3.18x)
hevc_h_loop_filter_luma10_skip_c: 23.5 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2: 6.6 ( 3.58x)
hevc_h_loop_filter_luma10_skip_ssse3: 6.1 ( 3.86x)
hevc_h_loop_filter_luma10_skip_avx: 5.4 ( 4.34x)
hevc_h_loop_filter_luma10_strong_c: 161.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2: 32.2 ( 5.03x)
hevc_h_loop_filter_luma10_strong_ssse3: 30.4 ( 5.33x)
hevc_h_loop_filter_luma10_strong_avx: 30.3 ( 5.33x)
hevc_h_loop_filter_luma10_weak_c: 23.5 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2: 6.6 ( 3.58x)
hevc_h_loop_filter_luma10_weak_ssse3: 6.1 ( 3.85x)
hevc_h_loop_filter_luma10_weak_avx: 5.4 ( 4.35x)
hevc_h_loop_filter_luma12_skip_c: 18.8 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2: 6.6 ( 2.87x)
hevc_h_loop_filter_luma12_skip_ssse3: 6.1 ( 3.08x)
hevc_h_loop_filter_luma12_skip_avx: 6.2 ( 3.06x)
hevc_h_loop_filter_luma12_strong_c: 159.0 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2: 36.3 ( 4.38x)
hevc_h_loop_filter_luma12_strong_ssse3: 36.1 ( 4.40x)
hevc_h_loop_filter_luma12_strong_avx: 33.5 ( 4.75x)
hevc_h_loop_filter_luma12_weak_c: 40.1 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2: 35.5 ( 1.13x)
hevc_h_loop_filter_luma12_weak_ssse3: 36.1 ( 1.11x)
hevc_h_loop_filter_luma12_weak_avx: 6.2 ( 6.52x)
hevc_v_loop_filter_luma8_skip_c: 25.5 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2: 10.6 ( 2.40x)
hevc_v_loop_filter_luma8_skip_ssse3: 11.4 ( 2.24x)
hevc_v_loop_filter_luma8_skip_avx: 8.3 ( 3.07x)
hevc_v_loop_filter_luma8_strong_c: 146.8 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2: 43.9 ( 3.35x)
hevc_v_loop_filter_luma8_strong_ssse3: 43.7 ( 3.36x)
hevc_v_loop_filter_luma8_strong_avx: 42.3 ( 3.47x)
hevc_v_loop_filter_luma8_weak_c: 25.5 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.40x)
hevc_v_loop_filter_luma8_weak_ssse3: 44.0 ( 0.58x)
hevc_v_loop_filter_luma8_weak_avx: 8.3 ( 3.09x)
hevc_v_loop_filter_luma10_skip_c: 20.0 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2: 11.3 ( 1.77x)
hevc_v_loop_filter_luma10_skip_ssse3: 11.0 ( 1.82x)
hevc_v_loop_filter_luma10_skip_avx: 9.3 ( 2.15x)
hevc_v_loop_filter_luma10_strong_c: 193.5 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2: 46.1 ( 4.19x)
hevc_v_loop_filter_luma10_strong_ssse3: 44.2 ( 4.38x)
hevc_v_loop_filter_luma10_strong_avx: 44.4 ( 4.35x)
hevc_v_loop_filter_luma10_weak_c: 90.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2: 46.3 ( 1.95x)
hevc_v_loop_filter_luma10_weak_ssse3: 10.8 ( 8.37x)
hevc_v_loop_filter_luma10_weak_avx: 44.4 ( 2.03x)
hevc_v_loop_filter_luma12_skip_c: 16.8 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2: 11.8 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3: 11.7 ( 1.43x)
hevc_v_loop_filter_luma12_skip_avx: 8.7 ( 1.93x)
hevc_v_loop_filter_luma12_strong_c: 159.3 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2: 45.3 ( 3.52x)
hevc_v_loop_filter_luma12_strong_ssse3: 60.3 ( 2.64x)
hevc_v_loop_filter_luma12_strong_avx: 44.1 ( 3.61x)
hevc_v_loop_filter_luma12_weak_c: 63.6 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2: 45.3 ( 1.40x)
hevc_v_loop_filter_luma12_weak_ssse3: 11.7 ( 5.41x)
hevc_v_loop_filter_luma12_weak_avx: 43.9 ( 1.45x)
New benchmarks:
hevc_h_loop_filter_luma8_skip_c: 24.2 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2: 8.6 ( 2.82x)
hevc_h_loop_filter_luma8_skip_ssse3: 7.0 ( 3.46x)
hevc_h_loop_filter_luma8_skip_avx: 6.8 ( 3.54x)
hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2: 33.3 ( 4.52x)
hevc_h_loop_filter_luma8_strong_ssse3: 32.7 ( 4.61x)
hevc_h_loop_filter_luma8_strong_avx: 32.7 ( 4.60x)
hevc_h_loop_filter_luma8_weak_c: 104.0 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2: 33.2 ( 3.13x)
hevc_h_loop_filter_luma8_weak_ssse3: 7.0 (14.91x)
hevc_h_loop_filter_luma8_weak_avx: 31.3 ( 3.32x)
hevc_h_loop_filter_luma10_skip_c: 19.2 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2: 6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_ssse3: 6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_avx: 5.0 ( 3.85x)
hevc_h_loop_filter_luma10_strong_c: 159.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2: 30.0 ( 5.32x)
hevc_h_loop_filter_luma10_strong_ssse3: 29.2 ( 5.48x)
hevc_h_loop_filter_luma10_strong_avx: 28.6 ( 5.58x)
hevc_h_loop_filter_luma10_weak_c: 19.2 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2: 6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_ssse3: 6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_avx: 5.0 ( 3.88x)
hevc_h_loop_filter_luma12_skip_c: 18.7 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2: 6.2 ( 3.00x)
hevc_h_loop_filter_luma12_skip_ssse3: 5.7 ( 3.27x)
hevc_h_loop_filter_luma12_skip_avx: 5.2 ( 3.61x)
hevc_h_loop_filter_luma12_strong_c: 160.2 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2: 34.2 ( 4.68x)
hevc_h_loop_filter_luma12_strong_ssse3: 29.3 ( 5.48x)
hevc_h_loop_filter_luma12_strong_avx: 31.4 ( 5.10x)
hevc_h_loop_filter_luma12_weak_c: 40.2 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2: 35.2 ( 1.14x)
hevc_h_loop_filter_luma12_weak_ssse3: 29.3 ( 1.37x)
hevc_h_loop_filter_luma12_weak_avx: 5.0 ( 8.09x)
hevc_v_loop_filter_luma8_skip_c: 25.6 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2: 10.2 ( 2.52x)
hevc_v_loop_filter_luma8_skip_ssse3: 10.5 ( 2.45x)
hevc_v_loop_filter_luma8_skip_avx: 8.2 ( 3.11x)
hevc_v_loop_filter_luma8_strong_c: 147.1 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2: 42.6 ( 3.45x)
hevc_v_loop_filter_luma8_strong_ssse3: 42.4 ( 3.47x)
hevc_v_loop_filter_luma8_strong_avx: 40.1 ( 3.67x)
hevc_v_loop_filter_luma8_weak_c: 25.6 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.42x)
hevc_v_loop_filter_luma8_weak_ssse3: 42.7 ( 0.60x)
hevc_v_loop_filter_luma8_weak_avx: 8.2 ( 3.11x)
hevc_v_loop_filter_luma10_skip_c: 16.7 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2: 11.0 ( 1.52x)
hevc_v_loop_filter_luma10_skip_ssse3: 10.5 ( 1.59x)
hevc_v_loop_filter_luma10_skip_avx: 9.6 ( 1.74x)
hevc_v_loop_filter_luma10_strong_c: 190.0 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2: 44.8 ( 4.24x)
hevc_v_loop_filter_luma10_strong_ssse3: 42.3 ( 4.49x)
hevc_v_loop_filter_luma10_strong_avx: 42.5 ( 4.47x)
hevc_v_loop_filter_luma10_weak_c: 88.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2: 45.7 ( 1.93x)
hevc_v_loop_filter_luma10_weak_ssse3: 10.5 ( 8.40x)
hevc_v_loop_filter_luma10_weak_avx: 42.4 ( 2.09x)
hevc_v_loop_filter_luma12_skip_c: 16.7 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2: 11.7 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3: 10.5 ( 1.59x)
hevc_v_loop_filter_luma12_skip_avx: 8.8 ( 1.90x)
hevc_v_loop_filter_luma12_strong_c: 159.4 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2: 45.2 ( 3.53x)
hevc_v_loop_filter_luma12_strong_ssse3: 59.3 ( 2.69x)
hevc_v_loop_filter_luma12_strong_avx: 41.7 ( 3.82x)
hevc_v_loop_filter_luma12_weak_c: 63.3 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2: 44.9 ( 1.41x)
hevc_v_loop_filter_luma12_weak_ssse3: 10.5 ( 6.02x)
hevc_v_loop_filter_luma12_weak_avx: 41.7 ( 1.52x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0843252229
avcodec/x86/hevc/deblock: avoid unused GPR
...
r12 is unused, so use it instead of r13 to reduce
the number of push/pops.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0aad8b860a
avcodec/x86/hevc/deblock: Avoid vmovdqa
...
(It would even be possible to avoid clobbering m10 in
MASKED_COPY and the mask register (%3) in MASKED_COPY2
when VEX encoding is in use.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c940128fff
avcodec/x86/vp9lpf: Avoid vmovdqa
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c898ddb8fe
avcodec/x86/cfhddsp: Reduce number of xmm registers used
...
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:40 +01:00
Andreas Rheinhardt
848c3ca772
avcodec/x86/cfhddsp: Avoid pmaddwd
...
The result of using pmaddwd with the coefficients 1,-1,...,1,-1
is just the negative of using pmaddwd with the coefficients
-1,1,...,-1,1, so avoid one pmaddwd.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:37 +01:00
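For 848c3ca772 above: a scalar model of the identity the commit relies on. Each 32-bit lane of pmaddwd computes a0*b0 + a1*b1 on signed 16-bit words, so swapping the coefficient pair (1,-1) for (-1,1) only negates the result. The helper names below are hypothetical, and the sketch models one lane only.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of pmaddwd, restricted to +1/-1 coefficients so that
 * the sum cannot overflow a 32-bit integer. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t c0, int16_t c1)
{
    assert((c0 == 1 || c0 == -1) && (c1 == 1 || c1 == -1));
    return (int32_t)a0 * c0 + (int32_t)a1 * c1;
}

/* Coefficients 1,-1: computes a0 - a1. */
static int32_t lane_pos(int16_t a0, int16_t a1)
{
    return pmaddwd_lane(a0, a1, 1, -1);
}

/* Coefficients -1,1: computes a1 - a0, the negative of lane_pos(). */
static int32_t lane_neg(int16_t a0, int16_t a1)
{
    return pmaddwd_lane(a0, a1, -1, 1);
}
```

Because `lane_neg(a0, a1) == -lane_pos(a0, a1)` for all inputs, one of the two multiplies can be replaced by reusing the other result with the opposite sign.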
Andreas Rheinhardt
6224445753
avcodec/x86/cfhdencdsp: Avoid += x, -= x
...
Avoid incrementing lowq and highq inside the loop by using
complex addressing modes, which avoids having to undo said
modification at the end of the horizontal loop.
For inputq, modify istrideq outside of the loop so that
it is only modified once at the end of the horizontal loop.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:34 +01:00
Andreas Rheinhardt
7dd6487800
avcodec/x86/cfhdencdsp: Don't load twice
...
Sign extend the integer arguments directly from the stack
instead of loading qwords, followed by sign-extending the
lower half.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:30 +01:00
Andreas Rheinhardt
91c7710412
avcodec/x86/cfhdencdsp: Avoid unnecessary constants
...
Up until now, cfhdencdsp used constants consisting
of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words
for use as constants in pmaddwd. But one can use
the same constants if one shuffles the words in
a dword into the opposite order. Similarly for some other
constants. This also made it possible to avoid a register in
chfdenc_vert_filter.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:23 +01:00
Andreas Rheinhardt
cd3d8116fb
avcodec/x86/cfhdencdsp: Avoid load of -1
...
It can be easily generated at runtime.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:32:57 +01:00
Kasidis Arunruangsirilert
e9e8a32b29
avcodec/nvenc: add 4-way multi nvenc split frame encoding support
2026-01-27 12:58:46 +00:00
Diego de Souza
499b5f5f92
avcodec/nvenc: add b_adapt option for HEVC encoder
...
The b_adapt option allows users to control adaptive B-frame decision
when lookahead is enabled in HEVC encoding. This feature was already
available for H.264 and AV1 encoders, but was missing from HEVC.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
2026-01-27 12:58:08 +00:00
Andreas Rheinhardt
bf4d5037b4
avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names
...
These names are a remnant of dsputil when all the DSP functions
from all codecs were part of DSPcontext.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:25 +01:00
Andreas Rheinhardt
489aaf4e1c
avcodec/x86/h264_deblock: Don't sign-extend stride
...
Unnecessary (and wrong) since d5d699ab6e .
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
db66e057eb
avcodec/x86/h264_deblock: Avoid reload
...
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x)
New benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.4 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 62.0 ( 0.97x)
h264_h_loop_filter_luma_8bpp_avx: 61.7 ( 0.98x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
8428a412bc
avcodec/x86/h264_deblock: Avoid MMX in deblock_h_luma_8
...
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c: 59.9 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 67.9 ( 0.88x)
h264_h_loop_filter_luma_8bpp_avx: 67.4 ( 0.89x)
New benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
9882973935
avcodec/x86/h264_deblock: Avoid reloading constant
...
No change in benchmarks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
eaaf45fd79
avcodec/x86/h264_deblock_10bit: Simplify r0+4*r1
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
aab0946eae
avcodec/x86/h264_deblock_10bit: Remove mmxext functions
...
Now that the SSE2/AVX functions are no longer restricted
to those systems having an aligned stack, the MMXEXT functions
are always overridden (except for ancient systems without
SSE2), so remove them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
dbdf514c17
avcodec/x86/h264_deblock_10bit: Remove custom stack allocation code
...
Allocate it via cglobal as usual. This makes the SSE2/AVX functions
available when HAVE_ALIGNED_STACK is false; it also avoids
modifying rsp unnecessarily in the deblock_h_luma_intra_10 functions
on Win64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
b1140d3c98
avcodec/x86/h264_deblock: Remove obsolete macro parameters
...
They are a remnant of the MMX functions (which processed
only eight pixels at a time, so that it was called twice
via a wrapper; the actual MMX function had "v8" in its name
instead of simply v) which have been removed in commit
4618f36a24 .
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
899475326b
avcodec/x86/h264_deblock: Simplify splatting
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
a22149ab3d
avcodec/x86/h264_deblock: Remove always-false branches
...
These functions are always called with alpha and beta > 0.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
982244818b
avcodec/x86/h264_deblock: Remove unused macros
...
Forgotten in 4618f36a24 .
Also remove a PASS8ROWS wrapper that seems to have been always
unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
6e65d1c945
avcodec/motion_est: Fix left shifts of negative numbers
...
Fixes ticket #21486 .
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:46:39 +01:00
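For 6e65d1c945 above: left-shifting a negative signed integer is undefined behavior in C. One common way to avoid it is to multiply by a power of two instead. The sketch below shows that pattern; `scale_pow2()` is a hypothetical helper, and the commit itself may use a different construct.

```c
#include <assert.h>

/* Scale v by 2^s without left-shifting a possibly negative value.
 * Only the non-negative constant 1 is shifted; the sign is handled
 * by ordinary multiplication. The caller must keep the product in range. */
static int scale_pow2(int v, int s)
{
    assert(s >= 0 && s < 31);
    return v * (1 << s);
}
```

Compilers typically turn the multiplication back into a shift, so the defined form usually has no runtime cost.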
Jun Zhao
8966101fa6
lavc/hevc: add aarch64 neon for 12-bit dequant
...
Implement NEON optimization for HEVC dequant at 12-bit depth.
For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift
is negative, we use shl (shift left) instead of srshr.
Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_12_c: 9.9 ( 1.00x)
hevc_dequant_4x4_12_neon: 5.7 ( 1.74x)
hevc_dequant_8x8_12_c: 1.7 ( 1.00x)
hevc_dequant_8x8_12_neon: 1.3 ( 1.30x)
hevc_dequant_16x16_12_c: 131.1 ( 1.00x)
hevc_dequant_16x16_12_neon: 7.9 (16.52x)
hevc_dequant_32x32_12_c: 69.7 ( 1.00x)
hevc_dequant_32x32_12_neon: 28.4 ( 2.46x)
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00
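For 8966101fa6 above: a scalar sketch of the shift arithmetic from the commit message, shift = 15 - bit_depth - log2_size. A positive shift is a rounding right shift and a non-positive one scales up. The function names are hypothetical and this is not the NEON code; the right shift of negative values assumes an arithmetic shift, as common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* shift = 15 - bit_depth - log2_size; for 12-bit this is 3 - log2_size. */
static int dequant_shift(int bit_depth, int log2_size)
{
    assert(log2_size >= 2 && log2_size <= 5);  /* 4x4 up to 32x32 */
    return 15 - bit_depth - log2_size;
}

/* Dequantize one coefficient. */
static int16_t dequant_coeff(int16_t c, int bit_depth, int log2_size)
{
    int shift = dequant_shift(bit_depth, log2_size);
    int v = c;
    if (shift > 0)
        v = (v + (1 << (shift - 1))) >> shift;  /* rounding shift right */
    else
        v = v * (1 << -shift);                  /* shift left, or no-op for 0 */
    return (int16_t)v;
}
```

For 12-bit content this gives shift = 1 for 4x4, 0 for 8x8, -1 for 16x16 and -2 for 32x32, which is why the left-shift path is needed for the two largest block sizes.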