Commit graph

53457 commits

Author SHA1 Message Date
Michael Niedermayer
8f57b04fe5 avcodec/hevc/sei: Use get_bits64() in decode_nal_sei_3d_reference_displays_info()
Fixes: Assertion n>=0 && n<=32 failed at ./libavcodec/get_bits.h:426
Fixes: 468435217/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4644127078940672

Found-by:  continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 20:20:08 +00:00
Michael Niedermayer
af86f0ffcc
avcodec/dca_xll: Clear padding in ff_dca_xll_parse()
Fixes: Use of uninitialized memory
Fixes: 472020020/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6433045331902464

Found-by:  continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 18:12:46 +01:00
Michael Niedermayer
189bc0aaf5
avcodec/dxv: Clear tex_data padding on reallocation
dxv assumes that newly reallocated memory in tex_data is not uninitialized
thus we have to do that too in case of reallocation in ff_lzf_uncompress()

Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:29:08 +01:00
Michael Niedermayer
0f35146e27
avcodec/lzf: Remove size messing from ff_lzf_uncompress()
size represents the output size
randomly changing it but not reseting it on errors leaks uninitialized memory.

Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:29:08 +01:00
Michael Niedermayer
5db50e8775
avcodec/ffv1enc: refine end condition
In the case where the last sorted value was -1u and we where on the first
pass of run1 we failed to fill the last few values of bitmap

No real world testcase is known

Fixes: use of uninitialized memory
Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 16:07:13 +01:00
Michael Niedermayer
11a5afea31
avcodec/dca_xll: Check get_rice_array()
Fixes: use of uninitialized memory
Fixes: 451655450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6527248623796224

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-05 14:37:59 +01:00
Jun Zhao
27dd2f1c70 lavc/hevc: fix missing # in ldrsw immediate offset
The ldrsw instruction requires immediate offset with # prefix.
This fixes the syntax error introduced in commit 26752368f0
(aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the
load_bi_w_pixels_param macro was added.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-02-05 09:13:22 +08:00
Zhao Zhili
e250854ecf aarch64/h264pred: disable inefficient functions
These assembly optimizations have been identified as "performance
regressions." Due to advancements in modern CPU micro-architectures
and compiler optimization the C implementations now consistently
outperform these handwritten routines.

Test Name          	 A55-clang       M1             A76-gcc-14      A510-clang      A715-clang      X3-clang
--------------------------------------------------------------------------------------------------------------------
pred8x8_dc_8_neon        55.9 ( 0.79x)!  0.2 ( 0.31x)!  35.7 ( 0.63x)!  98.3 ( 0.37x)!  35.9 ( 0.45x)!  33.6 ( 0.38x)!
pred8x8_dc_10_neon       57.0 ( 1.04x)   0.3 ( 0.36x)!  35.9 ( 0.94x)!  98.2 ( 0.53x)!  35.8 ( 0.58x)!  33.2 ( 0.50x)!
pred8x8_dc_128_8_neon    26.0 ( 0.69x)!  0.1 ( 0.43x)!  15.3 ( 0.73x)!  46.4 ( 0.36x)!  10.6 ( 0.48x)!  10.3 ( 1.09x)
pred8x8_dc_128_10_neon   25.3 ( 0.99x)!  0.1 ( 0.42x)!  19.3 ( 0.48x)!  44.5 ( 0.42x)!  10.0 ( 0.61x)!  11.0 ( 1.00x)
pred8x8_left_dc_8_neon   46.9 ( 0.72x)!  0.2 ( 0.26x)!  30.2 ( 0.49x)!  71.4 ( 0.39x)!  29.8 ( 0.35x)!  26.5 ( 0.44x)!
pred8x8_left_dc_10_neon  45.4 ( 0.82x)!  0.2 ( 0.29x)!  28.1 ( 0.67x)!  70.2 ( 0.47x)!  30.0 ( 0.38x)!  26.5 ( 0.43x)!
pred16x16_dc_8_neon      74.4 ( 1.34x)   0.3 ( 0.62x)!  44.7 ( 0.89x)!  128.0 ( 0.79x)! 48.5 ( 0.67x)!  39.4 ( 0.71x)!
pred16x16_dc_128_8_neon  37.9 ( 0.79x)!  0.1 ( 0.60x)!  20.1 ( 0.80x)!  41.8 ( 0.46x)!  16.2 ( 0.81x)!  12.8 ( 0.95x)!
pred16x16_left_dc_8_neon 69.9 ( 1.19x)   0.3 ( 0.46x)!  49.6 ( 0.54x)!  116.8 ( 0.62x)! 52.8 ( 0.45x)!  44.2 ( 0.51x)!
pred8x8_hori_8_neon      30.6 ( 1.39x)   0.1 ( 0.45x)!  19.4 ( 0.81x)!  71.0 ( 0.50x)!  15.9 ( 0.55x)!  12.2 ( 0.94x)!
pred8x8_hori_10_neon*    29.3 ( 1.82x)   0.1 ( 0.59x)!  18.5 ( 1.56x)   68.9 ( 0.64x)!  15.8 ( 0.62x)!  11.8 ( 0.97x)!
pred8x8_top_dc_8_neon    35.8 ( 0.96x)!  0.1 ( 0.59x)!  16.8 ( 0.81x)!  58.9 ( 0.44x)!  11.3 ( 0.89x)!  11.4 ( 0.99x)!
pred8x8_top_dc_10_neon   37.4 ( 1.24x)   0.1 ( 0.92x)!  20.4 ( 0.81x)!  59.5 ( 0.69x)!  10.5 ( 1.48x)   11.8 ( 1.02x)
pred8x8_vertical_8_neon  18.3 ( 1.08x)   0.1 ( 0.54x)!  12.8 ( 0.89x)!  37.2 ( 0.40x)!   8.3 ( 0.77x)!  11.2 ( 1.00x)
pred8x8_vertical_10_neon 19.0 ( 1.24x)   0.1 ( 0.55x)!  15.3 ( 0.62x)!  39.7 ( 0.50x)!   8.2 ( 0.91x)!  11.1 ( 0.99x)!

- pred8x8_horizontal_10 also underperforms on new architectures, but useful on A55 and A76.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-02-04 09:06:37 +00:00
Zhao Zhili
f54841d375 avcodec/aarch64: add pngdsp
Test Name                    A55-gcc-11        M1-clang           A76-gcc-12          A510-clang        X3-clang
-------------------------------------------------------------------------------------------------------------------
add_bytes_l2_4096_neon        1807.2 ( 2.01x)    1.6 ( 1.94x)    333.0 ( 6.35x)   1058.2 ( 2.34x)    214.3 ( 1.99x)
add_paeth_prediction_3_neon  33036.1 ( 2.41x)  145.1 ( 1.66x)  20443.3 ( 1.97x)  35225.1 ( 1.23x)  19420.8 ( 1.05x)
add_paeth_prediction_4_neon  24368.6 ( 3.26x)  106.7 ( 2.01x)  15163.8 ( 2.77x)  26454.7 ( 1.62x)  14319.0 ( 1.35x)
add_paeth_prediction_6_neon  17900.6 ( 4.44x)   72.0 ( 2.70x)  10214.3 ( 4.20x)  18296.9 ( 2.27x)   9693.1 ( 1.97x)
add_paeth_prediction_8_neon  12615.4 ( 6.31x)   54.1 ( 2.58x)   7706.0 ( 5.45x)  13733.3 ( 2.94x)   7272.6 ( 2.63x)

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-02-04 12:05:35 +08:00
Oliver Chang
a795ca89fa avcodec/qdm2: fix heap-use-after-free in qdm2_decode_frame
The `sub_packet` index in `QDM2Context` was not reset to 0 when
`qdm2_decode_frame` started processing a new packet. If an error
occurred during the decoding of a previous packet, `sub_packet` would
retain a non-zero value.

In subsequent calls to `qdm2_decode_frame` with a new packet, this
non-zero `sub_packet` value caused `qdm2_decode` to skip
`qdm2_decode_super_block`. This function is responsible for initializing
packet lists with pointers to the current packet's data. Skipping it led
to the use of stale pointers from the previous (freed) packet, resulting
in a heap-use-after-free vulnerability.

This patch explicitly resets `s->sub_packet = 0` at the beginning of
`qdm2_decode_frame`, ensuring correct initialization for each new
packet.

Fixes: OSS-Fuzz issue 476179569
(https://issues.oss-fuzz.com/issues/476179569).
2026-02-03 18:17:32 +00:00
Michael Niedermayer
2df0ef601a
avcodec/jpeg2000dec: allow bpno of -1
Fixes: tickets/4663/levels30.jp2

The file decodes without error messages and no integer overflows
The file before the broader M_b check did decode with error messages and integer overflows but also no visual artifacts

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
e1472a4e0c
avcodec/jpeg2000dec: allow M_b == 31
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
8a3c7c9c32
avcodec/jpeg2000dec: Print bpno level when erroring out
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:32 +01:00
Michael Niedermayer
2efffa9ecd
avcodec/jpeg2000dec: Print M_b value when asking for a sample
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-03 12:39:31 +01:00
Frank Plowman
364d5dda91 lavc/vvc: Fix unchecked error codes from add_reconstructed_area 2026-01-31 13:46:13 +00:00
Frank Plowman
f9740eb969 lavc/vvc: Fix unchecked error codes from set_qp_y
Fixes: clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4957602162475008
2026-01-31 13:46:13 +00:00
Martin Storsjö
f74c551eaa aarch64: Fix indentation of a few instructions
This file is excempt from the indent checker script, as there
are a few other bits in it that the script wants to reformat
into slightly worse form, or which might not warrant being
reformatted.

But these instructions should indeed be indented this way.
2026-01-30 05:21:27 +00:00
James Almer
041d108958 avcodec/opus/enc: don't remove more samples than needed from the last packet
The hardcoded extra 120 samples results in the side data reporting the need to
discard the entire packet rather than the padding samples.
This is in line with the behavior of the libopus encoder.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-29 21:09:02 -03:00
James Almer
c3aea7628c avcodec/opus/enc: set avctx->frame_size to a better guess based on encoder configuration
Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-29 21:09:02 -03:00
Andreas Rheinhardt
ca5504fb5c avcodec/liblc3dec: Simplify sample fmt selection
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
ba1aea762b avcodec/liblc3{dec,enc}: Simplify sample_size, is_planar check
Sample size is always sizeof(float), is planar is a simple if
given that these codecs only support float and planar float.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
436b74b725 avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function
hevc_dequant_4x4_8_c (GCC):                             20.2 ( 1.00x)
hevc_dequant_4x4_8_c (Clang):                           21.7 ( 1.00x)
hevc_dequant_4x4_8_ssse3:                                5.8 ( 3.51x)
hevc_dequant_8x8_8_c (GCC):                             32.9 ( 1.00x)
hevc_dequant_8x8_8_c (Clang):                           78.7 ( 1.00x)
hevc_dequant_8x8_8_ssse3:                                6.8 ( 4.83x)
hevc_dequant_16x16_8_c (GCC):                          105.1 ( 1.00x)
hevc_dequant_16x16_8_c (Clang):                        151.1 ( 1.00x)
hevc_dequant_16x16_8_ssse3:                             19.3 ( 5.45x)
hevc_dequant_32x32_8_c (GCC):                          415.7 ( 1.00x)
hevc_dequant_32x32_8_c (Clang):                        602.3 ( 1.00x)
hevc_dequant_32x32_8_ssse3:                             78.2 ( 5.32x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
cf359a7907 avcodec/hevc/dsp: Add alignment for dequant
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
0c7f87b136 avcodec/hevc/dsp_template: Optimize impossible branches away
Saves 1856B of .text here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
2729c52988 avcodec/x86/hevc/deblock: Reduce usage of GPRs
Don't use two GPRs to store two words from xmm registers;
shuffle these words so that they are fit into one GPR.
This reduces the amount of GPRs used and leads to tiny speedups
here. Also avoid rex prefixes whenever possible (for lines
that needed to be modified anyway).

Old benchmarks:
hevc_h_loop_filter_luma8_skip_c:                        23.8 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2:                      8.5 ( 2.80x)
hevc_h_loop_filter_luma8_skip_ssse3:                     7.2 ( 3.29x)
hevc_h_loop_filter_luma8_skip_avx:                       6.4 ( 3.71x)
hevc_h_loop_filter_luma8_strong_c:                     150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2:                   34.4 ( 4.37x)
hevc_h_loop_filter_luma8_strong_ssse3:                  34.5 ( 4.36x)
hevc_h_loop_filter_luma8_strong_avx:                    32.3 ( 4.65x)
hevc_h_loop_filter_luma8_weak_c:                       103.2 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2:                     34.5 ( 2.99x)
hevc_h_loop_filter_luma8_weak_ssse3:                     7.3 (14.22x)
hevc_h_loop_filter_luma8_weak_avx:                      32.4 ( 3.18x)
hevc_h_loop_filter_luma10_skip_c:                       23.5 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2:                     6.6 ( 3.58x)
hevc_h_loop_filter_luma10_skip_ssse3:                    6.1 ( 3.86x)
hevc_h_loop_filter_luma10_skip_avx:                      5.4 ( 4.34x)
hevc_h_loop_filter_luma10_strong_c:                    161.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2:                  32.2 ( 5.03x)
hevc_h_loop_filter_luma10_strong_ssse3:                 30.4 ( 5.33x)
hevc_h_loop_filter_luma10_strong_avx:                   30.3 ( 5.33x)
hevc_h_loop_filter_luma10_weak_c:                       23.5 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2:                     6.6 ( 3.58x)
hevc_h_loop_filter_luma10_weak_ssse3:                    6.1 ( 3.85x)
hevc_h_loop_filter_luma10_weak_avx:                      5.4 ( 4.35x)
hevc_h_loop_filter_luma12_skip_c:                       18.8 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2:                     6.6 ( 2.87x)
hevc_h_loop_filter_luma12_skip_ssse3:                    6.1 ( 3.08x)
hevc_h_loop_filter_luma12_skip_avx:                      6.2 ( 3.06x)
hevc_h_loop_filter_luma12_strong_c:                    159.0 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2:                  36.3 ( 4.38x)
hevc_h_loop_filter_luma12_strong_ssse3:                 36.1 ( 4.40x)
hevc_h_loop_filter_luma12_strong_avx:                   33.5 ( 4.75x)
hevc_h_loop_filter_luma12_weak_c:                       40.1 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2:                    35.5 ( 1.13x)
hevc_h_loop_filter_luma12_weak_ssse3:                   36.1 ( 1.11x)
hevc_h_loop_filter_luma12_weak_avx:                      6.2 ( 6.52x)
hevc_v_loop_filter_luma8_skip_c:                        25.5 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2:                     10.6 ( 2.40x)
hevc_v_loop_filter_luma8_skip_ssse3:                    11.4 ( 2.24x)
hevc_v_loop_filter_luma8_skip_avx:                       8.3 ( 3.07x)
hevc_v_loop_filter_luma8_strong_c:                     146.8 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2:                   43.9 ( 3.35x)
hevc_v_loop_filter_luma8_strong_ssse3:                  43.7 ( 3.36x)
hevc_v_loop_filter_luma8_strong_avx:                    42.3 ( 3.47x)
hevc_v_loop_filter_luma8_weak_c:                        25.5 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2:                     10.6 ( 2.40x)
hevc_v_loop_filter_luma8_weak_ssse3:                    44.0 ( 0.58x)
hevc_v_loop_filter_luma8_weak_avx:                       8.3 ( 3.09x)
hevc_v_loop_filter_luma10_skip_c:                       20.0 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2:                    11.3 ( 1.77x)
hevc_v_loop_filter_luma10_skip_ssse3:                   11.0 ( 1.82x)
hevc_v_loop_filter_luma10_skip_avx:                      9.3 ( 2.15x)
hevc_v_loop_filter_luma10_strong_c:                    193.5 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2:                  46.1 ( 4.19x)
hevc_v_loop_filter_luma10_strong_ssse3:                 44.2 ( 4.38x)
hevc_v_loop_filter_luma10_strong_avx:                   44.4 ( 4.35x)
hevc_v_loop_filter_luma10_weak_c:                       90.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2:                    46.3 ( 1.95x)
hevc_v_loop_filter_luma10_weak_ssse3:                   10.8 ( 8.37x)
hevc_v_loop_filter_luma10_weak_avx:                     44.4 ( 2.03x)
hevc_v_loop_filter_luma12_skip_c:                       16.8 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2:                    11.8 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3:                   11.7 ( 1.43x)
hevc_v_loop_filter_luma12_skip_avx:                      8.7 ( 1.93x)
hevc_v_loop_filter_luma12_strong_c:                    159.3 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2:                  45.3 ( 3.52x)
hevc_v_loop_filter_luma12_strong_ssse3:                 60.3 ( 2.64x)
hevc_v_loop_filter_luma12_strong_avx:                   44.1 ( 3.61x)
hevc_v_loop_filter_luma12_weak_c:                       63.6 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2:                    45.3 ( 1.40x)
hevc_v_loop_filter_luma12_weak_ssse3:                   11.7 ( 5.41x)
hevc_v_loop_filter_luma12_weak_avx:                     43.9 ( 1.45x)

New benchmarks:
hevc_h_loop_filter_luma8_skip_c:                        24.2 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2:                      8.6 ( 2.82x)
hevc_h_loop_filter_luma8_skip_ssse3:                     7.0 ( 3.46x)
hevc_h_loop_filter_luma8_skip_avx:                       6.8 ( 3.54x)
hevc_h_loop_filter_luma8_strong_c:                     150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2:                   33.3 ( 4.52x)
hevc_h_loop_filter_luma8_strong_ssse3:                  32.7 ( 4.61x)
hevc_h_loop_filter_luma8_strong_avx:                    32.7 ( 4.60x)
hevc_h_loop_filter_luma8_weak_c:                       104.0 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2:                     33.2 ( 3.13x)
hevc_h_loop_filter_luma8_weak_ssse3:                     7.0 (14.91x)
hevc_h_loop_filter_luma8_weak_avx:                      31.3 ( 3.32x)
hevc_h_loop_filter_luma10_skip_c:                       19.2 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2:                     6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_ssse3:                    6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_avx:                      5.0 ( 3.85x)
hevc_h_loop_filter_luma10_strong_c:                    159.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2:                  30.0 ( 5.32x)
hevc_h_loop_filter_luma10_strong_ssse3:                 29.2 ( 5.48x)
hevc_h_loop_filter_luma10_strong_avx:                   28.6 ( 5.58x)
hevc_h_loop_filter_luma10_weak_c:                       19.2 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2:                     6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_ssse3:                    6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_avx:                      5.0 ( 3.88x)
hevc_h_loop_filter_luma12_skip_c:                       18.7 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2:                     6.2 ( 3.00x)
hevc_h_loop_filter_luma12_skip_ssse3:                    5.7 ( 3.27x)
hevc_h_loop_filter_luma12_skip_avx:                      5.2 ( 3.61x)
hevc_h_loop_filter_luma12_strong_c:                    160.2 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2:                  34.2 ( 4.68x)
hevc_h_loop_filter_luma12_strong_ssse3:                 29.3 ( 5.48x)
hevc_h_loop_filter_luma12_strong_avx:                   31.4 ( 5.10x)
hevc_h_loop_filter_luma12_weak_c:                       40.2 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2:                    35.2 ( 1.14x)
hevc_h_loop_filter_luma12_weak_ssse3:                   29.3 ( 1.37x)
hevc_h_loop_filter_luma12_weak_avx:                      5.0 ( 8.09x)
hevc_v_loop_filter_luma8_skip_c:                        25.6 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2:                     10.2 ( 2.52x)
hevc_v_loop_filter_luma8_skip_ssse3:                    10.5 ( 2.45x)
hevc_v_loop_filter_luma8_skip_avx:                       8.2 ( 3.11x)
hevc_v_loop_filter_luma8_strong_c:                     147.1 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2:                   42.6 ( 3.45x)
hevc_v_loop_filter_luma8_strong_ssse3:                  42.4 ( 3.47x)
hevc_v_loop_filter_luma8_strong_avx:                    40.1 ( 3.67x)
hevc_v_loop_filter_luma8_weak_c:                        25.6 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2:                     10.6 ( 2.42x)
hevc_v_loop_filter_luma8_weak_ssse3:                    42.7 ( 0.60x)
hevc_v_loop_filter_luma8_weak_avx:                       8.2 ( 3.11x)
hevc_v_loop_filter_luma10_skip_c:                       16.7 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2:                    11.0 ( 1.52x)
hevc_v_loop_filter_luma10_skip_ssse3:                   10.5 ( 1.59x)
hevc_v_loop_filter_luma10_skip_avx:                      9.6 ( 1.74x)
hevc_v_loop_filter_luma10_strong_c:                    190.0 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2:                  44.8 ( 4.24x)
hevc_v_loop_filter_luma10_strong_ssse3:                 42.3 ( 4.49x)
hevc_v_loop_filter_luma10_strong_avx:                   42.5 ( 4.47x)
hevc_v_loop_filter_luma10_weak_c:                       88.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2:                    45.7 ( 1.93x)
hevc_v_loop_filter_luma10_weak_ssse3:                   10.5 ( 8.40x)
hevc_v_loop_filter_luma10_weak_avx:                     42.4 ( 2.09x)
hevc_v_loop_filter_luma12_skip_c:                       16.7 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2:                    11.7 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3:                   10.5 ( 1.59x)
hevc_v_loop_filter_luma12_skip_avx:                      8.8 ( 1.90x)
hevc_v_loop_filter_luma12_strong_c:                    159.4 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2:                  45.2 ( 3.53x)
hevc_v_loop_filter_luma12_strong_ssse3:                 59.3 ( 2.69x)
hevc_v_loop_filter_luma12_strong_avx:                   41.7 ( 3.82x)
hevc_v_loop_filter_luma12_weak_c:                       63.3 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2:                    44.9 ( 1.41x)
hevc_v_loop_filter_luma12_weak_ssse3:                   10.5 ( 6.02x)
hevc_v_loop_filter_luma12_weak_avx:                     41.7 ( 1.52x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0843252229 avcodec/x86/hevc/deblock: avoid unused GPR
r12 is unused, so use it instead of r13 to reduce
the amount of push/pops.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0aad8b860a avcodec/x86/hevc/deblock: Avoid vmovdqa
(It would even be possible to avoid a clobbering m10 in
MASKED_COPY and the mask register (%3) in MASKED_COPY2
when VEX encoding is in use.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c940128fff avcodec/x86/vp9lpf: Avoid vmovdqa
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c898ddb8fe avcodec/x86/cfhddsp: Reduce number of xmm registers used
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:40 +01:00
Andreas Rheinhardt
848c3ca772 avcodec/x86/cfhddsp: Avoid pmaddwd
The result of using pmaddwd with the coefficients 1,-1,...,1,-1
is just the negative of using pmaddwd with the coefficients
-1,1,...,-1,1, so avoid one pmaddwd.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:37 +01:00
Andreas Rheinhardt
6224445753 avcodec/x86/cfhdencdsp: Avoid += x, -= x
Avoid incrementing lowq and highq inside the loop by using
complex addressing modes, avoiding to undo said modification
at the end of the horizontal loop.
For inputq, modify istrideq outside of the loop so that
it is only modified once at the end of the horizontal loop.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:34 +01:00
Andreas Rheinhardt
7dd6487800 avcodec/x86/cfhdencdsp: Don't load twice
Sign extend the integer arguments directly from the stack
instead of loading qwords, followed by sign-extending the
lower half.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:30 +01:00
Andreas Rheinhardt
91c7710412 avcodec/x86/cfhdencdsp: Avoid unnecessary constants
Up until now, cfhdencdsp used constants consisting
of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words
for use as constants in pmaddwd. But one can use
the same constants if one shuffles the words in
a dword the opposite order. Similarly for some other
constants. This also allowed to avoid a register in
chfdenc_vert_filter.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:33:23 +01:00
Andreas Rheinhardt
cd3d8116fb avcodec/x86/cfhdencdsp: Avoid load of -1
It can be easily generated at runtime.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 01:32:57 +01:00
Kasidis Arunruangsirilert
e9e8a32b29 avcodec/nvenc: add 4-way multi nvenc split frame encoding support 2026-01-27 12:58:46 +00:00
Diego de Souza
499b5f5f92 avcodec/nvenc: add b_adapt option for HEVC encoder
The b_adapt option allows users to control adaptive B-frame decision
when lookahead is enabled in HEVC encoding. This feature was already
available for H.264 and AV1 encoders, but was missing from HEVC.

Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
2026-01-27 12:58:08 +00:00
Andreas Rheinhardt
bf4d5037b4 avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names
These names are a remnant of dsputil when all the DSP functions
from all codecs were part of DSPcontext.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:25 +01:00
Andreas Rheinhardt
489aaf4e1c avcodec/x86/h264_deblock: Don't sign-extend stride
Unnecessary (and wrong) since d5d699ab6e.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
db66e057eb avcodec/x86/h264_deblock: Avoid reload
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c:                         60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2:                      65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx:                       65.3 ( 0.92x)

New benchmarks:
h264_h_loop_filter_luma_8bpp_c:                         60.4 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2:                      62.0 ( 0.97x)
h264_h_loop_filter_luma_8bpp_avx:                       61.7 ( 0.98x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
8428a412bc avcodec/x86/h264_deblock: Avoid MMX in deblock_h_luma_8
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c:                         59.9 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2:                      67.9 ( 0.88x)
h264_h_loop_filter_luma_8bpp_avx:                       67.4 ( 0.89x)

New benchmarks:
h264_h_loop_filter_luma_8bpp_c:                         60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2:                      65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx:                       65.3 ( 0.92x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
9882973935 avcodec/x86/h264_deblock: Avoid reloading constant
No change in benchmarks.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
eaaf45fd79 avcodec/x86/h264_deblock_10bit: Simplify r0+4*r1
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
aab0946eae avcodec/x86/h264_deblock_10bit: Remove mmxext functions
Now that the SSE2/AVX functions are no longer restricted
to those systems having an aligned stack, the MMXEXT functions
are always overridden (except for ancient systems without
SSE2), so remove them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
dbdf514c17 avcodec/x86/h264_deblock_10bit: Remove custom stack allocation code
Allocate it via cglobal as usual. This makes the SSE2/AVX functions
available when HAVE_ALIGNED_STACK is false; it also avoids
modifying rsp unnecessarily in the deblock_h_luma_intra_10 functions
on Win64.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
b1140d3c98 avcodec/x86/h264_deblock: Remove obsolete macro parameters
They are a remnant of the MMX functions (which processed
only eight pixels at a time, so that it was called twice
via a wrapper; the actual MMX function had "v8" in its name
instead of simply v) which have been removed in commit
4618f36a24.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
899475326b avcodec/x86/h264_deblock: Simplify splatting
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
a22149ab3d avcodec/x86/h264_deblock: Remove always-false branches
These functions are always called with alpha and beta > 0.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
982244818b avcodec/x86/h264_deblock: Remove unused macros
Forgotten in 4618f36a24.
Also remove a PASS8ROWS wrapper that seems to have been always
unused.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
6e65d1c945 avcodec/motion_est: Fix left shifts of negative numbers
Fixes ticket #21486.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:46:39 +01:00
Jun Zhao
8966101fa6 lavc/hevc: add aarch64 neon for 12-bit dequant
Implement NEON optimization for HEVC dequant at 12-bit depth.

For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift
is negative, we use shl (shift left) instead of srshr.

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_12_c:                                   9.9 ( 1.00x)
hevc_dequant_4x4_12_neon:                                5.7 ( 1.74x)

hevc_dequant_8x8_12_c:                                   1.7 ( 1.00x)
hevc_dequant_8x8_12_neon:                                1.3 ( 1.30x)

hevc_dequant_16x16_12_c:                               131.1 ( 1.00x)
hevc_dequant_16x16_12_neon:                              7.9 (16.52x)

hevc_dequant_32x32_12_c:                                69.7 ( 1.00x)
hevc_dequant_32x32_12_neon:                             28.4 ( 2.46x)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00