Andreas Rheinhardt
aa483bc422
avcodec/x86/bswapdsp: Avoid aligned vs unaligned codepaths for AVX2
...
For modern cpus (like those supporting AVX2) loads and stores
using the unaligned versions of instructions are as fast
as aligned ones if the address is aligned, so remove
the aligned AVX2 version (and the alignment check) and just
use the unaligned one.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-27 18:25:43 +01:00
Andreas Rheinhardt
55afe49dd0
avcodec/x86/bswapdsp: combine shifting, avoid check for AVX2
...
This avoids a check and a shift if >=8 elements are processed;
it adds a check if < 8 elements are processed (which should
be rare).
No change in benchmarks here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-27 18:25:31 +01:00
Andreas Rheinhardt
3e6fa5153e
avcodec/x86/bswapdsp: Avoid register copies
...
No change in benchmarks here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-27 18:25:01 +01:00
Zhao Zhili
86d2fae59f
avcodec: use int instead of enum for AVOption fields
...
AVOption with AV_OPT_TYPE_INT assumes the field is int (4 bytes),
but enum size is implementation-defined and may be smaller.
This can cause memory corruption when AVOption writes 4 bytes
to a field that is only 1-2 bytes, potentially overwriting
adjacent struct members.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-02-26 11:40:09 +08:00
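The hazard described in the entry above can be sketched in plain C. This is only an illustration: `Ctx`, `Mode` and `set_int_option()` are hypothetical names and not part of the AVOption API; the setter merely mimics a generic routine that writes `sizeof(int)` bytes at a given struct offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum Mode { MODE_A, MODE_B };   /* hypothetical enum; its size is implementation-defined */

typedef struct Ctx {
    int mode;                   /* declared as int, so exactly sizeof(int) bytes are reserved */
    unsigned char guard;        /* adjacent member that must not be overwritten */
} Ctx;

/* Hypothetical stand-in for an int-typed option setter:
 * it always writes sizeof(int) bytes at the given offset. */
static void set_int_option(void *obj, size_t offset, int value)
{
    assert(obj != NULL);
    memcpy((char *)obj + offset, &value, sizeof(value));
}
```

Had `mode` been declared as `enum Mode` on an ABI that gives the enum only 1 or 2 bytes, the same `memcpy` would spill into `guard`; declaring the field as `int` removes that dependency on the compiler's choice.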
Ling, Edison
00d3417b71
avcodec/d3d12va_encode: Add H264 entropy coder parameter support
...
Add parameter `coder` for users to select entropy coding in D3D12 H264
encoding.
Named constants `cabac` (1) and `cavlc` (0) are supported.
Default is CABAC (1). If the driver does not support CABAC, a warning is
logged and encoding falls back to CAVLC.
Usage:
CABAC (default): `-coder cabac` or `-coder 1`
CAVLC: `-coder cavlc` or `-coder 0`
Sample command line:
```
ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -c:v h264_d3d12va -coder cavlc -y output.mp4
```
2026-02-26 02:19:21 +00:00
Werner Robitza
5ba2525c7a
avcodec/libsvtav1: enable 2-pass encoding
...
This patch enables two-pass encoding for libsvtav1 by implementing
support for AV_CODEC_FLAG_PASS1 and AV_CODEC_FLAG_PASS2.
Previously, users requiring two-pass encoding with SVT-AV1 had to use
the standalone SvtAv1EncApp tool. This patch allows 2-pass encoding
directly through FFmpeg.
Based on patch by Fredrik Lundkvist, with review feedback from James
Almer and Andreas Rheinhardt.
See: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/327452.html
Changes:
- Use AV_BASE64_DECODE_SIZE macro for buffer size calculation
- Allocate own buffer for rc_stats_buffer (non-owning pointer)
- Error handling with buffer cleanup
Signed-off-by: Werner Robitza <werner.robitza@gmail.com>
2026-02-25 16:43:53 +01:00
Andreas Rheinhardt
dc65dcec22
avcodec/vvc/inter: Combine offsets early
...
For bi-predicted weighted averages, only the sum
of the two offsets is ever used, so add the two early.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-25 12:08:33 +01:00
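Why only the sum of the two offsets matters can be seen from the textbook HEVC/VVC-style explicit bi-prediction formula. The sketch below is an assumption-laden illustration, not the code in libavcodec/vvc: function and parameter names are hypothetical, the final clipping to the pixel range is omitted, and the test values are kept non-negative.

```c
#include <assert.h>

/* Weighted bi-prediction with both offsets passed separately. */
static int w_avg_two_offsets(int a, int b, int w0, int w1,
                             int o0, int o1, int log2wd)
{
    assert(log2wd >= 0 && log2wd < 16);
    /* o0 and o1 only ever appear as o0 + o1 */
    return (a * w0 + b * w1 + (o0 + o1 + 1) * (1 << log2wd)) >> (log2wd + 1);
}

/* Same computation with the offsets added once, up front. */
static int w_avg_combined(int a, int b, int w0, int w1,
                          int offset_sum, int log2wd)
{
    assert(log2wd >= 0 && log2wd < 16);
    return (a * w0 + b * w1 + (offset_sum + 1) * (1 << log2wd)) >> (log2wd + 1);
}
```

Calling `w_avg_combined()` with `offset_sum = o0 + o1` gives the same result as `w_avg_two_offsets()`, which is the property the commit relies on.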
stevxiao
fc7c38f9da
avcodec/d3d12va_encode: add detailed ValidationFlags error reporting for video encoders check feature support
...
Improve error diagnostics for the D3D12 video encoder feature-support
check by adding detailed ValidationFlags reporting when driver validation fails.
This makes it easy for users to identify which specific feature is
unsupported without manually decoding the flags.
Signed-off-by: younengxiao <steven.xiao@amd.com>
2026-02-25 08:47:14 +00:00
James Almer
145f6e5878
avcodec/cbs_h2645: split into separate files per module
...
This file is becoming too bloated and hard to read, so split it into separate
files, each containing codec-specific methods.
This will also speed up compilation when using several concurrent jobs.
Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-24 10:32:20 -03:00
Michael Niedermayer
7761b8fbac
avcodec/imm5: Don't pass EAGAIN on as is
...
Fixes: Assertion consumed != (-(11)) failed at libavcodec/decode.c:465
Fixes: 471587358/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_IMM5_fuzzer-4737412376100864
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:58:11 +01:00
Michael Niedermayer
302f198ba5
avcodec/mjpegdec: Check for multiple exif
...
Fixes: memleak
Fixes: 477993717/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AMV_DEC_fuzzer-4515108431921152
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:52:37 +01:00
Michael Niedermayer
2ab23ec729
avcodec/interplayacm: Check input for fill_block()
...
Fixes: Timeout
Fixes: 476763877/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_INTERPLAY_ACM_fuzzer-4515681843609600
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:50:49 +01:00
Michael Niedermayer
538824fd84
avcodec/hdrdec: Check input size before buffer allocation
...
Fixes: Timeout
Fixes: 471948155/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HDR_DEC_fuzzer-5679690418552832
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:28:09 +01:00
Michael Niedermayer
55bb6e2646
avcodec/tmv: Move space check before buffer allocation
...
Fixes: Timeout
Fixes: 471664630/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TMV_fuzzer-5291752530706432
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:26:20 +01:00
Michael Niedermayer
4446dfb0e3
avcodec/flashsv: Check for input space before (re)allocating frame
...
Fixes: Timeout
Fixes: 471605680/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV2_DEC_fuzzer-6210773459468288
Fixes: 471605920/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV_DEC_fuzzer-6230719287590912
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:59:44 +01:00
Michael Niedermayer
40cafc25cf
avcodec/mdec: Check input space vs minimal block size
...
Fixes: Timeout
Fixes: 481006706/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MDEC_fuzzer-6122832651419648
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:54:38 +01:00
Michael Niedermayer
73681f888d
avcodec/h264_parser: Check remaining input length in loop in scan_mmco_reset()
...
Fixes: read of uninitialized memory
Fixes: 476177761/clusterfuzz-testcase-minimized-ffmpeg_dem_H264_fuzzer-6400884824408064
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:43:28 +01:00
Marvin Scholz
fba9fc0c6b
lavc: wmadec: limit variable scopes
...
Moves the loop variable declarations to the actual loops,
narrowing their scopes.
2026-02-23 15:29:27 +00:00
Marvin Scholz
d219be03d6
lavc: wmadec: assert channels count
...
This should never exceed MAX_CHANNELS, else there will be several
out-of-bounds writes.
2026-02-23 15:29:27 +00:00
Lynne
baad75cafa
aacdec_usac: add support for parsing Mps212 (MPEG surround)
...
This commit adds the full bitstream parsing for Mps212.
2026-02-23 07:57:57 +01:00
Lynne
86977fdb6b
aacdec_tab: add Mps212 tables
...
To be used in the following commit.
2026-02-23 07:57:57 +01:00
Lynne
a4ab4a98c4
aacdec_tab: split up tables init
2026-02-23 07:57:57 +01:00
Michael Niedermayer
7e10579f49
avcodec/exr: fix AVERROR typo
...
Fixes: out of array read
Fixes: 485866440/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-4520520419966976
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 01:44:49 +00:00
Andreas Rheinhardt
53a9a34e23
avcodec/snow: Reduce sizeof(SnowContext)
...
Each SubBand currently contains an array of 519 uint8_t[32],
yet most of these are unused: For both the decoder and the
encoder, at most 34 contexts are actually used: The only
variable index is context+2, where context is the result
of av_log2() and therefore in the 0..31 range.
There are also several accesses using compile-time indices,
the highest of which is 30. FATE passes with 31 contexts
and maybe these are enough, but I don't know.
Reducing the number to 34 reduces sizeof(SnowContext)
from 2141664B to 155104B here (on x64).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:16 +01:00
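The bound of 34 contexts in the entry above follows from the range of a 32-bit integer logarithm. The sketch below uses `ilog2_u32()` as a hypothetical stand-in for av_log2() and takes the constants 519, 34 and 32 from the commit message. The implied sub-band count is my own arithmetic from the two quoted sizes, assuming nothing else in the struct changed; the message does not state it.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0, and 0 for v == 0 (stand-in for av_log2()). */
static int ilog2_u32(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

enum { OLD_CONTEXTS = 519, NEW_CONTEXTS = 34, STATE_BYTES = 32 };

/* Largest index reachable through the variable access context + 2. */
static int max_variable_index(void)
{
    return ilog2_u32(UINT32_MAX) + 2;
}

/* Bytes no longer needed in each SubBand after the reduction. */
static long bytes_saved_per_subband(void)
{
    return (long)(OLD_CONTEXTS - NEW_CONTEXTS) * STATE_BYTES;
}

/* Both the variable index and the highest constant index (30) fit. */
static void check_bounds(void)
{
    assert(max_variable_index() < NEW_CONTEXTS);
    assert(30 < NEW_CONTEXTS);
}
```

With these numbers each SubBand shrinks by 15520 bytes, and the quoted drop from 2141664B to 155104B is an exact multiple of that (128 times).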
Andreas Rheinhardt
bb92009386
avcodec/snow: Only allocate emu_edge_buffer for encoder
...
Also allocate it during init and move it to the encoder's context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:15 +01:00
Lynne
13e063ceec
vulkan/ffv1: properly initialize the linecache
2026-02-22 03:39:23 +01:00
Michael Niedermayer
99515a3342
avcodec/jpeg2000htdec: Check Lcup and Lref
...
Fixes: use of uninitialized memory
Fixes: 482494999/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-6467586186608640
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-22 02:31:06 +00:00
Andreas Rheinhardt
6c1c1720cf
avcodec/x86/vvc/dsp_init: Mark dsp init function as av_cold
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
af3f8f5bd2
avcodec/x86/vvc/of: Break dependency chain
...
Don't extract and update one word of one and the same register
at a time; use separate src and dst registers, so that pextrw
and bsr can be done in parallel. Also use movd instead of pinsrw
for the first word.
Old benchmarks:
apply_bdof_8_8x16_c: 3275.2 ( 1.00x)
apply_bdof_8_8x16_avx2: 487.6 ( 6.72x)
apply_bdof_8_16x8_c: 3243.1 ( 1.00x)
apply_bdof_8_16x8_avx2: 284.4 (11.40x)
apply_bdof_8_16x16_c: 6501.8 ( 1.00x)
apply_bdof_8_16x16_avx2: 570.0 (11.41x)
apply_bdof_10_8x16_c: 3286.5 ( 1.00x)
apply_bdof_10_8x16_avx2: 461.7 ( 7.12x)
apply_bdof_10_16x8_c: 3274.5 ( 1.00x)
apply_bdof_10_16x8_avx2: 271.4 (12.06x)
apply_bdof_10_16x16_c: 6590.0 ( 1.00x)
apply_bdof_10_16x16_avx2: 543.9 (12.12x)
apply_bdof_12_8x16_c: 3307.6 ( 1.00x)
apply_bdof_12_8x16_avx2: 462.2 ( 7.16x)
apply_bdof_12_16x8_c: 3287.4 ( 1.00x)
apply_bdof_12_16x8_avx2: 271.8 (12.10x)
apply_bdof_12_16x16_c: 6465.7 ( 1.00x)
apply_bdof_12_16x16_avx2: 543.8 (11.89x)
New benchmarks:
apply_bdof_8_8x16_c: 3255.7 ( 1.00x)
apply_bdof_8_8x16_avx2: 349.3 ( 9.32x)
apply_bdof_8_16x8_c: 3262.5 ( 1.00x)
apply_bdof_8_16x8_avx2: 214.8 (15.19x)
apply_bdof_8_16x16_c: 6471.6 ( 1.00x)
apply_bdof_8_16x16_avx2: 429.8 (15.06x)
apply_bdof_10_8x16_c: 3227.7 ( 1.00x)
apply_bdof_10_8x16_avx2: 321.6 (10.04x)
apply_bdof_10_16x8_c: 3250.2 ( 1.00x)
apply_bdof_10_16x8_avx2: 201.2 (16.16x)
apply_bdof_10_16x16_c: 6476.5 ( 1.00x)
apply_bdof_10_16x16_avx2: 400.9 (16.16x)
apply_bdof_12_8x16_c: 3230.7 ( 1.00x)
apply_bdof_12_8x16_avx2: 321.8 (10.04x)
apply_bdof_12_16x8_c: 3210.5 ( 1.00x)
apply_bdof_12_16x8_avx2: 200.9 (15.98x)
apply_bdof_12_16x16_c: 6474.5 ( 1.00x)
apply_bdof_12_16x16_avx2: 400.2 (16.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
19dc7b79a4
avcodec/x86/vvc/of: Unify shuffling
...
One can use the same shuffles for the width 8 and width 16
case if one also changes the permutation in vpermd (that always
follows pshufb for width 16).
This also allows loading it before checking the width.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:03:22 +01:00
Andreas Rheinhardt
8e82416434
avcodec/x86/vvc/of: Avoid unused register
...
Avoids a push+pop.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:02:20 +01:00
Andreas Rheinhardt
81fb70c833
avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for w_avg
...
They only add overhead: another function call, sign-extension
of some parameters to 64 bits (although the upper bits are not
used at all) and rederivation of the actual number of bits
from the maximum value (1<<bpp)-1.
Old benchmarks:
w_avg_8_2x2_c: 16.4 ( 1.00x)
w_avg_8_2x2_avx2: 12.9 ( 1.27x)
w_avg_8_4x4_c: 48.0 ( 1.00x)
w_avg_8_4x4_avx2: 14.9 ( 3.23x)
w_avg_8_8x8_c: 168.2 ( 1.00x)
w_avg_8_8x8_avx2: 22.4 ( 7.49x)
w_avg_8_16x16_c: 396.5 ( 1.00x)
w_avg_8_16x16_avx2: 47.9 ( 8.28x)
w_avg_8_32x32_c: 1466.3 ( 1.00x)
w_avg_8_32x32_avx2: 172.8 ( 8.48x)
w_avg_8_64x64_c: 5629.3 ( 1.00x)
w_avg_8_64x64_avx2: 678.7 ( 8.29x)
w_avg_8_128x128_c: 22122.4 ( 1.00x)
w_avg_8_128x128_avx2: 2743.5 ( 8.06x)
w_avg_10_2x2_c: 18.7 ( 1.00x)
w_avg_10_2x2_avx2: 13.1 ( 1.43x)
w_avg_10_4x4_c: 50.3 ( 1.00x)
w_avg_10_4x4_avx2: 15.9 ( 3.17x)
w_avg_10_8x8_c: 109.3 ( 1.00x)
w_avg_10_8x8_avx2: 20.6 ( 5.30x)
w_avg_10_16x16_c: 395.5 ( 1.00x)
w_avg_10_16x16_avx2: 44.8 ( 8.83x)
w_avg_10_32x32_c: 1534.2 ( 1.00x)
w_avg_10_32x32_avx2: 141.4 (10.85x)
w_avg_10_64x64_c: 6003.6 ( 1.00x)
w_avg_10_64x64_avx2: 557.4 (10.77x)
w_avg_10_128x128_c: 23722.7 ( 1.00x)
w_avg_10_128x128_avx2: 2205.0 (10.76x)
w_avg_12_2x2_c: 18.6 ( 1.00x)
w_avg_12_2x2_avx2: 13.1 ( 1.42x)
w_avg_12_4x4_c: 52.2 ( 1.00x)
w_avg_12_4x4_avx2: 16.1 ( 3.24x)
w_avg_12_8x8_c: 109.2 ( 1.00x)
w_avg_12_8x8_avx2: 20.6 ( 5.29x)
w_avg_12_16x16_c: 396.1 ( 1.00x)
w_avg_12_16x16_avx2: 45.0 ( 8.81x)
w_avg_12_32x32_c: 1532.6 ( 1.00x)
w_avg_12_32x32_avx2: 142.1 (10.79x)
w_avg_12_64x64_c: 6002.2 ( 1.00x)
w_avg_12_64x64_avx2: 557.3 (10.77x)
w_avg_12_128x128_c: 23748.7 ( 1.00x)
w_avg_12_128x128_avx2: 2206.4 (10.76x)
New benchmarks:
w_avg_8_2x2_c: 16.0 ( 1.00x)
w_avg_8_2x2_avx2: 9.3 ( 1.71x)
w_avg_8_4x4_c: 48.4 ( 1.00x)
w_avg_8_4x4_avx2: 12.4 ( 3.91x)
w_avg_8_8x8_c: 168.7 ( 1.00x)
w_avg_8_8x8_avx2: 21.1 ( 8.00x)
w_avg_8_16x16_c: 394.5 ( 1.00x)
w_avg_8_16x16_avx2: 46.2 ( 8.54x)
w_avg_8_32x32_c: 1456.3 ( 1.00x)
w_avg_8_32x32_avx2: 171.8 ( 8.48x)
w_avg_8_64x64_c: 5636.2 ( 1.00x)
w_avg_8_64x64_avx2: 676.9 ( 8.33x)
w_avg_8_128x128_c: 22129.1 ( 1.00x)
w_avg_8_128x128_avx2: 2734.3 ( 8.09x)
w_avg_10_2x2_c: 18.7 ( 1.00x)
w_avg_10_2x2_avx2: 10.3 ( 1.82x)
w_avg_10_4x4_c: 50.8 ( 1.00x)
w_avg_10_4x4_avx2: 13.4 ( 3.79x)
w_avg_10_8x8_c: 109.7 ( 1.00x)
w_avg_10_8x8_avx2: 20.4 ( 5.38x)
w_avg_10_16x16_c: 395.2 ( 1.00x)
w_avg_10_16x16_avx2: 41.7 ( 9.48x)
w_avg_10_32x32_c: 1535.6 ( 1.00x)
w_avg_10_32x32_avx2: 137.9 (11.13x)
w_avg_10_64x64_c: 6002.1 ( 1.00x)
w_avg_10_64x64_avx2: 548.5 (10.94x)
w_avg_10_128x128_c: 23742.7 ( 1.00x)
w_avg_10_128x128_avx2: 2179.8 (10.89x)
w_avg_12_2x2_c: 18.9 ( 1.00x)
w_avg_12_2x2_avx2: 10.3 ( 1.84x)
w_avg_12_4x4_c: 52.4 ( 1.00x)
w_avg_12_4x4_avx2: 13.4 ( 3.91x)
w_avg_12_8x8_c: 109.2 ( 1.00x)
w_avg_12_8x8_avx2: 20.3 ( 5.39x)
w_avg_12_16x16_c: 396.3 ( 1.00x)
w_avg_12_16x16_avx2: 41.7 ( 9.51x)
w_avg_12_32x32_c: 1532.6 ( 1.00x)
w_avg_12_32x32_avx2: 138.6 (11.06x)
w_avg_12_64x64_c: 5996.7 ( 1.00x)
w_avg_12_64x64_avx2: 549.6 (10.91x)
w_avg_12_128x128_c: 23738.0 ( 1.00x)
w_avg_12_128x128_avx2: 2177.2 (10.90x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:01:27 +01:00
Andreas Rheinhardt
ea78402e9c
avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for avg
...
Up until now, there were two averaging assembly functions,
one for eight bit content and one for <=16 bit content;
there are also three C-wrappers around these functions,
for 8, 10 and 12 bpp. These wrappers simply forward the
maximum permissible value (i.e. (1<<bpp)-1) and promote
some integer values to ptrdiff_t.
Yet these wrappers are absolutely useless: The assembly functions
rederive the bpp from the maximum and only the integer part
of the promoted ptrdiff_t values is ever used. Of course,
these wrappers also entail an additional call (not a tail call,
because the additional maximum parameter is passed on the stack).
Remove the wrappers and add per-bpp assembly functions instead.
Given that the only difference between 10 and 12 bits is some
constants in registers, the main part of these functions can be
shared (given that this code uses a jumptable, it can even
be done without adding any additional jump).
Old benchmarks:
avg_8_2x2_c: 11.4 ( 1.00x)
avg_8_2x2_avx2: 7.9 ( 1.44x)
avg_8_4x4_c: 30.7 ( 1.00x)
avg_8_4x4_avx2: 10.4 ( 2.95x)
avg_8_8x8_c: 134.5 ( 1.00x)
avg_8_8x8_avx2: 16.6 ( 8.12x)
avg_8_16x16_c: 255.6 ( 1.00x)
avg_8_16x16_avx2: 28.2 ( 9.07x)
avg_8_32x32_c: 897.7 ( 1.00x)
avg_8_32x32_avx2: 83.9 (10.70x)
avg_8_64x64_c: 3320.0 ( 1.00x)
avg_8_64x64_avx2: 321.1 (10.34x)
avg_8_128x128_c: 12981.8 ( 1.00x)
avg_8_128x128_avx2: 1480.1 ( 8.77x)
avg_10_2x2_c: 12.0 ( 1.00x)
avg_10_2x2_avx2: 8.4 ( 1.43x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 9.8 ( 3.56x)
avg_10_8x8_c: 76.8 ( 1.00x)
avg_10_8x8_avx2: 15.1 ( 5.08x)
avg_10_16x16_c: 256.6 ( 1.00x)
avg_10_16x16_avx2: 25.1 (10.20x)
avg_10_32x32_c: 932.9 ( 1.00x)
avg_10_32x32_avx2: 73.4 (12.72x)
avg_10_64x64_c: 3517.9 ( 1.00x)
avg_10_64x64_avx2: 414.8 ( 8.48x)
avg_10_128x128_c: 13695.3 ( 1.00x)
avg_10_128x128_avx2: 1648.1 ( 8.31x)
avg_12_2x2_c: 13.1 ( 1.00x)
avg_12_2x2_avx2: 8.6 ( 1.53x)
avg_12_4x4_c: 35.4 ( 1.00x)
avg_12_4x4_avx2: 10.1 ( 3.49x)
avg_12_8x8_c: 76.6 ( 1.00x)
avg_12_8x8_avx2: 16.7 ( 4.60x)
avg_12_16x16_c: 256.6 ( 1.00x)
avg_12_16x16_avx2: 25.5 (10.07x)
avg_12_32x32_c: 933.2 ( 1.00x)
avg_12_32x32_avx2: 75.7 (12.34x)
avg_12_64x64_c: 3519.1 ( 1.00x)
avg_12_64x64_avx2: 416.8 ( 8.44x)
avg_12_128x128_c: 13695.1 ( 1.00x)
avg_12_128x128_avx2: 1651.6 ( 8.29x)
New benchmarks:
avg_8_2x2_c: 11.5 ( 1.00x)
avg_8_2x2_avx2: 6.0 ( 1.91x)
avg_8_4x4_c: 29.7 ( 1.00x)
avg_8_4x4_avx2: 8.0 ( 3.72x)
avg_8_8x8_c: 131.4 ( 1.00x)
avg_8_8x8_avx2: 12.2 (10.74x)
avg_8_16x16_c: 254.3 ( 1.00x)
avg_8_16x16_avx2: 24.8 (10.25x)
avg_8_32x32_c: 897.7 ( 1.00x)
avg_8_32x32_avx2: 77.8 (11.54x)
avg_8_64x64_c: 3321.3 ( 1.00x)
avg_8_64x64_avx2: 318.7 (10.42x)
avg_8_128x128_c: 12988.4 ( 1.00x)
avg_8_128x128_avx2: 1430.1 ( 9.08x)
avg_10_2x2_c: 12.1 ( 1.00x)
avg_10_2x2_avx2: 5.7 ( 2.13x)
avg_10_4x4_c: 35.0 ( 1.00x)
avg_10_4x4_avx2: 9.0 ( 3.88x)
avg_10_8x8_c: 77.2 ( 1.00x)
avg_10_8x8_avx2: 12.4 ( 6.24x)
avg_10_16x16_c: 256.2 ( 1.00x)
avg_10_16x16_avx2: 24.3 (10.56x)
avg_10_32x32_c: 932.9 ( 1.00x)
avg_10_32x32_avx2: 71.9 (12.97x)
avg_10_64x64_c: 3516.8 ( 1.00x)
avg_10_64x64_avx2: 414.7 ( 8.48x)
avg_10_128x128_c: 13693.7 ( 1.00x)
avg_10_128x128_avx2: 1609.3 ( 8.51x)
avg_12_2x2_c: 14.1 ( 1.00x)
avg_12_2x2_avx2: 5.7 ( 2.48x)
avg_12_4x4_c: 35.8 ( 1.00x)
avg_12_4x4_avx2: 9.0 ( 3.96x)
avg_12_8x8_c: 76.9 ( 1.00x)
avg_12_8x8_avx2: 12.4 ( 6.22x)
avg_12_16x16_c: 256.5 ( 1.00x)
avg_12_16x16_avx2: 24.4 (10.50x)
avg_12_32x32_c: 934.1 ( 1.00x)
avg_12_32x32_avx2: 72.0 (12.97x)
avg_12_64x64_c: 3518.2 ( 1.00x)
avg_12_64x64_avx2: 414.8 ( 8.48x)
avg_12_128x128_c: 13689.5 ( 1.00x)
avg_12_128x128_avx2: 1611.1 ( 8.50x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:58:33 +01:00
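The round trip that made the removed wrappers pointless can be illustrated in a few lines of C. This is a sketch with hypothetical helper names, not the removed FFmpeg code: the wrapper side turned the bit depth into a maximum pixel value, and the callee turned that maximum back into the bit depth.

```c
#include <assert.h>

/* What a per-bpp wrapper forwarded: the maximum permissible pixel value. */
static int max_pixel_value(int bpp)
{
    assert(bpp > 0 && bpp < 31);
    return (1 << bpp) - 1;
}

/* What the callee then had to do: recover the bit depth from that maximum
 * by counting the significant bits of a value of the form 2^bpp - 1. */
static int bpp_from_max(int max)
{
    int bpp = 0;
    while (max > 0) {
        bpp++;
        max >>= 1;
    }
    return bpp;
}
```

Since `bpp_from_max(max_pixel_value(bpp)) == bpp` for every supported depth, a function instantiated per bit depth already knows the answer and needs neither step.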
Andreas Rheinhardt
5a60b3f1a6
avcodec/x86/vvc/mc: Remove always-false branches
...
The C versions of the average and weighted average functions
contain "FFMAX(3, 15 - BIT_DEPTH)" and the code here followed
this; yet it is only instantiated for bit depths 8, 10 and 12,
for which the above is just 15-BIT_DEPTH. So the comparisons
are unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
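The claim in the entry above is easy to check numerically. In this sketch `FFMAX` is redefined locally with the usual max-of-two shape, and the two helper functions are hypothetical names used only for the comparison.

```c
#include <assert.h>

#define FFMAX(a, b) ((a) > (b) ? (a) : (b))   /* local redefinition for the sketch */

/* Shift as written in the C templates. */
static int shift_with_clamp(int bit_depth)
{
    return FFMAX(3, 15 - bit_depth);
}

/* Shift without the comparison; only valid while 15 - bit_depth >= 3. */
static int shift_plain(int bit_depth)
{
    assert(bit_depth <= 12);
    return 15 - bit_depth;
}
```

For bit depths 8, 10 and 12 both functions return 7, 5 and 3 respectively; the clamp would only start to matter above 12 bits (for example, 14 bits gives 3 with the clamp and 1 without).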
Andreas Rheinhardt
59f8ff4c18
avcodec/x86/vvc/mc: Remove unused constants
...
Also avoid overaligning .rodata.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
eabf52e787
avcodec/x86/vvc/mc: Avoid unused work
...
The high quadword of these registers is zero for width 2.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
9317fb2b2e
avcodec/x86/vvc/mc: Avoid ymm registers where possible
...
Widths 2 and 4 fit into xmm registers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
caa0ae0cfb
avcodec/x86/vvc/mc: Avoid pextr[dq], v{insert,extract}i128
...
Use mov[dq], movdqu instead if the least significant parts
are set (i.e. if the immediate value is 0x0).
Old benchmarks:
avg_8_2x2_c: 11.3 ( 1.00x)
avg_8_2x2_avx2: 7.5 ( 1.50x)
avg_8_4x4_c: 31.2 ( 1.00x)
avg_8_4x4_avx2: 10.7 ( 2.91x)
avg_8_8x8_c: 133.5 ( 1.00x)
avg_8_8x8_avx2: 21.2 ( 6.30x)
avg_8_16x16_c: 254.7 ( 1.00x)
avg_8_16x16_avx2: 30.1 ( 8.46x)
avg_8_32x32_c: 896.9 ( 1.00x)
avg_8_32x32_avx2: 103.9 ( 8.63x)
avg_8_64x64_c: 3320.7 ( 1.00x)
avg_8_64x64_avx2: 539.4 ( 6.16x)
avg_8_128x128_c: 12991.5 ( 1.00x)
avg_8_128x128_avx2: 1661.3 ( 7.82x)
avg_10_2x2_c: 21.3 ( 1.00x)
avg_10_2x2_avx2: 8.3 ( 2.55x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 10.6 ( 3.28x)
avg_10_8x8_c: 76.3 ( 1.00x)
avg_10_8x8_avx2: 20.2 ( 3.77x)
avg_10_16x16_c: 255.9 ( 1.00x)
avg_10_16x16_avx2: 24.1 (10.60x)
avg_10_32x32_c: 932.4 ( 1.00x)
avg_10_32x32_avx2: 73.3 (12.72x)
avg_10_64x64_c: 3516.4 ( 1.00x)
avg_10_64x64_avx2: 601.7 ( 5.84x)
avg_10_128x128_c: 13690.6 ( 1.00x)
avg_10_128x128_avx2: 1613.2 ( 8.49x)
avg_12_2x2_c: 14.0 ( 1.00x)
avg_12_2x2_avx2: 8.3 ( 1.67x)
avg_12_4x4_c: 35.3 ( 1.00x)
avg_12_4x4_avx2: 10.9 ( 3.26x)
avg_12_8x8_c: 76.5 ( 1.00x)
avg_12_8x8_avx2: 20.3 ( 3.77x)
avg_12_16x16_c: 256.7 ( 1.00x)
avg_12_16x16_avx2: 24.1 (10.63x)
avg_12_32x32_c: 932.5 ( 1.00x)
avg_12_32x32_avx2: 73.3 (12.72x)
avg_12_64x64_c: 3520.5 ( 1.00x)
avg_12_64x64_avx2: 602.6 ( 5.84x)
avg_12_128x128_c: 13689.6 ( 1.00x)
avg_12_128x128_avx2: 1613.1 ( 8.49x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 13.4 ( 1.25x)
w_avg_8_4x4_c: 44.5 ( 1.00x)
w_avg_8_4x4_avx2: 15.9 ( 2.81x)
w_avg_8_8x8_c: 166.1 ( 1.00x)
w_avg_8_8x8_avx2: 45.7 ( 3.63x)
w_avg_8_16x16_c: 392.9 ( 1.00x)
w_avg_8_16x16_avx2: 57.8 ( 6.80x)
w_avg_8_32x32_c: 1455.5 ( 1.00x)
w_avg_8_32x32_avx2: 215.0 ( 6.77x)
w_avg_8_64x64_c: 5621.8 ( 1.00x)
w_avg_8_64x64_avx2: 875.2 ( 6.42x)
w_avg_8_128x128_c: 22131.3 ( 1.00x)
w_avg_8_128x128_avx2: 3390.1 ( 6.53x)
w_avg_10_2x2_c: 18.0 ( 1.00x)
w_avg_10_2x2_avx2: 14.0 ( 1.28x)
w_avg_10_4x4_c: 53.9 ( 1.00x)
w_avg_10_4x4_avx2: 15.9 ( 3.40x)
w_avg_10_8x8_c: 109.5 ( 1.00x)
w_avg_10_8x8_avx2: 40.4 ( 2.71x)
w_avg_10_16x16_c: 395.7 ( 1.00x)
w_avg_10_16x16_avx2: 44.7 ( 8.86x)
w_avg_10_32x32_c: 1532.7 ( 1.00x)
w_avg_10_32x32_avx2: 142.4 (10.77x)
w_avg_10_64x64_c: 6007.7 ( 1.00x)
w_avg_10_64x64_avx2: 745.5 ( 8.06x)
w_avg_10_128x128_c: 23719.7 ( 1.00x)
w_avg_10_128x128_avx2: 2217.7 (10.70x)
w_avg_12_2x2_c: 18.9 ( 1.00x)
w_avg_12_2x2_avx2: 13.6 ( 1.38x)
w_avg_12_4x4_c: 47.5 ( 1.00x)
w_avg_12_4x4_avx2: 15.9 ( 2.99x)
w_avg_12_8x8_c: 109.3 ( 1.00x)
w_avg_12_8x8_avx2: 40.9 ( 2.67x)
w_avg_12_16x16_c: 395.6 ( 1.00x)
w_avg_12_16x16_avx2: 44.8 ( 8.84x)
w_avg_12_32x32_c: 1531.0 ( 1.00x)
w_avg_12_32x32_avx2: 141.8 (10.80x)
w_avg_12_64x64_c: 6016.7 ( 1.00x)
w_avg_12_64x64_avx2: 732.8 ( 8.21x)
w_avg_12_128x128_c: 23762.2 ( 1.00x)
w_avg_12_128x128_avx2: 2223.4 (10.69x)
New benchmarks:
avg_8_2x2_c: 11.3 ( 1.00x)
avg_8_2x2_avx2: 7.6 ( 1.49x)
avg_8_4x4_c: 31.2 ( 1.00x)
avg_8_4x4_avx2: 10.8 ( 2.89x)
avg_8_8x8_c: 131.6 ( 1.00x)
avg_8_8x8_avx2: 15.6 ( 8.42x)
avg_8_16x16_c: 255.3 ( 1.00x)
avg_8_16x16_avx2: 27.9 ( 9.16x)
avg_8_32x32_c: 897.9 ( 1.00x)
avg_8_32x32_avx2: 81.2 (11.06x)
avg_8_64x64_c: 3320.0 ( 1.00x)
avg_8_64x64_avx2: 335.1 ( 9.91x)
avg_8_128x128_c: 12999.1 ( 1.00x)
avg_8_128x128_avx2: 1456.3 ( 8.93x)
avg_10_2x2_c: 12.0 ( 1.00x)
avg_10_2x2_avx2: 8.6 ( 1.40x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 9.7 ( 3.61x)
avg_10_8x8_c: 76.7 ( 1.00x)
avg_10_8x8_avx2: 16.3 ( 4.69x)
avg_10_16x16_c: 256.3 ( 1.00x)
avg_10_16x16_avx2: 25.2 (10.18x)
avg_10_32x32_c: 932.8 ( 1.00x)
avg_10_32x32_avx2: 73.3 (12.72x)
avg_10_64x64_c: 3518.8 ( 1.00x)
avg_10_64x64_avx2: 416.8 ( 8.44x)
avg_10_128x128_c: 13691.6 ( 1.00x)
avg_10_128x128_avx2: 1612.9 ( 8.49x)
avg_12_2x2_c: 14.1 ( 1.00x)
avg_12_2x2_avx2: 8.7 ( 1.62x)
avg_12_4x4_c: 35.7 ( 1.00x)
avg_12_4x4_avx2: 9.7 ( 3.68x)
avg_12_8x8_c: 77.0 ( 1.00x)
avg_12_8x8_avx2: 16.9 ( 4.57x)
avg_12_16x16_c: 256.2 ( 1.00x)
avg_12_16x16_avx2: 25.7 ( 9.96x)
avg_12_32x32_c: 933.5 ( 1.00x)
avg_12_32x32_avx2: 74.0 (12.62x)
avg_12_64x64_c: 3516.4 ( 1.00x)
avg_12_64x64_avx2: 408.7 ( 8.60x)
avg_12_128x128_c: 13691.6 ( 1.00x)
avg_12_128x128_avx2: 1613.8 ( 8.48x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 14.0 ( 1.19x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.1 ( 3.00x)
w_avg_8_8x8_c: 168.0 ( 1.00x)
w_avg_8_8x8_avx2: 22.5 ( 7.47x)
w_avg_8_16x16_c: 392.5 ( 1.00x)
w_avg_8_16x16_avx2: 47.9 ( 8.19x)
w_avg_8_32x32_c: 1453.7 ( 1.00x)
w_avg_8_32x32_avx2: 176.1 ( 8.26x)
w_avg_8_64x64_c: 5631.4 ( 1.00x)
w_avg_8_64x64_avx2: 690.8 ( 8.15x)
w_avg_8_128x128_c: 22139.5 ( 1.00x)
w_avg_8_128x128_avx2: 2742.4 ( 8.07x)
w_avg_10_2x2_c: 18.1 ( 1.00x)
w_avg_10_2x2_avx2: 13.8 ( 1.31x)
w_avg_10_4x4_c: 47.0 ( 1.00x)
w_avg_10_4x4_avx2: 16.4 ( 2.87x)
w_avg_10_8x8_c: 110.0 ( 1.00x)
w_avg_10_8x8_avx2: 21.6 ( 5.09x)
w_avg_10_16x16_c: 395.2 ( 1.00x)
w_avg_10_16x16_avx2: 45.4 ( 8.71x)
w_avg_10_32x32_c: 1533.8 ( 1.00x)
w_avg_10_32x32_avx2: 142.6 (10.76x)
w_avg_10_64x64_c: 6004.4 ( 1.00x)
w_avg_10_64x64_avx2: 672.8 ( 8.92x)
w_avg_10_128x128_c: 23748.5 ( 1.00x)
w_avg_10_128x128_avx2: 2198.0 (10.80x)
w_avg_12_2x2_c: 17.2 ( 1.00x)
w_avg_12_2x2_avx2: 13.9 ( 1.24x)
w_avg_12_4x4_c: 51.4 ( 1.00x)
w_avg_12_4x4_avx2: 16.5 ( 3.11x)
w_avg_12_8x8_c: 109.1 ( 1.00x)
w_avg_12_8x8_avx2: 22.0 ( 4.96x)
w_avg_12_16x16_c: 395.9 ( 1.00x)
w_avg_12_16x16_avx2: 44.9 ( 8.81x)
w_avg_12_32x32_c: 1533.5 ( 1.00x)
w_avg_12_32x32_avx2: 142.3 (10.78x)
w_avg_12_64x64_c: 6002.0 ( 1.00x)
w_avg_12_64x64_avx2: 557.5 (10.77x)
w_avg_12_128x128_c: 23749.5 ( 1.00x)
w_avg_12_128x128_avx2: 2202.0 (10.79x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
7bf9c1e3f6
avcodec/x86/vvc/mc: Avoid redundant clipping for 8bit
...
It is already done by packuswb.
Old benchmarks:
avg_8_2x2_c: 11.1 ( 1.00x)
avg_8_2x2_avx2: 8.6 ( 1.28x)
avg_8_4x4_c: 30.0 ( 1.00x)
avg_8_4x4_avx2: 10.8 ( 2.78x)
avg_8_8x8_c: 132.0 ( 1.00x)
avg_8_8x8_avx2: 25.7 ( 5.14x)
avg_8_16x16_c: 254.6 ( 1.00x)
avg_8_16x16_avx2: 33.2 ( 7.67x)
avg_8_32x32_c: 897.5 ( 1.00x)
avg_8_32x32_avx2: 115.6 ( 7.76x)
avg_8_64x64_c: 3316.9 ( 1.00x)
avg_8_64x64_avx2: 626.5 ( 5.29x)
avg_8_128x128_c: 12973.6 ( 1.00x)
avg_8_128x128_avx2: 1914.0 ( 6.78x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 14.4 ( 1.16x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.5 ( 2.92x)
w_avg_8_8x8_c: 168.1 ( 1.00x)
w_avg_8_8x8_avx2: 49.7 ( 3.38x)
w_avg_8_16x16_c: 392.4 ( 1.00x)
w_avg_8_16x16_avx2: 61.1 ( 6.43x)
w_avg_8_32x32_c: 1455.3 ( 1.00x)
w_avg_8_32x32_avx2: 224.6 ( 6.48x)
w_avg_8_64x64_c: 5632.1 ( 1.00x)
w_avg_8_64x64_avx2: 896.9 ( 6.28x)
w_avg_8_128x128_c: 22136.3 ( 1.00x)
w_avg_8_128x128_avx2: 3626.7 ( 6.10x)
New benchmarks:
avg_8_2x2_c: 12.3 ( 1.00x)
avg_8_2x2_avx2: 8.1 ( 1.52x)
avg_8_4x4_c: 30.3 ( 1.00x)
avg_8_4x4_avx2: 11.3 ( 2.67x)
avg_8_8x8_c: 131.8 ( 1.00x)
avg_8_8x8_avx2: 21.3 ( 6.20x)
avg_8_16x16_c: 255.0 ( 1.00x)
avg_8_16x16_avx2: 30.6 ( 8.33x)
avg_8_32x32_c: 898.5 ( 1.00x)
avg_8_32x32_avx2: 104.9 ( 8.57x)
avg_8_64x64_c: 3317.7 ( 1.00x)
avg_8_64x64_avx2: 540.9 ( 6.13x)
avg_8_128x128_c: 12986.5 ( 1.00x)
avg_8_128x128_avx2: 1663.4 ( 7.81x)
w_avg_8_2x2_c: 16.8 ( 1.00x)
w_avg_8_2x2_avx2: 13.9 ( 1.21x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.2 ( 2.98x)
w_avg_8_8x8_c: 168.6 ( 1.00x)
w_avg_8_8x8_avx2: 46.3 ( 3.64x)
w_avg_8_16x16_c: 392.4 ( 1.00x)
w_avg_8_16x16_avx2: 57.7 ( 6.80x)
w_avg_8_32x32_c: 1454.6 ( 1.00x)
w_avg_8_32x32_avx2: 214.6 ( 6.78x)
w_avg_8_64x64_c: 5638.4 ( 1.00x)
w_avg_8_64x64_avx2: 875.6 ( 6.44x)
w_avg_8_128x128_c: 22133.5 ( 1.00x)
w_avg_8_128x128_avx2: 3334.3 ( 6.64x)
Also saves 550B of .text here. The improvements will likely
be even better on Win64, because it avoids using two nonvolatile
registers in the weighted average case.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
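The redundancy removed in the entry above can be modelled in scalar C. packuswb narrows signed 16-bit words to unsigned bytes with unsigned saturation, so clamping to 0..255 beforehand changes nothing. The helper names below are hypothetical and model one lane only; this is not the assembly from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one packuswb lane: signed 16-bit in, saturated 8-bit out. */
static uint8_t pack_unsigned_sat(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* The explicit clip to the 8-bit pixel range that preceded the pack. */
static int16_t clip_0_255(int16_t v)
{
    return (int16_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Returns 1 if clipping first never changes the packed result. */
static int explicit_clip_is_redundant(void)
{
    for (int v = INT16_MIN; v <= INT16_MAX; v++)
        if (pack_unsigned_sat(clip_0_255((int16_t)v)) != pack_unsigned_sat((int16_t)v))
            return 0;
    return 1;
}
```

The exhaustive loop covers every 16-bit input, which is why the clip can be dropped for 8-bit output without changing results.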
Michael Niedermayer
c98346ffaa
avcodec/libtheoraenc: make keyframe mask unsigned and handle its larger range
...
Fixes: left shift of 1 by 31 places cannot be represented in type 'int'
Fixes: 473579864/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LIBTHEORA_fuzzer-5835688160591872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-21 22:43:41 +00:00
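The undefined behaviour reported above comes from shifting the signed constant 1 into the sign bit. A minimal sketch of the unsigned alternative follows; `bit_u32()` and `low_bits_mask()` are hypothetical helpers and do not reproduce the actual libtheoraenc code.

```c
#include <assert.h>
#include <stdint.h>

/* Single bit n of a 32-bit word; well defined for n = 31,
 * unlike the signed expression 1 << 31. */
static uint32_t bit_u32(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}

/* Mask with the low 'bits' bits set, for bits in 0..32.
 * A shift by the full width is undefined, so 32 is special-cased. */
static uint32_t low_bits_mask(unsigned bits)
{
    assert(bits <= 32);
    return bits == 32 ? UINT32_MAX : bit_u32(bits) - 1;
}
```

Any variable that stores such a mask has to be unsigned as well, otherwise values above INT_MAX would not fit.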
Andreas Rheinhardt
3be4545b67
avcodec/vvc/inter: Deduplicate applying averaging
...
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-21 12:48:50 +01:00
Andreas Rheinhardt
324fd0bc46
avcodec/vvc/inter: Remove redundant variable, fix shadowing
...
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-21 12:48:50 +01:00
Andreas Rheinhardt
6777d5cd48
avcodec/vvc/inter: Remove always-false/true checks
...
derive_weight() is only called when pred_flag is PF_BI,
which only happens in B slices.
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-21 12:48:50 +01:00
Ramiro Polla
a0b55a0491
avcodec/mjpegdec: fix indentation and some white spaces
2026-02-20 16:32:10 +01:00
Ramiro Polla
0accfde281
avcodec/jpeglsdec: fix decoding of jpegls files with restart markers
2026-02-20 16:32:10 +01:00
Ramiro Polla
5672c410a6
avcodec/mjpegdec: unescape data for each restart marker individually
...
Instead of unescaping the entire image data buffer in advance, and then
having to perform heuristics to skip over where the restart markers
would have been, unescape the image data for each restart marker
individually.
2026-02-20 16:32:10 +01:00
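Per-interval unescaping as described in the entry above can be sketched as follows. This is a simplified illustration, not the mjpegdec implementation: `unescape_segment()` is a hypothetical name, it handles only the JPEG byte-stuffing rule (FF 00 decodes to FF), and it treats any other byte after FF, including fill bytes, as the end of the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unescape one entropy-coded segment.
 * Copies bytes from src to dst until a marker or the end of input,
 * turning the stuffed pair FF 00 into a single FF.
 * dst must have room for src_len bytes.
 * Returns the number of source bytes consumed; *out_len receives
 * the number of bytes written. */
static size_t unescape_segment(const uint8_t *src, size_t src_len,
                               uint8_t *dst, size_t *out_len)
{
    size_t i = 0, o = 0;

    assert(src && dst && out_len);
    while (i < src_len) {
        if (src[i] != 0xFF) {
            dst[o++] = src[i++];
        } else if (i + 1 < src_len && src[i + 1] == 0x00) {
            dst[o++] = 0xFF;    /* stuffed byte: FF 00 -> FF */
            i += 2;
        } else {
            break;              /* marker such as RSTn (FF D0..D7), or truncated input */
        }
    }
    *out_len = o;
    return i;
}
```

A caller would invoke this once per restart interval, skip the two marker bytes at the returned position, and continue with the next segment, so no heuristics are needed to guess where the markers used to be.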
Ramiro Polla
22771117a0
avcodec/mjpegdec: move get_bits_left() checks after handling of restart count
...
This commit doesn't really change much on its own, but it's helpful in
preparation for the following commit.
2026-02-20 16:32:10 +01:00
Ramiro Polla
3783b8f5e1
avcodec/mjpegdec: move vpred initialization out of loop in ljpeg_decode_rgb_scan()
...
The initialization code was only being run when mb_y was 0, so it could
just as well be moved out of the loop.
I haven't been able to find a bayer sample that has restart markers to
check whether vpred should be reinitialized at every restart. It would
seem logical that it should, but I have left this out until we find a
sample that does have restart markers.
2026-02-20 16:32:10 +01:00
Ramiro Polla
851cb118da
avcodec/jpegls: clear more JLSState fields inside ff_jpegls_init_state()
2026-02-20 16:32:10 +01:00
Ramiro Polla
3f2d4b49e6
avcodec/mjpegdec: split mjpeg_find_raw_scan_data() out of mjpeg_unescape_sos()
2026-02-20 16:32:10 +01:00