ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-04-18 16:40:23 +00:00

Author	SHA1	Message	Date
Andreas Rheinhardt	aa483bc422	avcodec/x86/bswapdsp: Avoid aligned vs unaligned codepaths for AVX2 For modern cpus (like those supporting AVX2) loads and stores using the unaligned versions of instructions are as fast as aligned ones if the address is aligned, so remove the aligned AVX2 version (and the alignment check) and just use the unaligned one. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-27 18:25:43 +01:00
Andreas Rheinhardt	55afe49dd0	avcodec/x86/bswapdsp: combine shifting, avoid check for AVX2 This avoids a check and a shift if >=8 elements are processed; it adds a check if < 8 elements are processed (which should be rare). No change in benchmarks here. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-27 18:25:31 +01:00
Andreas Rheinhardt	3e6fa5153e	avcodec/x86/bswapdsp: Avoid register copies No change in benchmarks here. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-27 18:25:01 +01:00
Zhao Zhili	86d2fae59f	avcodec: use int instead of enum for AVOption fields AVOption with AV_OPT_TYPE_INT assumes the field is int (4 bytes), but enum size is implementation-defined and may be smaller. This can cause memory corruption when AVOption writes 4 bytes to a field that is only 1-2 bytes, potentially overwriting adjacent struct members. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-02-26 11:40:09 +08:00
Ling, Edison	00d3417b71	avcodec/d3d12va_encode: Add H264 entropy coder parameter support Add parameter `coder` for users to select entropy coding in D3D12 H264 encoding. Named constants `cabac` (1) and `cavlc` (0) are supported. Default is CABAC (1). If the driver does not support CABAC, a warning is logged and encoding falls back to CAVLC. Usage: CABAC (default): `-coder cabac` or `-coder 1` CAVLC: `-coder cavlc` or `-coder 0` Sample command line: ``` ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -c:v h264_d3d12va -coder cavlc -y output.mp4 ```	2026-02-26 02:19:21 +00:00
Werner Robitza	5ba2525c7a	avcodec/libsvtav1: enable 2-pass encoding This patch enables two-pass encoding for libsvtav1 by implementing support for AV_CODEC_FLAG_PASS1 and AV_CODEC_FLAG_PASS2. Previously, users requiring two-pass encoding with SVT-AV1 had to use the standalone SvtAv1EncApp tool. This patch allows 2-pass encoding directly through FFmpeg. Based on patch by Fredrik Lundkvist, with review feedback from James Almer and Andreas Rheinhardt. See: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/327452.html Changes: - Use AV_BASE64_DECODE_SIZE macro for buffer size calculation - Allocate own buffer for rc_stats_buffer (non-ownership pointer) - Error handling with buffer cleanup Signed-off-by: Werner Robitza <werner.robitza@gmail.com>	2026-02-25 16:43:53 +01:00
Andreas Rheinhardt	dc65dcec22	avcodec/vvc/inter: Combine offsets early For bi-predicted weighted averages, only the sum of the two offsets is ever used, so add the two early. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-25 12:08:33 +01:00
stevxiao	fc7c38f9da	avcodec/d3d12va_encode: add detailed ValidationFlags error reporting for video encoders check feature support Improves error diagnostics for D3D12 video encoders check feature support by adding detailed ValidationFlags reporting when driver validation fails. This made it easy for users to identify which specific feature was unsupported without manually decoding the flags. Signed-off-by: younengxiao <steven.xiao@amd.com>	2026-02-25 08:47:14 +00:00
James Almer	145f6e5878	avcodec/cbs_h2645: split into separate files per module This file is becoming too bloated and hard to read, so split it into separate files, each having codec specific methods. This will also speed up compilation when using several concurrent jobs. Signed-off-by: James Almer <jamrial@gmail.com>	2026-02-24 10:32:20 -03:00
Michael Niedermayer	7761b8fbac	avcodec/imm5: Dont pass EAGAIN on as is Fixes: Assertion consumed != (-(11)) failed at libavcodec/decode.c:465 Fixes: 471587358/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_IMM5_fuzzer-4737412376100864 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 23:58:11 +01:00
Michael Niedermayer	302f198ba5	avcodec/mjpegdec: Check for multiple exif Fixes: memleak Fixes: 477993717/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AMV_DEC_fuzzer-4515108431921152 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 23:52:37 +01:00
Michael Niedermayer	2ab23ec729	avcodec/interplayacm: Check input for fill_block() Fixes: Timeout Fixes: 476763877/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_INTERPLAY_ACM_fuzzer-4515681843609600 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 23:50:49 +01:00
Michael Niedermayer	538824fd84	avcodec/hdrdec: Check input size before buffer allocation Fixes: Timeout Fixes: 471948155/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HDR_DEC_fuzzer-5679690418552832 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 23:28:09 +01:00
Michael Niedermayer	55bb6e2646	avcodec/tmv: Move space check before buffer allocation Fixes: Timeout Fixes: 471664630/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TMV_fuzzer-5291752530706432 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 23:26:20 +01:00
Michael Niedermayer	4446dfb0e3	avcodec/flashsv: Check for input space before (re)allocating frame Fixes: Timeout Fixes: 471605680/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV2_DEC_fuzzer-6210773459468288 Fixes: 471605920/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV_DEC_fuzzer-6230719287590912 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 22:59:44 +01:00
Michael Niedermayer	40cafc25cf	avcodec/mdec: Check input space vs minimal block size Fixes: Timeout Fixes: 481006706/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MDEC_fuzzer-6122832651419648 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 22:54:38 +01:00
Michael Niedermayer	73681f888d	avcodec/h264_parser: Check remaining input length in loop in scan_mmco_reset() Fixes: read of uninitialized memory Fixes: 476177761/clusterfuzz-testcase-minimized-ffmpeg_dem_H264_fuzzer-6400884824408064 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 22:43:28 +01:00
Marvin Scholz	fba9fc0c6b	lavc: wmadec: limit variable scopes Moves the loop variable declarations to the actual loops, narrowing their scopes.	2026-02-23 15:29:27 +00:00
Marvin Scholz	d219be03d6	lavc: wmadec: assert channels count This should never exceed MAX_CHANNELS, else there will be several out of bounds writes.	2026-02-23 15:29:27 +00:00
Lynne	baad75cafa	aacdec_usac: add support for parsing Mps212 (MPEG surround) This commit adds the full bitstream parsing for Mps212.	2026-02-23 07:57:57 +01:00
Lynne	86977fdb6b	aacdec_tab: add Mps212 tables To be used in the following commit.	2026-02-23 07:57:57 +01:00
Lynne	a4ab4a98c4	aacdec_tab: split up tables init	2026-02-23 07:57:57 +01:00
Michael Niedermayer	7e10579f49	avcodec/exr: fix AVERROR typo Fixes: out of array read Fixes: 485866440/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-4520520419966976 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-23 01:44:49 +00:00
Andreas Rheinhardt	53a9a34e23	avcodec/snow: Reduce sizeof(SnowContext) Each SubBand currently contains an array of 519 uint8_t[32], yet most of these are unused: For both the decoder and the encoder, at most 34 contexts are actually used: The only variable index is context+2, where context is the result of av_log2() and therefore in the 0..31 range. There are also several accesses using compile-time indices, the highest of which is 30. FATE passes with 31 contexts and maybe these are enough, but I don't know. Reducing the number to 34 reduces sizeof(SnowContext) from 2141664B to 155104B here (on x64). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 22:05:16 +01:00
Andreas Rheinhardt	bb92009386	avcodec/snow: Only allocate emu_edge_buffer for encoder Also allocate it during init and move it to the encoder's context. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 22:05:15 +01:00
Lynne	13e063ceec	vulkan/ffv1: properly initialize the linecache	2026-02-22 03:39:23 +01:00
Michael Niedermayer	99515a3342	avcodec/jpeg2000htdec: Check Lcup and Lref Fixes: use of uninitialized memory Fixes: 482494999/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-6467586186608640 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-22 02:31:06 +00:00
Andreas Rheinhardt	6c1c1720cf	avcodec/x86/vvc/dsp_init: Mark dsp init function as av_cold Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 01:05:12 +01:00
Andreas Rheinhardt	af3f8f5bd2	avcodec/x86/vvc/of: Break dependency chain Don't extract and update one word of one and the same register at a time; use separate src and dst registers, so that pextrw and bsr can be done in parallel. Also use movd instead of pinsrw for the first word. Old benchmarks: apply_bdof_8_8x16_c: 3275.2 ( 1.00x) apply_bdof_8_8x16_avx2: 487.6 ( 6.72x) apply_bdof_8_16x8_c: 3243.1 ( 1.00x) apply_bdof_8_16x8_avx2: 284.4 (11.40x) apply_bdof_8_16x16_c: 6501.8 ( 1.00x) apply_bdof_8_16x16_avx2: 570.0 (11.41x) apply_bdof_10_8x16_c: 3286.5 ( 1.00x) apply_bdof_10_8x16_avx2: 461.7 ( 7.12x) apply_bdof_10_16x8_c: 3274.5 ( 1.00x) apply_bdof_10_16x8_avx2: 271.4 (12.06x) apply_bdof_10_16x16_c: 6590.0 ( 1.00x) apply_bdof_10_16x16_avx2: 543.9 (12.12x) apply_bdof_12_8x16_c: 3307.6 ( 1.00x) apply_bdof_12_8x16_avx2: 462.2 ( 7.16x) apply_bdof_12_16x8_c: 3287.4 ( 1.00x) apply_bdof_12_16x8_avx2: 271.8 (12.10x) apply_bdof_12_16x16_c: 6465.7 ( 1.00x) apply_bdof_12_16x16_avx2: 543.8 (11.89x) New benchmarks: apply_bdof_8_8x16_c: 3255.7 ( 1.00x) apply_bdof_8_8x16_avx2: 349.3 ( 9.32x) apply_bdof_8_16x8_c: 3262.5 ( 1.00x) apply_bdof_8_16x8_avx2: 214.8 (15.19x) apply_bdof_8_16x16_c: 6471.6 ( 1.00x) apply_bdof_8_16x16_avx2: 429.8 (15.06x) apply_bdof_10_8x16_c: 3227.7 ( 1.00x) apply_bdof_10_8x16_avx2: 321.6 (10.04x) apply_bdof_10_16x8_c: 3250.2 ( 1.00x) apply_bdof_10_16x8_avx2: 201.2 (16.16x) apply_bdof_10_16x16_c: 6476.5 ( 1.00x) apply_bdof_10_16x16_avx2: 400.9 (16.16x) apply_bdof_12_8x16_c: 3230.7 ( 1.00x) apply_bdof_12_8x16_avx2: 321.8 (10.04x) apply_bdof_12_16x8_c: 3210.5 ( 1.00x) apply_bdof_12_16x8_avx2: 200.9 (15.98x) apply_bdof_12_16x16_c: 6474.5 ( 1.00x) apply_bdof_12_16x16_avx2: 400.2 (16.18x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 01:05:12 +01:00
Andreas Rheinhardt	19dc7b79a4	avcodec/x86/vvc/of: Unify shuffling One can use the same shuffles for the width 8 and width 16 case if one also changes the permutation in vpermd (that always follows pshufb for width 16). This also allows to load it before checking width. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 01:03:22 +01:00
Andreas Rheinhardt	8e82416434	avcodec/x86/vvc/of: Avoid unused register Avoids a push+pop. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 01:02:20 +01:00
Andreas Rheinhardt	81fb70c833	avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for w_avg They only add overhead (in form of another function call, sign-extending some parameters to 64bit (although the upper bits are not used at all) and rederiving the actual number of bits (from the maximum value (1<<bpp)-1)). Old benchmarks: w_avg_8_2x2_c: 16.4 ( 1.00x) w_avg_8_2x2_avx2: 12.9 ( 1.27x) w_avg_8_4x4_c: 48.0 ( 1.00x) w_avg_8_4x4_avx2: 14.9 ( 3.23x) w_avg_8_8x8_c: 168.2 ( 1.00x) w_avg_8_8x8_avx2: 22.4 ( 7.49x) w_avg_8_16x16_c: 396.5 ( 1.00x) w_avg_8_16x16_avx2: 47.9 ( 8.28x) w_avg_8_32x32_c: 1466.3 ( 1.00x) w_avg_8_32x32_avx2: 172.8 ( 8.48x) w_avg_8_64x64_c: 5629.3 ( 1.00x) w_avg_8_64x64_avx2: 678.7 ( 8.29x) w_avg_8_128x128_c: 22122.4 ( 1.00x) w_avg_8_128x128_avx2: 2743.5 ( 8.06x) w_avg_10_2x2_c: 18.7 ( 1.00x) w_avg_10_2x2_avx2: 13.1 ( 1.43x) w_avg_10_4x4_c: 50.3 ( 1.00x) w_avg_10_4x4_avx2: 15.9 ( 3.17x) w_avg_10_8x8_c: 109.3 ( 1.00x) w_avg_10_8x8_avx2: 20.6 ( 5.30x) w_avg_10_16x16_c: 395.5 ( 1.00x) w_avg_10_16x16_avx2: 44.8 ( 8.83x) w_avg_10_32x32_c: 1534.2 ( 1.00x) w_avg_10_32x32_avx2: 141.4 (10.85x) w_avg_10_64x64_c: 6003.6 ( 1.00x) w_avg_10_64x64_avx2: 557.4 (10.77x) w_avg_10_128x128_c: 23722.7 ( 1.00x) w_avg_10_128x128_avx2: 2205.0 (10.76x) w_avg_12_2x2_c: 18.6 ( 1.00x) w_avg_12_2x2_avx2: 13.1 ( 1.42x) w_avg_12_4x4_c: 52.2 ( 1.00x) w_avg_12_4x4_avx2: 16.1 ( 3.24x) w_avg_12_8x8_c: 109.2 ( 1.00x) w_avg_12_8x8_avx2: 20.6 ( 5.29x) w_avg_12_16x16_c: 396.1 ( 1.00x) w_avg_12_16x16_avx2: 45.0 ( 8.81x) w_avg_12_32x32_c: 1532.6 ( 1.00x) w_avg_12_32x32_avx2: 142.1 (10.79x) w_avg_12_64x64_c: 6002.2 ( 1.00x) w_avg_12_64x64_avx2: 557.3 (10.77x) w_avg_12_128x128_c: 23748.7 ( 1.00x) w_avg_12_128x128_avx2: 2206.4 (10.76x) New benchmarks: w_avg_8_2x2_c: 16.0 ( 1.00x) w_avg_8_2x2_avx2: 9.3 ( 1.71x) w_avg_8_4x4_c: 48.4 ( 1.00x) w_avg_8_4x4_avx2: 12.4 ( 3.91x) w_avg_8_8x8_c: 168.7 ( 1.00x) w_avg_8_8x8_avx2: 21.1 ( 8.00x) w_avg_8_16x16_c: 394.5 ( 1.00x) w_avg_8_16x16_avx2: 46.2 ( 
8.54x) w_avg_8_32x32_c: 1456.3 ( 1.00x) w_avg_8_32x32_avx2: 171.8 ( 8.48x) w_avg_8_64x64_c: 5636.2 ( 1.00x) w_avg_8_64x64_avx2: 676.9 ( 8.33x) w_avg_8_128x128_c: 22129.1 ( 1.00x) w_avg_8_128x128_avx2: 2734.3 ( 8.09x) w_avg_10_2x2_c: 18.7 ( 1.00x) w_avg_10_2x2_avx2: 10.3 ( 1.82x) w_avg_10_4x4_c: 50.8 ( 1.00x) w_avg_10_4x4_avx2: 13.4 ( 3.79x) w_avg_10_8x8_c: 109.7 ( 1.00x) w_avg_10_8x8_avx2: 20.4 ( 5.38x) w_avg_10_16x16_c: 395.2 ( 1.00x) w_avg_10_16x16_avx2: 41.7 ( 9.48x) w_avg_10_32x32_c: 1535.6 ( 1.00x) w_avg_10_32x32_avx2: 137.9 (11.13x) w_avg_10_64x64_c: 6002.1 ( 1.00x) w_avg_10_64x64_avx2: 548.5 (10.94x) w_avg_10_128x128_c: 23742.7 ( 1.00x) w_avg_10_128x128_avx2: 2179.8 (10.89x) w_avg_12_2x2_c: 18.9 ( 1.00x) w_avg_12_2x2_avx2: 10.3 ( 1.84x) w_avg_12_4x4_c: 52.4 ( 1.00x) w_avg_12_4x4_avx2: 13.4 ( 3.91x) w_avg_12_8x8_c: 109.2 ( 1.00x) w_avg_12_8x8_avx2: 20.3 ( 5.39x) w_avg_12_16x16_c: 396.3 ( 1.00x) w_avg_12_16x16_avx2: 41.7 ( 9.51x) w_avg_12_32x32_c: 1532.6 ( 1.00x) w_avg_12_32x32_avx2: 138.6 (11.06x) w_avg_12_64x64_c: 5996.7 ( 1.00x) w_avg_12_64x64_avx2: 549.6 (10.91x) w_avg_12_128x128_c: 23738.0 ( 1.00x) w_avg_12_128x128_avx2: 2177.2 (10.90x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 01:01:27 +01:00
Andreas Rheinhardt	ea78402e9c	avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for avg Up until now, there were two averaging assembly functions, one for eight bit content and one for <=16 bit content; there are also three C-wrappers around these functions, for 8, 10 and 12 bpp. These wrappers simply forward the maximum permissible value (i.e. (1<<bpp)-1) and promote some integer values to ptrdiff_t. Yet these wrappers are absolutely useless: The assembly functions rederive the bpp from the maximum and only the integer part of the promoted ptrdiff_t values is ever used. Of course, these wrappers also entail an additional call (not a tail call, because the additional maximum parameter is passed on the stack). Remove the wrappers and add per-bpp assembly functions instead. Given that the only difference between 10 and 12 bits are some constants in registers, the main part of these functions can be shared (given that this code uses a jumptable, it can even be done without adding any additional jump). 
Old benchmarks: avg_8_2x2_c: 11.4 ( 1.00x) avg_8_2x2_avx2: 7.9 ( 1.44x) avg_8_4x4_c: 30.7 ( 1.00x) avg_8_4x4_avx2: 10.4 ( 2.95x) avg_8_8x8_c: 134.5 ( 1.00x) avg_8_8x8_avx2: 16.6 ( 8.12x) avg_8_16x16_c: 255.6 ( 1.00x) avg_8_16x16_avx2: 28.2 ( 9.07x) avg_8_32x32_c: 897.7 ( 1.00x) avg_8_32x32_avx2: 83.9 (10.70x) avg_8_64x64_c: 3320.0 ( 1.00x) avg_8_64x64_avx2: 321.1 (10.34x) avg_8_128x128_c: 12981.8 ( 1.00x) avg_8_128x128_avx2: 1480.1 ( 8.77x) avg_10_2x2_c: 12.0 ( 1.00x) avg_10_2x2_avx2: 8.4 ( 1.43x) avg_10_4x4_c: 34.9 ( 1.00x) avg_10_4x4_avx2: 9.8 ( 3.56x) avg_10_8x8_c: 76.8 ( 1.00x) avg_10_8x8_avx2: 15.1 ( 5.08x) avg_10_16x16_c: 256.6 ( 1.00x) avg_10_16x16_avx2: 25.1 (10.20x) avg_10_32x32_c: 932.9 ( 1.00x) avg_10_32x32_avx2: 73.4 (12.72x) avg_10_64x64_c: 3517.9 ( 1.00x) avg_10_64x64_avx2: 414.8 ( 8.48x) avg_10_128x128_c: 13695.3 ( 1.00x) avg_10_128x128_avx2: 1648.1 ( 8.31x) avg_12_2x2_c: 13.1 ( 1.00x) avg_12_2x2_avx2: 8.6 ( 1.53x) avg_12_4x4_c: 35.4 ( 1.00x) avg_12_4x4_avx2: 10.1 ( 3.49x) avg_12_8x8_c: 76.6 ( 1.00x) avg_12_8x8_avx2: 16.7 ( 4.60x) avg_12_16x16_c: 256.6 ( 1.00x) avg_12_16x16_avx2: 25.5 (10.07x) avg_12_32x32_c: 933.2 ( 1.00x) avg_12_32x32_avx2: 75.7 (12.34x) avg_12_64x64_c: 3519.1 ( 1.00x) avg_12_64x64_avx2: 416.8 ( 8.44x) avg_12_128x128_c: 13695.1 ( 1.00x) avg_12_128x128_avx2: 1651.6 ( 8.29x) New benchmarks: avg_8_2x2_c: 11.5 ( 1.00x) avg_8_2x2_avx2: 6.0 ( 1.91x) avg_8_4x4_c: 29.7 ( 1.00x) avg_8_4x4_avx2: 8.0 ( 3.72x) avg_8_8x8_c: 131.4 ( 1.00x) avg_8_8x8_avx2: 12.2 (10.74x) avg_8_16x16_c: 254.3 ( 1.00x) avg_8_16x16_avx2: 24.8 (10.25x) avg_8_32x32_c: 897.7 ( 1.00x) avg_8_32x32_avx2: 77.8 (11.54x) avg_8_64x64_c: 3321.3 ( 1.00x) avg_8_64x64_avx2: 318.7 (10.42x) avg_8_128x128_c: 12988.4 ( 1.00x) avg_8_128x128_avx2: 1430.1 ( 9.08x) avg_10_2x2_c: 12.1 ( 1.00x) avg_10_2x2_avx2: 5.7 ( 2.13x) avg_10_4x4_c: 35.0 ( 1.00x) avg_10_4x4_avx2: 9.0 ( 3.88x) avg_10_8x8_c: 77.2 ( 1.00x) avg_10_8x8_avx2: 12.4 ( 6.24x) avg_10_16x16_c: 256.2 ( 1.00x) avg_10_16x16_avx2: 
24.3 (10.56x) avg_10_32x32_c: 932.9 ( 1.00x) avg_10_32x32_avx2: 71.9 (12.97x) avg_10_64x64_c: 3516.8 ( 1.00x) avg_10_64x64_avx2: 414.7 ( 8.48x) avg_10_128x128_c: 13693.7 ( 1.00x) avg_10_128x128_avx2: 1609.3 ( 8.51x) avg_12_2x2_c: 14.1 ( 1.00x) avg_12_2x2_avx2: 5.7 ( 2.48x) avg_12_4x4_c: 35.8 ( 1.00x) avg_12_4x4_avx2: 9.0 ( 3.96x) avg_12_8x8_c: 76.9 ( 1.00x) avg_12_8x8_avx2: 12.4 ( 6.22x) avg_12_16x16_c: 256.5 ( 1.00x) avg_12_16x16_avx2: 24.4 (10.50x) avg_12_32x32_c: 934.1 ( 1.00x) avg_12_32x32_avx2: 72.0 (12.97x) avg_12_64x64_c: 3518.2 ( 1.00x) avg_12_64x64_avx2: 414.8 ( 8.48x) avg_12_128x128_c: 13689.5 ( 1.00x) avg_12_128x128_avx2: 1611.1 ( 8.50x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:58:33 +01:00
Andreas Rheinhardt	5a60b3f1a6	avcodec/x86/vvc/mc: Remove always-false branches The C versions of the average and weighted average functions contains "FFMAX(3, 15 - BIT_DEPTH)" and the code here followed this; yet it is only instantiated for bit depths 8, 10 and 12, for which the above is just 15-BIT_DEPTH. So the comparisons are unnecessary. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Andreas Rheinhardt	59f8ff4c18	avcodec/x86/vvc/mc: Remove unused constants Also avoid overaligning .rodata. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Andreas Rheinhardt	eabf52e787	avcodec/x86/vvc/mc: Avoid unused work The high quadword of these registers is zero for width 2. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Andreas Rheinhardt	9317fb2b2e	avcodec/x86/vvc/mc: Avoid ymm registers where possible Widths 2 and 4 fit into xmm registers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Andreas Rheinhardt	caa0ae0cfb	avcodec/x86/vvc/mc: Avoid pextr[dq], v{insert,extract}i128 Use mov[dq], movdqu instead if the least significant parts are set (i.e. if the immediate value is 0x0). Old benchmarks: avg_8_2x2_c: 11.3 ( 1.00x) avg_8_2x2_avx2: 7.5 ( 1.50x) avg_8_4x4_c: 31.2 ( 1.00x) avg_8_4x4_avx2: 10.7 ( 2.91x) avg_8_8x8_c: 133.5 ( 1.00x) avg_8_8x8_avx2: 21.2 ( 6.30x) avg_8_16x16_c: 254.7 ( 1.00x) avg_8_16x16_avx2: 30.1 ( 8.46x) avg_8_32x32_c: 896.9 ( 1.00x) avg_8_32x32_avx2: 103.9 ( 8.63x) avg_8_64x64_c: 3320.7 ( 1.00x) avg_8_64x64_avx2: 539.4 ( 6.16x) avg_8_128x128_c: 12991.5 ( 1.00x) avg_8_128x128_avx2: 1661.3 ( 7.82x) avg_10_2x2_c: 21.3 ( 1.00x) avg_10_2x2_avx2: 8.3 ( 2.55x) avg_10_4x4_c: 34.9 ( 1.00x) avg_10_4x4_avx2: 10.6 ( 3.28x) avg_10_8x8_c: 76.3 ( 1.00x) avg_10_8x8_avx2: 20.2 ( 3.77x) avg_10_16x16_c: 255.9 ( 1.00x) avg_10_16x16_avx2: 24.1 (10.60x) avg_10_32x32_c: 932.4 ( 1.00x) avg_10_32x32_avx2: 73.3 (12.72x) avg_10_64x64_c: 3516.4 ( 1.00x) avg_10_64x64_avx2: 601.7 ( 5.84x) avg_10_128x128_c: 13690.6 ( 1.00x) avg_10_128x128_avx2: 1613.2 ( 8.49x) avg_12_2x2_c: 14.0 ( 1.00x) avg_12_2x2_avx2: 8.3 ( 1.67x) avg_12_4x4_c: 35.3 ( 1.00x) avg_12_4x4_avx2: 10.9 ( 3.26x) avg_12_8x8_c: 76.5 ( 1.00x) avg_12_8x8_avx2: 20.3 ( 3.77x) avg_12_16x16_c: 256.7 ( 1.00x) avg_12_16x16_avx2: 24.1 (10.63x) avg_12_32x32_c: 932.5 ( 1.00x) avg_12_32x32_avx2: 73.3 (12.72x) avg_12_64x64_c: 3520.5 ( 1.00x) avg_12_64x64_avx2: 602.6 ( 5.84x) avg_12_128x128_c: 13689.6 ( 1.00x) avg_12_128x128_avx2: 1613.1 ( 8.49x) w_avg_8_2x2_c: 16.7 ( 1.00x) w_avg_8_2x2_avx2: 13.4 ( 1.25x) w_avg_8_4x4_c: 44.5 ( 1.00x) w_avg_8_4x4_avx2: 15.9 ( 2.81x) w_avg_8_8x8_c: 166.1 ( 1.00x) w_avg_8_8x8_avx2: 45.7 ( 3.63x) w_avg_8_16x16_c: 392.9 ( 1.00x) w_avg_8_16x16_avx2: 57.8 ( 6.80x) w_avg_8_32x32_c: 1455.5 ( 1.00x) w_avg_8_32x32_avx2: 215.0 ( 6.77x) w_avg_8_64x64_c: 5621.8 ( 1.00x) w_avg_8_64x64_avx2: 875.2 ( 6.42x) w_avg_8_128x128_c: 22131.3 ( 1.00x) w_avg_8_128x128_avx2: 3390.1 ( 6.53x) w_avg_10_2x2_c: 
18.0 ( 1.00x) w_avg_10_2x2_avx2: 14.0 ( 1.28x) w_avg_10_4x4_c: 53.9 ( 1.00x) w_avg_10_4x4_avx2: 15.9 ( 3.40x) w_avg_10_8x8_c: 109.5 ( 1.00x) w_avg_10_8x8_avx2: 40.4 ( 2.71x) w_avg_10_16x16_c: 395.7 ( 1.00x) w_avg_10_16x16_avx2: 44.7 ( 8.86x) w_avg_10_32x32_c: 1532.7 ( 1.00x) w_avg_10_32x32_avx2: 142.4 (10.77x) w_avg_10_64x64_c: 6007.7 ( 1.00x) w_avg_10_64x64_avx2: 745.5 ( 8.06x) w_avg_10_128x128_c: 23719.7 ( 1.00x) w_avg_10_128x128_avx2: 2217.7 (10.70x) w_avg_12_2x2_c: 18.9 ( 1.00x) w_avg_12_2x2_avx2: 13.6 ( 1.38x) w_avg_12_4x4_c: 47.5 ( 1.00x) w_avg_12_4x4_avx2: 15.9 ( 2.99x) w_avg_12_8x8_c: 109.3 ( 1.00x) w_avg_12_8x8_avx2: 40.9 ( 2.67x) w_avg_12_16x16_c: 395.6 ( 1.00x) w_avg_12_16x16_avx2: 44.8 ( 8.84x) w_avg_12_32x32_c: 1531.0 ( 1.00x) w_avg_12_32x32_avx2: 141.8 (10.80x) w_avg_12_64x64_c: 6016.7 ( 1.00x) w_avg_12_64x64_avx2: 732.8 ( 8.21x) w_avg_12_128x128_c: 23762.2 ( 1.00x) w_avg_12_128x128_avx2: 2223.4 (10.69x) New benchmarks: avg_8_2x2_c: 11.3 ( 1.00x) avg_8_2x2_avx2: 7.6 ( 1.49x) avg_8_4x4_c: 31.2 ( 1.00x) avg_8_4x4_avx2: 10.8 ( 2.89x) avg_8_8x8_c: 131.6 ( 1.00x) avg_8_8x8_avx2: 15.6 ( 8.42x) avg_8_16x16_c: 255.3 ( 1.00x) avg_8_16x16_avx2: 27.9 ( 9.16x) avg_8_32x32_c: 897.9 ( 1.00x) avg_8_32x32_avx2: 81.2 (11.06x) avg_8_64x64_c: 3320.0 ( 1.00x) avg_8_64x64_avx2: 335.1 ( 9.91x) avg_8_128x128_c: 12999.1 ( 1.00x) avg_8_128x128_avx2: 1456.3 ( 8.93x) avg_10_2x2_c: 12.0 ( 1.00x) avg_10_2x2_avx2: 8.6 ( 1.40x) avg_10_4x4_c: 34.9 ( 1.00x) avg_10_4x4_avx2: 9.7 ( 3.61x) avg_10_8x8_c: 76.7 ( 1.00x) avg_10_8x8_avx2: 16.3 ( 4.69x) avg_10_16x16_c: 256.3 ( 1.00x) avg_10_16x16_avx2: 25.2 (10.18x) avg_10_32x32_c: 932.8 ( 1.00x) avg_10_32x32_avx2: 73.3 (12.72x) avg_10_64x64_c: 3518.8 ( 1.00x) avg_10_64x64_avx2: 416.8 ( 8.44x) avg_10_128x128_c: 13691.6 ( 1.00x) avg_10_128x128_avx2: 1612.9 ( 8.49x) avg_12_2x2_c: 14.1 ( 1.00x) avg_12_2x2_avx2: 8.7 ( 1.62x) avg_12_4x4_c: 35.7 ( 1.00x) avg_12_4x4_avx2: 9.7 ( 3.68x) avg_12_8x8_c: 77.0 ( 1.00x) avg_12_8x8_avx2: 16.9 ( 4.57x) 
avg_12_16x16_c: 256.2 ( 1.00x) avg_12_16x16_avx2: 25.7 ( 9.96x) avg_12_32x32_c: 933.5 ( 1.00x) avg_12_32x32_avx2: 74.0 (12.62x) avg_12_64x64_c: 3516.4 ( 1.00x) avg_12_64x64_avx2: 408.7 ( 8.60x) avg_12_128x128_c: 13691.6 ( 1.00x) avg_12_128x128_avx2: 1613.8 ( 8.48x) w_avg_8_2x2_c: 16.7 ( 1.00x) w_avg_8_2x2_avx2: 14.0 ( 1.19x) w_avg_8_4x4_c: 48.2 ( 1.00x) w_avg_8_4x4_avx2: 16.1 ( 3.00x) w_avg_8_8x8_c: 168.0 ( 1.00x) w_avg_8_8x8_avx2: 22.5 ( 7.47x) w_avg_8_16x16_c: 392.5 ( 1.00x) w_avg_8_16x16_avx2: 47.9 ( 8.19x) w_avg_8_32x32_c: 1453.7 ( 1.00x) w_avg_8_32x32_avx2: 176.1 ( 8.26x) w_avg_8_64x64_c: 5631.4 ( 1.00x) w_avg_8_64x64_avx2: 690.8 ( 8.15x) w_avg_8_128x128_c: 22139.5 ( 1.00x) w_avg_8_128x128_avx2: 2742.4 ( 8.07x) w_avg_10_2x2_c: 18.1 ( 1.00x) w_avg_10_2x2_avx2: 13.8 ( 1.31x) w_avg_10_4x4_c: 47.0 ( 1.00x) w_avg_10_4x4_avx2: 16.4 ( 2.87x) w_avg_10_8x8_c: 110.0 ( 1.00x) w_avg_10_8x8_avx2: 21.6 ( 5.09x) w_avg_10_16x16_c: 395.2 ( 1.00x) w_avg_10_16x16_avx2: 45.4 ( 8.71x) w_avg_10_32x32_c: 1533.8 ( 1.00x) w_avg_10_32x32_avx2: 142.6 (10.76x) w_avg_10_64x64_c: 6004.4 ( 1.00x) w_avg_10_64x64_avx2: 672.8 ( 8.92x) w_avg_10_128x128_c: 23748.5 ( 1.00x) w_avg_10_128x128_avx2: 2198.0 (10.80x) w_avg_12_2x2_c: 17.2 ( 1.00x) w_avg_12_2x2_avx2: 13.9 ( 1.24x) w_avg_12_4x4_c: 51.4 ( 1.00x) w_avg_12_4x4_avx2: 16.5 ( 3.11x) w_avg_12_8x8_c: 109.1 ( 1.00x) w_avg_12_8x8_avx2: 22.0 ( 4.96x) w_avg_12_16x16_c: 395.9 ( 1.00x) w_avg_12_16x16_avx2: 44.9 ( 8.81x) w_avg_12_32x32_c: 1533.5 ( 1.00x) w_avg_12_32x32_avx2: 142.3 (10.78x) w_avg_12_64x64_c: 6002.0 ( 1.00x) w_avg_12_64x64_avx2: 557.5 (10.77x) w_avg_12_128x128_c: 23749.5 ( 1.00x) w_avg_12_128x128_avx2: 2202.0 (10.79x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Andreas Rheinhardt	7bf9c1e3f6	avcodec/x86/vvc/mc: Avoid redundant clipping for 8bit It is already done by packuswb. Old benchmarks: avg_8_2x2_c: 11.1 ( 1.00x) avg_8_2x2_avx2: 8.6 ( 1.28x) avg_8_4x4_c: 30.0 ( 1.00x) avg_8_4x4_avx2: 10.8 ( 2.78x) avg_8_8x8_c: 132.0 ( 1.00x) avg_8_8x8_avx2: 25.7 ( 5.14x) avg_8_16x16_c: 254.6 ( 1.00x) avg_8_16x16_avx2: 33.2 ( 7.67x) avg_8_32x32_c: 897.5 ( 1.00x) avg_8_32x32_avx2: 115.6 ( 7.76x) avg_8_64x64_c: 3316.9 ( 1.00x) avg_8_64x64_avx2: 626.5 ( 5.29x) avg_8_128x128_c: 12973.6 ( 1.00x) avg_8_128x128_avx2: 1914.0 ( 6.78x) w_avg_8_2x2_c: 16.7 ( 1.00x) w_avg_8_2x2_avx2: 14.4 ( 1.16x) w_avg_8_4x4_c: 48.2 ( 1.00x) w_avg_8_4x4_avx2: 16.5 ( 2.92x) w_avg_8_8x8_c: 168.1 ( 1.00x) w_avg_8_8x8_avx2: 49.7 ( 3.38x) w_avg_8_16x16_c: 392.4 ( 1.00x) w_avg_8_16x16_avx2: 61.1 ( 6.43x) w_avg_8_32x32_c: 1455.3 ( 1.00x) w_avg_8_32x32_avx2: 224.6 ( 6.48x) w_avg_8_64x64_c: 5632.1 ( 1.00x) w_avg_8_64x64_avx2: 896.9 ( 6.28x) w_avg_8_128x128_c: 22136.3 ( 1.00x) w_avg_8_128x128_avx2: 3626.7 ( 6.10x) New benchmarks: avg_8_2x2_c: 12.3 ( 1.00x) avg_8_2x2_avx2: 8.1 ( 1.52x) avg_8_4x4_c: 30.3 ( 1.00x) avg_8_4x4_avx2: 11.3 ( 2.67x) avg_8_8x8_c: 131.8 ( 1.00x) avg_8_8x8_avx2: 21.3 ( 6.20x) avg_8_16x16_c: 255.0 ( 1.00x) avg_8_16x16_avx2: 30.6 ( 8.33x) avg_8_32x32_c: 898.5 ( 1.00x) avg_8_32x32_avx2: 104.9 ( 8.57x) avg_8_64x64_c: 3317.7 ( 1.00x) avg_8_64x64_avx2: 540.9 ( 6.13x) avg_8_128x128_c: 12986.5 ( 1.00x) avg_8_128x128_avx2: 1663.4 ( 7.81x) w_avg_8_2x2_c: 16.8 ( 1.00x) w_avg_8_2x2_avx2: 13.9 ( 1.21x) w_avg_8_4x4_c: 48.2 ( 1.00x) w_avg_8_4x4_avx2: 16.2 ( 2.98x) w_avg_8_8x8_c: 168.6 ( 1.00x) w_avg_8_8x8_avx2: 46.3 ( 3.64x) w_avg_8_16x16_c: 392.4 ( 1.00x) w_avg_8_16x16_avx2: 57.7 ( 6.80x) w_avg_8_32x32_c: 1454.6 ( 1.00x) w_avg_8_32x32_avx2: 214.6 ( 6.78x) w_avg_8_64x64_c: 5638.4 ( 1.00x) w_avg_8_64x64_avx2: 875.6 ( 6.44x) w_avg_8_128x128_c: 22133.5 ( 1.00x) w_avg_8_128x128_avx2: 3334.3 ( 6.64x) Also saves 550B of .text here. 
The improvements will likely be even better on Win64, because it avoids using two nonvolatile registers in the weighted average case. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-22 00:57:56 +01:00
Michael Niedermayer	c98346ffaa	avcodec/libtheoraenc: make keyframe mask unsigned and handle its larger range Fixes: left shift of 1 by 31 places cannot be represented in type 'int' Fixes: 473579864/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LIBTHEORA_fuzzer-5835688160591872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-21 22:43:41 +00:00
Andreas Rheinhardt	3be4545b67	avcodec/vvc/inter: Deduplicate applying averaging Reviewed-by: Frank Plowman <post@frankplowman.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-21 12:48:50 +01:00
Andreas Rheinhardt	324fd0bc46	avcodec/vvc/inter: Remove redundant variable, fix shadowing Reviewed-by: Frank Plowman <post@frankplowman.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-21 12:48:50 +01:00
Andreas Rheinhardt	6777d5cd48	avcodec/vvc/inter: Remove always-false/true checks derive_weight() is only called when pred_flag is PF_BI, which only happens in B slices. Reviewed-by: Frank Plowman <post@frankplowman.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-21 12:48:50 +01:00
Ramiro Polla	a0b55a0491	avcodec/mjpegdec: fix indentation and some white spaces	2026-02-20 16:32:10 +01:00
Ramiro Polla	0accfde281	avcodec/jpeglsdec: fix decoding of jpegls files with restart markers	2026-02-20 16:32:10 +01:00
Ramiro Polla	5672c410a6	avcodec/mjpegdec: unescape data for each restart marker individually Instead of unescaping the entire image data buffer in advance, and then having to perform heuristics to skip over where the restart markers would have been, unescape the image data for each restart marker individually.	2026-02-20 16:32:10 +01:00
Ramiro Polla	22771117a0	avcodec/mjpegdec: move get_bits_left() checks after handling of restart count This commit doesn't really change much on its own, but it's helpful in preparation for the following commit.	2026-02-20 16:32:10 +01:00
Ramiro Polla	3783b8f5e1	avcodec/mjpegdec: move vpred initialization out of loop in ljpeg_decode_rgb_scan() The initialization code was only being run when mb_y was 0, so it could just as well be moved out of the loop. I haven't been able to find a bayer sample that has restart markers to check whether vpred should be reinitialized at every restart. It would seem logical that it should, but I have left this out until we find a sample that does have restart markers.	2026-02-20 16:32:10 +01:00
Ramiro Polla	851cb118da	avcodec/jpegls: clear more JLSState fields inside ff_jpegls_init_state()	2026-02-20 16:32:10 +01:00
Ramiro Polla	3f2d4b49e6	avcodec/mjpegdec: split mjpeg_find_raw_scan_data() out of mjpeg_unescape_sos()	2026-02-20 16:32:10 +01:00
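
Note on commit 86d2fae59f (use int instead of enum for AVOption fields): the message says an int-typed option writes 4 bytes at the field's offset, so the field itself must be int-sized. The following is a minimal sketch of that idea, not FFmpeg code. DemoContext, demo_set_int() and demo_neighbour_preserved() are hypothetical names invented for illustration, and the setter only imitates the "always write sizeof(int) bytes" behaviour described in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context struct for illustration; not taken from FFmpeg. */
typedef struct DemoContext {
    int           mode;       /* option target, declared as int on purpose */
    unsigned char neighbour;  /* adjacent member that must stay untouched  */
} DemoContext;

/* The option field must be exactly int-sized, because the generic setter
 * below always writes sizeof(int) bytes. An enum-typed field does not
 * guarantee this, since the size of an enum is implementation-defined. */
static_assert(sizeof(((DemoContext *)0)->mode) == sizeof(int),
              "option field must be int-sized");

/* Stand-in for a generic int option setter: it only knows an offset
 * and writes sizeof(int) bytes there. */
static void demo_set_int(void *ctx, size_t offset, int value)
{
    memcpy((unsigned char *)ctx + offset, &value, sizeof(value));
}

/* Returns 1 if setting the option leaves the adjacent member intact. */
static int demo_neighbour_preserved(void)
{
    DemoContext c = { 0, 0xAB };
    demo_set_int(&c, offsetof(DemoContext, mode), 7);
    return c.mode == 7 && c.neighbour == 0xAB;
}
```

With the field declared as int, the write stays inside the field. If mode were an enum that the compiler chose to store in one or two bytes, the same write would run past the field, which is the corruption the commit guards against.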

53628 commits
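
Note on commit 5a60b3f1a6 (Remove always-false branches): the message states that "FFMAX(3, 15 - BIT_DEPTH)" reduces to 15-BIT_DEPTH for the only instantiated bit depths 8, 10 and 12. The arithmetic can be checked with a small sketch. DEMO_MAX is a local stand-in for FFmpeg's FFMAX macro, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for FFmpeg's FFMAX; defined here so the sketch is self-contained. */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Shift as written in the C versions quoted in the commit message. */
static int shift_with_clamp(int bit_depth)
{
    return DEMO_MAX(3, 15 - bit_depth);
}

/* Shift without the lower bound of 3. */
static int shift_plain(int bit_depth)
{
    return 15 - bit_depth;
}

/* Returns 1 if the clamp never changes the result for depths 8, 10 and 12. */
static int clamp_is_redundant(void)
{
    static const int depths[] = { 8, 10, 12 };
    for (size_t i = 0; i < sizeof(depths) / sizeof(depths[0]); i++) {
        if (shift_with_clamp(depths[i]) != shift_plain(depths[i]))
            return 0;
    }
    return 1;
}
```

For 8, 10 and 12 bits the shift is 7, 5 and 3 respectively, so the lower bound of 3 is never exceeded from below. It would only matter for a depth above 12, for example 14, where the plain form gives 1 and the clamped form gives 3.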