Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Removes the special -I flag specified in the avcodec/bsf/ subdirectory.
This makes code copy-pastable to other parts of the ffmpeg codebase, as
well as simplifying the build script.
It also reduces ambiguity, since there are many instances of same-named
header files existing in both libavformat/ and libavcodec/
subdirectories.
Reverts: 0e4dfa4709
Reapplies: 41b73ae883
The aarch64 VP9 loopfilters actually violate aarch64 GCS
(Guarded Control Stack), even though we marked the code as GCS
compliant in 846746be4b.
This means that builds with GCS enabled, after that commit,
will crash when decoding VP9, on future hardware (or current
QEMU) that supports GCS. This also goes for ffmpeg version 8.1.1
where the GCS enabling was backported.
This matches the fix that was done for hevcdsp in
1f7ed8a78d.
This issue wasn't observed when running checkasm in QEMU; therefore,
I thought all GCS issues had been fixed by
846746be4b. (Had I tested the full "make fate" with QEMU, the issue
would have appeared, though.)
However, with the new checkasm, some of the GCS violations
do appear even in checkasm.
The reason is that the checkasm vp9 test intentionally crafts
input pixels that attempt to trigger all the individual
separate cases in each input buffer (in
randomize_loopfilter_buffers). This means that the checkasm
tests actually never test or exercise the early exit cases,
which are the ones that violate GCS.
With the new checkasm, the call to "bench_new" always tests
running the code at least once, even if not benchmarking.
As the input buffers weren't reinitialized between the test
and "bench_new", the pixel differences now differ from the
initial setup, so that the code now sometimes (often)
ends up hitting the early exit cases.
Ideally, the vp9 checkasm test would be repeated to cover all
cases of input buffers that allow early exits, in addition to
covering the case with all different cases in one block.
Context:
1. In the case sps_subpic_info_present=0, there is a single subpicture
which includes the entire picture.
2. When sps_subpic_info_present=0, we might be using Reference Picture
Resampling (RPR), in which picture sizes might differ in the PPS,
rather than in the SPS.
Because of 2., we can't rely on the sequence-level variables
sps_subpic_width_minus1 and sps_subpic_height_minus1 to derive the
picture-level variable num_entry_points, as the picture might have a
different size to the picture used when deriving those sequence-level
variables.
NV_ENC_CLOCK_TIMESTAMP_SET was changed in SDK 13.1: countingType was
replaced by countingTypeLSB and countingTypeMSB.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.
Mode 10 (Horizontal):
- Broadcasts left[y] to fill each row using ld2r/ld4r for efficiency
- Applies edge smoothing for luma blocks smaller than 32x32
Mode 26 (Vertical):
- Copies top reference row to all output rows
- Applies edge smoothing for luma blocks smaller than 32x32
Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit, avoiding widening to 16-bit intermediates.
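A scalar C model may help show why the uhsub+usqadd pair can stay in
8 bits. This is only a sketch, not the NEON code: all function names
are hypothetical, and the filter is assumed to have the form
clip(base + ((edge - corner) >> 1)), where base is the sample being
smoothed and edge/corner are the neighbouring reference samples.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: widen to int, floor-halve the difference, add, clip to 0..255. */
static uint8_t smooth_ref(uint8_t base, uint8_t edge, uint8_t corner)
{
    int diff = (int)edge - (int)corner;
    /* Floor division by 2, written out to avoid relying on >> of negatives. */
    int half = diff >= 0 ? diff / 2 : -((-diff + 1) / 2);
    int v    = base + half;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Model of uhsub: halving subtract; only the low 8 bits are kept. */
static uint8_t uhsub8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a - (unsigned)b) >> 1);
}

/* Model of usqadd: add a signed 8-bit value to an unsigned one, saturating. */
static uint8_t usqadd8(uint8_t acc, uint8_t signed_bits)
{
    int s = signed_bits >= 128 ? (int)signed_bits - 256 : (int)signed_bits;
    int v = acc + s;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* The 8-bit-only path: no 16-bit intermediate is ever stored. */
static uint8_t smooth_8bit(uint8_t base, uint8_t edge, uint8_t corner)
{
    return usqadd8(base, uhsub8(edge, corner));
}

/* Exhaustive comparison of both paths over all 2^24 input triples. */
static void smooth_self_check(void)
{
    for (int b = 0; b < 256; b++)
        for (int e = 0; e < 256; e++)
            for (int c = 0; c < 256; c++)
                assert(smooth_ref((uint8_t)b, (uint8_t)e, (uint8_t)c) ==
                       smooth_8bit((uint8_t)b, (uint8_t)e, (uint8_t)c));
}
```

The halved difference always lies in -128..127, so it fits a signed
byte exactly, which is why no widening is needed in this model.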
The C pred_angular wrappers are made non-static with ff_ prefix to
allow the NEON dispatch to fall back to C for modes not yet optimized.
This will be reverted once all angular modes are implemented.
Note: since pred_angular[] is a per-size function pointer (not
per-mode), checkasm benchmarks will show '_neon' for all 33 modes
even though only modes 10/26 are truly accelerated; unoptimized
modes show ~1.0x speedup as they pass through the NEON wrapper to
the C fallback with negligible overhead.
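The per-size dispatch described above could look roughly like the
following sketch. Every name and signature here is hypothetical, the
block size is fixed at 4x4, edge smoothing is left out, and the "C
fallback" is a stand-in that only zeroes the block; the counters exist
solely to make the taken path observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 4                      /* block size used by this sketch */

static int c_calls, fast_calls;    /* count which path was taken */

/* Stand-in for the generic C predictor; a real one implements all modes. */
static void pred_angular_c(uint8_t *dst, const uint8_t *top,
                           const uint8_t *left, ptrdiff_t stride, int mode)
{
    (void)top; (void)left; (void)mode;
    c_calls++;
    for (int y = 0; y < BLK; y++)
        memset(dst + y * stride, 0, BLK);
}

/* Mode 10: every row y is filled with left[y]. */
static void pred_hor_fast(uint8_t *dst, const uint8_t *left, ptrdiff_t stride)
{
    fast_calls++;
    for (int y = 0; y < BLK; y++)
        memset(dst + y * stride, left[y], BLK);
}

/* Mode 26: the top reference row is copied to every output row. */
static void pred_ver_fast(uint8_t *dst, const uint8_t *top, ptrdiff_t stride)
{
    fast_calls++;
    for (int y = 0; y < BLK; y++)
        memcpy(dst + y * stride, top, BLK);
}

/* Wrapper installed in the per-size table: dispatches on mode at run time. */
static void pred_angular_wrapper(uint8_t *dst, const uint8_t *top,
                                 const uint8_t *left, ptrdiff_t stride, int mode)
{
    assert(mode >= 2 && mode <= 34);   /* the 33 angular modes */
    switch (mode) {
    case 10: pred_hor_fast(dst, left, stride);             break;
    case 26: pred_ver_fast(dst, top, stride);              break;
    default: pred_angular_c(dst, top, left, stride, mode); break;
    }
}
```

Because the table holds one wrapper per size, a benchmark keyed on the
table entry cannot tell the accelerated modes from the pass-through ones.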
Speedup over C on Apple M4 (checkasm --bench, 15-run average):
Mode 10 (Horizontal):
  4x4: 4.66x   8x8: 5.80x   16x16: 16.86x   32x32: 24.89x
Mode 26 (Vertical):
  4x4: 1.16x   8x8: 1.83x   16x16:  2.45x   32x32:  4.50x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Set sps->vui.sar to {0,1} (unspecified) before the VUI parsing
block, matching the HEVC pattern in hevc_ps.c. The old
zero-init-to-1 workaround is now unreachable and is removed.
Suggested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Per ITU-T H.264 (ISO/IEC 14496-10) Annex E.2.1 and ITU-T H.265
(ISO/IEC 23008-2) Annex E.3.1, when sar_width or sar_height is zero
the sample aspect ratio shall be considered unspecified. Internally
ffmpeg represents an unspecified SAR as 0/1, while fractions with a
zero denominator are not handled properly (den=0 is silently changed
to den=1 in h264_ps.c, turning an invalid 20480/0 into a "valid" but
impossibly extreme 20480/1); so we bridge the gap by replacing x/0
with 0/1 at the VUI parsing layer.
An av_log warning is added so an invalid SAR in the bitstream is
diagnosed rather than silently overwritten.
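As a rough illustration of the normalization described above, the
sketch below uses a hypothetical Ratio struct as a stand-in for
AVRational and fprintf() as a stand-in for the av_log warning. Mapping
0/x to 0/1 as well is a choice made for this sketch, following the
"either value is zero" wording of the specifications.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a num/den pair such as AVRational. */
typedef struct Ratio { int num, den; } Ratio;

/* Returns the SAR to store; a zero width or height becomes 0/1. */
static Ratio sanitize_sar(int sar_width, int sar_height)
{
    assert(sar_width >= 0 && sar_height >= 0);
    if (sar_width == 0 || sar_height == 0) {
        /* Diagnose instead of silently rewriting the value. */
        fprintf(stderr, "invalid SAR %d/%d, treating as unspecified\n",
                sar_width, sar_height);
        return (Ratio){ 0, 1 };
    }
    return (Ratio){ sar_width, sar_height };
}
```

With this shape, the 20480/0 case from the report ends up as 0/1 and a
well-formed ratio such as 4/3 passes through unchanged.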
This fixes a problem with some video files shipped with the game
OddBallers, which report SAR 20480/0 when run under Wine/Proton.
Based on patch by Giovanni Mascellani <gmascellani@codeweavers.com>.
Fixes: ticket #23321
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
If this were to be checked, it should be checked generically,
not in every single encoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead use CODEC_PIXFMTS. Avoids deprecation warnings
from Clang and simplifies the removal of AVCodec.pix_fmts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some 7.1 DTS files seem to signal Lw/Rw channels that the decoder has been
mapping to SL/SR, despite the macro for the mask being called 7_1_WIDE.
This resulted in said samples reporting the same native layout as actual 7.1
samples with Lsr/Rsr/Lss/Rss (mapped to BL/BR/SL/SR).
If we were to be strict, Lw/Rw would map to WL/WR, but that would result in an
unusual native layout. Instead, let's map them to FLC/FRC, which will result in
the more common 7.1(wide) native layout.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow: 314572800 * 8 cannot be represented in type 'int'
Tighten the guard to INT_MAX/14, which covers the largest expansion
factor used in the function currently.
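The shape of such a guard can be sketched as follows. The function
name, its signature and the -1 error return are hypothetical; only the
bound INT_MAX/14 and the failing size are taken from the text above.

```c
#include <assert.h>
#include <limits.h>

#define MAX_EXPANSION 14   /* largest expansion factor, per the message */

/* Returns 0 and stores size * factor, or -1 if the guard rejects size. */
static int expand_checked(int size, int factor, int *out)
{
    assert(factor >= 1 && factor <= MAX_EXPANSION);
    if (size < 0 || size > INT_MAX / MAX_EXPANSION)
        return -1;
    /* Cannot overflow: size * 14 <= INT_MAX, and factor <= 14. */
    *out = size * factor;
    return 0;
}
```

Bounding by the largest factor once keeps every smaller multiplication
in the function safe without a separate check per call site.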
Found-by: Jiale Yao <19888972804@163.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
fastaudio_decode() computes
    subframes = pkt->size / (40 * channels);
    frame->nb_samples = subframes * 256;
both in 32-bit signed arithmetic. Once subframes exceeds INT_MAX / 256
(i.e. subframes >= 2^23), the second multiplication overflows the
signed int range; for values just above 2^24, frame->nb_samples wraps
to a small value.
ff_get_buffer() then sizes the audio plane for that wrapped sample
count, while the decoder loop at line 152 still iterates the full
(unwrapped) subframes count, performing a 1024-byte memcpy per
subframe per channel. The 27th iteration (or first iteration with
nb_samples=0) writes one byte past the per-plane allocation,
yielding the ASan heap-buffer-overflow WRITE at
libavcodec/fastaudio.c:171 reported as ANT-2026-03891.
Reject the subframes value whose *256 product would overflow before
performing the multiplication. The bound INT_MAX / 256 (= 8388607)
keeps the existing two's-complement semantics of every reachable
input and rejects only the configurations that would have wrapped.
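A minimal sketch of the check, assuming a helper of this shape (the
function name and the -1 error return are hypothetical and not taken
from the patch):

```c
#include <assert.h>
#include <limits.h>

/* Returns nb_samples, or -1 when subframes * 256 would overflow int. */
static int checked_nb_samples(int pkt_size, int channels)
{
    assert(pkt_size >= 0 && channels > 0 && channels <= INT_MAX / 40);
    int subframes = pkt_size / (40 * channels);
    if (subframes > INT_MAX / 256)   /* reject before multiplying */
        return -1;
    return subframes * 256;
}
```

For the mono 671088680-byte chunk mentioned in this message, subframes
is 16777217, which is above the bound and is therefore rejected.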
Reproducer: a crafted AVI declaring one mono audio chunk of
671_088_680 bytes (sparse) with the decoder forced via
'ffmpeg -c:a fastaudio -i evil.avi'.
Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It is (arguably) a slightly better place for them and avoids
a forward declaration of enum AVExifHeaderMode which is not possible
in ISO C before C23 (and requires specifying the underlying type
with C23).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The list of codec descriptors is supposed to encompass all codec IDs;
it certainly encompasses all codec IDs used by de/encoders (this is
checked in the avcodec test program which is run via FATE).
So the avcodec_find_decoder()/avcodec_find_encoder() calls are pointless.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reduces sizeof(AACPCEInfo) from 296 to 120 bytes.
This reduces .rodata by 4576B here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Most parsers outright dislike anything being signaled as SIDE, as they expect layouts
to follow how ordering is pre-defined in non-0 channel_config values.
Signed-off-by: James Almer <jamrial@gmail.com>
The ADPCM_PSXC block loop in adpcm_decode_frame()
(libavcodec/adpcm.c:2770) iterates 'block < avpkt->size / block_align'
times and, for each block, consumes
    channels * (1 + (block_align - 1) / channels)
input bytes via the *unchecked* bytestream2_get_byteu() reader. The
loop divides avpkt->size by block_align, so the loop bound is sound
only when the per-block consumption equals block_align — i.e. when
block_align is an exact multiple of channels. For any other
combination (e.g. block_align=9 with channels=8), each block consumes
more than block_align bytes; iterating avpkt->size/block_align
blocks then walks the input bytestream past avpkt->data +
avpkt->size, producing the heap-buffer-overflow READ at
libavcodec/bytestream.h:99 reported as ANT-2026-04052.
adpcm_decode_init() previously only enforced 'channels > 0' and
'block_align > 0' for PSXC. Tighten the init check to additionally
require 'block_align % channels == 0', which is the precise
invariant the decode loop depends on.
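The relationship between the per-block consumption and the tightened
check can be sketched like this. Both helper names are hypothetical;
the formula and the modulo condition are the ones quoted above.

```c
#include <assert.h>

/* Bytes consumed per block by the decode loop, per the formula above. */
static int psxc_bytes_per_block(int block_align, int channels)
{
    assert(block_align > 0 && channels > 0);
    return channels * (1 + (block_align - 1) / channels);
}

/* Init-time check: 1 if the configuration is acceptable, 0 otherwise. */
static int psxc_config_ok(int block_align, int channels)
{
    return channels > 0 && block_align > 0 &&
           block_align % channels == 0;
}
```

The formula rounds block_align up to the next multiple of channels, so
the consumption equals block_align exactly when the modulo check passes
and exceeds it otherwise (16 bytes for block_align=9, channels=8).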
Reproducer: a crafted WAV header declaring channels=8, block_align=9
with the decoder forced via 'ffmpeg -c:a adpcm_psxc -i evil.wav'.
Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
vc1_loop_filter() is only reached through the six C wrappers. Clang 14
keeps it out of line with plain static inline, adding a 224-byte stack
frame before the tiny bestcase path on rpi 5. gcc 12 already inlines
it.
rpi 5 clang 14:
                                  before   after
vc1_v_loop_filter4_bestcase_c       27.2     8.3   (3.3x)
vc1_h_loop_filter4_bestcase_c       26.4    10.2   (2.6x)
vc1_v_loop_filter8_bestcase_c       32.5    20.3   (1.6x)
vc1_h_loop_filter8_bestcase_c       31.7    19.5   (1.6x)
vc1_v_loop_filter16_bestcase_c      42.1    33.2   (1.3x)
vc1_h_loop_filter16_bestcase_c      41.6    25.3   (1.6x)
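The general pattern of forcing a shared worker into its thin wrappers
can be sketched as below. This is not the VC-1 code: FORCE_INLINE is a
hypothetical macro standing in for something like av_always_inline,
and the "filter" is a toy clamp used only to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro; falls back to plain inline on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

/* Shared worker: step and len are constants at every call site. */
static FORCE_INLINE void filter_generic(uint8_t *src, int step, int len, int pq)
{
    assert(pq >= 0 && pq <= 255);
    for (int i = 0; i < len; i++)
        if (src[i * step] > pq)          /* toy filter: clamp to pq */
            src[i * step] = (uint8_t)pq;
}

/* Thin wrappers; with forced inlining each gets a specialized body. */
static void filter4(uint8_t *src, int pq) { filter_generic(src, 1, 4, pq); }
static void filter8(uint8_t *src, int pq) { filter_generic(src, 1, 8, pq); }
```

Once the worker is inlined, the compiler can fold the constant len and
step into each wrapper instead of setting up a frame for a real call.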
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The C version is faster than the previous asm with clang and gcc > 12 on
rpi5, since the compiler basically does the same unrolling.
sum64x5_neon:              before          after
Cortex-A76 (gcc 12.4):     72.3 (3.63x)    47.4 (5.56x)
Cortex-A76 (gcc 14.2):     72.3 (0.69x)    47.4 (1.05x)
Apple M1 (clang 16):        0.2 (0.98x)     0.2 (0.99x)
Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>