The ADPCM_PSXC block loop in adpcm_decode_frame() (libavcodec/adpcm.c:
2770) iterates 'block < avpkt->size / block_align' times and, for
each block, consumes
channels * (1 + (block_align - 1) / channels)
input bytes via the *unchecked* bytestream2_get_byteu() reader. The
loop divides avpkt->size by block_align, so the loop bound is sound
only when the per-block consumption equals block_align — i.e. when
block_align is an exact multiple of channels. For any other
combination (e.g. block_align=9 with channels=8), each block consumes
more than block_align bytes; iterating avpkt->size/block_align
blocks then walks the input bytestream past avpkt->data +
avpkt->size, producing the heap-buffer-overflow READ at
libavcodec/bytestream.h:99 reported as ANT-2026-04052.
adpcm_decode_init() previously only enforced 'channels > 0' and
'block_align > 0' for PSXC. Tighten the init check to additionally
require 'block_align % channels == 0', which is the precise
invariant the decode loop depends on.
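
For illustration only, the arithmetic above can be sketched in standalone
C. The helper names below are hypothetical and do not exist in
libavcodec/adpcm.c; the per-block expression is the one quoted above, and
psxc_params_ok() is only an assumed shape of the tightened init check.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the PSXC loop reads for one block, per the expression quoted above.
 * Hypothetical helper, not part of libavcodec. */
static size_t psxc_bytes_per_block(int channels, int block_align)
{
    assert(channels > 0 && block_align > 0);
    return (size_t)channels * (1 + (size_t)(block_align - 1) / channels);
}

/* Total bytes read for a packet when iterating pkt_size / block_align blocks. */
static size_t psxc_bytes_read(size_t pkt_size, int channels, int block_align)
{
    size_t nb_blocks = pkt_size / (size_t)block_align;
    return nb_blocks * psxc_bytes_per_block(channels, block_align);
}

/* Assumed shape of the tightened init check: nonzero if the parameters
 * keep the per-block consumption equal to block_align. */
static int psxc_params_ok(int channels, int block_align)
{
    return channels > 0 && block_align > 0 && block_align % channels == 0;
}
```

With channels=8 and block_align=9, each block reads 16 bytes, so a 90-byte
packet (10 blocks) would read 160 bytes; with channels=2 and
block_align=16, each block reads exactly 16 bytes.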
Reproducer: a crafted WAV header declaring channels=8, block_align=9
with the decoder forced via 'ffmpeg -c:a adpcm_psxc -i evil.wav'.
Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
The device-only compilation path of vf_scale_cuda.h pulled in <stdint.h>
solely to obtain uint8_t for the CUdeviceptr typedef. On Windows-on-ARM
(aarch64 mingw) this drags in _mingw.h, whose ARM __prefetch intrinsic is
guarded by !__has_builtin(__prefetch). During clang's --cuda-device-only
pass __has_builtin has deferred/inconsistent semantics on the auxiliary
(host) target, so the guard mis-fires, the inline __prefetch definition is
emitted, and clang rejects it:
_mingw.h: error: definition of builtin function '__prefetch'
This broke the msys2-clangarm64 FATE slot once ffnvcodec (and thus the
nvcc-compiled CUDA filters) was enabled for aarch64 Windows.
uint8_t is unsigned char, so use that directly and drop the <stdint.h>
include. Device-only code should not depend on the host C runtime headers.
No functional or ABI change.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
vc1_loop_filter() is only reached through the six C wrappers. Clang 14
keeps it out of line with plain static inline, adding a 224-byte stack
frame before the tiny bestcase path on rpi 5. gcc 12 already inlines
it.
rpi 5 clang 14:
                                  before   after
vc1_v_loop_filter4_bestcase_c      27.2     8.3   (3.3x)
vc1_h_loop_filter4_bestcase_c      26.4    10.2   (2.6x)
vc1_v_loop_filter8_bestcase_c      32.5    20.3   (1.6x)
vc1_h_loop_filter8_bestcase_c      31.7    19.5   (1.6x)
vc1_v_loop_filter16_bestcase_c     42.1    33.2   (1.3x)
vc1_h_loop_filter16_bestcase_c     41.6    25.3   (1.6x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The C version is faster than the previous asm with clang and gcc > 12 on
rpi5, since the compiler performs essentially the same unrolling.
sum64x5_neon:                before          after
Cortex-A76 (gcc 12.4):    72.3 (3.63x)    47.4 (5.56x)
Cortex-A76 (gcc 14.2):    72.3 (0.69x)    47.4 (1.05x)
Apple M1 (clang 16):       0.2 (0.98x)     0.2 (0.99x)
Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>
Unroll to 16 floats per iteration with four independent accumulators
and reduce them once after the loop.
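
As a rough portable-C illustration of that structure (not the NEON code
itself), each acc[j] below stands in for one 4-lane vector accumulator.
The function name is hypothetical, and the sketch assumes len is a
multiple of 16.

```c
#include <assert.h>

/* Portable C sketch; each acc[j] models one 4-lane NEON accumulator.
 * Hypothetical name; assumes len is a non-negative multiple of 16. */
static float scalarproduct_unrolled(const float *a, const float *b, int len)
{
    float acc[4][4] = { { 0 } };
    float sum = 0.0f;

    assert(len >= 0 && len % 16 == 0);

    /* 16 floats per iteration, spread over four independent accumulators
     * so consecutive multiply-adds do not depend on each other. */
    for (int i = 0; i < len; i += 16)
        for (int j = 0; j < 4; j++)
            for (int l = 0; l < 4; l++)
                acc[j][l] += a[i + 4 * j + l] * b[i + 4 * j + l];

    /* Single reduction after the loop. */
    for (int j = 0; j < 4; j++)
        for (int l = 0; l < 4; l++)
            sum += acc[j][l];

    return sum;
}
```

Because the accumulators are summed in a different order than a plain
sequential loop, results may differ in the last bits for general
floating-point inputs.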
scalarproduct_float_neon:    before           after
Apple M1 (clang 16):        0.9 (3.56x)     0.4 (9.18x)
Cortex-A76 (gcc 12.4):    118.7 (4.43x)    85.3 (6.15x)
Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>
In yuv420_gbrp_ssse3, the boundary safeguard check "h_size * 3 >
FFABS(dstStride[0])" was apparently modeled on the packed RGB24
formats (where each pixel spans 3 bytes per row).
For GBRP (planar GBR), each plane contains only 1 component per pixel
per row, so dstStride[0] scales with the width, not with 3 * width.
Multiplying h_size by 3 therefore triggers the condition for normal
widths, decreasing h_size by 8. This leaves the rightmost 8 pixels
of every row unwritten (they show up as black).
Fix this by checking "h_size > FFABS(dstStride[0])" instead.
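
A minimal standalone sketch of the two checks, for illustration only. The
function names are hypothetical, and both the round-up of h_size to a
multiple of 8 and the stride value of 640 used in the example are
assumptions of this sketch, not taken from the code.

```c
#include <assert.h>
#include <stdlib.h>

/* Old check: modeled on packed RGB24 (3 bytes per pixel per row).
 * Rounding h_size up to a multiple of 8 is an assumption of this sketch. */
static int h_size_packed_check(int width, int dst_stride)
{
    int h_size = (width + 7) & ~7;
    if (h_size * 3 > abs(dst_stride))
        h_size -= 8;
    return h_size;
}

/* Fixed check: one byte per pixel per row in each GBRP plane. */
static int h_size_planar_check(int width, int dst_stride)
{
    int h_size = (width + 7) & ~7;
    if (h_size > abs(dst_stride))
        h_size -= 8;
    return h_size;
}
```

For width 600 and an assumed stride of 640, the old check yields 592
(columns 592 to 599 are skipped) while the fixed check yields 600.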
How to reproduce:
1. Generate PNG outputs from the 600x600 pipeline, once with a build
   without the fix and once with a build with the fix:
   Buggy output (without the fix):
   $ ffmpeg -f lavfi -i color=c=red:size=600x600:rate=1 \
   -vf format=yuv420p,format=gbrp \
   -frames:v 1 -y buggy_red_600.png
   Fixed output (with the fix):
   $ ffmpeg -f lavfi -i color=c=red:size=600x600:rate=1 \
   -vf format=yuv420p,format=gbrp \
   -frames:v 1 -y fixed_red_600.png
2. Open buggy_red_600.png in an image viewer:
   An 8-pixel wide vertical black stripe (columns 592 to 599) runs
   top to bottom along the right edge of the image.
3. Open fixed_red_600.png in an image viewer:
   The output is a uniform, solid red square across the entire 600x600
   canvas, confirming that the boundary bug is fixed.
Found-by: Claude (Anthropic). Human-verified and reported by
Omkhar Arasaratnam <omkhar@linkedin.com>.
Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
Add printing of AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE,
AV_CODEC_CAP_ENCODER_FLUSH, and AV_CODEC_CAP_ENCODER_RECON_FRAME
capabilities that were defined but not displayed.
This writes 4 bytes but in SSE4 mode only produces 2 bytes per vector. We
can avoid over-writing by using the appropriately sized register.
Reproducible by:
make libswscale/tests/swscale
libswscale/tests/swscale -dst monob -unscaled 1 -flags unstable -align_src 1 -align_dst 1
Signed-off-by: Niklas Haas <git@haasn.dev>
Both of these loops assumed that `h` lines need to be copied, but the
count varies: first because of plane subsampling, and more importantly
because, when scaling vertically, the input line count may be substantially
lower than the actual line count.
This fixes an out-of-bounds read/write when vertically upscaling with a tail
buffer.
Verifiable via e.g.:
make libswscale/tests/swscale
valgrind -- libswscale/tests/swscale -s 63x63 -src yuv444p -dst rgb24 \
-flags unstable -align_src 1 -align_dst 1
(As well as the SSIM scores, which drop from ~e-5 to ~e-3 without this fix)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
libplacebo versions before v365 passed .flags = 0 when retrieving the queues
from imported Vulkan devices, so we have to error out in the case of a mismatch
to avoid undefined behavior (Vulkan spec).
See-Also: https://code.videolan.org/videolan/libplacebo/-/merge_requests/856
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
These are needed for interop with e.g. libplacebo, which needs to know the
correct flags to call vkGetDeviceQueue2.
Signed-off-by: Niklas Haas <git@haasn.dev>
decode_tsd() computes the binomial coefficient c = C(k, p) incrementally.
This commit makes that computation less prone to overflow.
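
The actual change is not reproduced here. As a generic illustration of
computing C(k, p) incrementally with overflow detection, one possible
standalone sketch follows; the function name, the 64-bit intermediate,
and the error convention are all assumptions of this sketch, not the code
in decode_tsd().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores C(k, p) in *out and returns 0, or returns -1
 * if an intermediate product would not fit in 64 bits. */
static int binomial_u64(unsigned k, unsigned p, uint64_t *out)
{
    uint64_t c = 1;

    assert(out);
    if (p > k) {
        *out = 0;
        return 0;
    }
    if (p > k - p)
        p = k - p;              /* C(k, p) == C(k, k - p) */

    for (unsigned i = 0; i < p; i++) {
        uint64_t f = k - i;     /* f >= 1 because i < p <= k / 2 */
        if (c > UINT64_MAX / f)
            return -1;          /* c * f would overflow */
        /* Exact division: C(k, i) * (k - i) == C(k, i + 1) * (i + 1). */
        c = c * f / (i + 1);
    }
    *out = c;
    return 0;
}
```

Multiplying before dividing keeps every intermediate value an exact
integer; the explicit check turns a silent wraparound into an error the
caller can handle.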
Fixes: 515703905/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_DEC_fuzzer-4890954254581760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Later code will turn this into AVERROR_BUG.
Since returning samples with sample_rate == 0 is considered a bug, we have
no good option other than to error out cleanly.
Fixes: assertion failure
Fixes: ffmpeg_AV_CODEC_ID_AAC_DEC_fuzzer crash-0a86d46fef2442b222ee34403c21f7f582ffccb0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Log the script and direction picked by HarfBuzz, plus codepoint and
glyph counts, so the shaper choice can be verified. Differing
codepoint and glyph counts indicate reordering / ligation /
decomposition.
Codepoints are sampled before hb_shape(), which flips the buffer
content type to GLYPHS.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
shape_text_hb() set HB_SCRIPT_LATIN and called
hb_buffer_guess_segment_properties() on an empty buffer, so the
inference was a no-op. Bengali and other Indic / USE scripts reached
the default OT shaper instead of their script-specific shaper,
leaving the virama visible and consonants disjointed (e.g. স্টারমার
rendered as স্ টারমার).
Add the UTF-8 text first, keep the existing LTR direction used by the
FriBidi visual-order pipeline, then guess segment properties so the
script comes from the actual Unicode contents.
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/23014
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add support for building FFmpeg with hardware-accelerated decode (nvdec)
and encode (nvenc) on aarch64 Windows, covering both the MinGW (mingw32/
mingw64) and MSVC (win32/win64) toolchains. The dynamically-loaded
NVIDIA codec headers and the CUDA loader are architecture-agnostic, so
the only gate was the target_os check in the aarch64/ppc64 branch.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Developers can attach sample files to a PR and list their target paths
within the fate-suite in a fate-samples block in the PR description:
```fate-samples
vorbis/tos.ogg
mov/some-new-sample.mov
```
A new inject-pr-samples.py script fetches the PR metadata from the
Forgejo API, resolves each listed path to its matching attachment by
filename, and downloads the files into the fate-suite directory before
FATE runs.
The script validates that pr-number is an integer, that paths are
relative, contain no '..', and are at most 3 components deep (matching
the deepest paths in the existing fate-suite). Attachment URLs are
restricted to the code.ffmpeg.org domain.
The script exports a new_samples=true/false output via $FORGEJO_OUTPUT.
After FATE completes, a final workflow step fails the run if any new
sample was injected, reminding contributors to add their samples to the
official fate-suite before the PR can be merged.
The script can also be used locally:
SAMPLES=/path/to/fate-suite .forgejo/inject-pr-samples.py <pr-number>