generate_missing_ref walked frame->f->data[] until a NULL slot, which
on alpha-video frames extended to data[3] and read
sps->hshift[3]/vshift[3] out of bounds.
The alpha plane is produced by the alpha layer via
replace_alpha_plane; the base decoder path never reads or writes it.
Bound the fill loop by the SPS coded plane count. This both removes
the out-of-bounds shift access and avoids an unnecessary full-frame
memset of the alpha plane.
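A minimal sketch of the bounded loop follows. It assumes a simplified frame with four data[] slots and an SPS that records how many planes it codes. The `Frame`/`Sps` types, the `nb_coded_planes` field and `fill_missing_planes()` are hypothetical, not the decoder's actual names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DATA_POINTERS 4   /* luma, two chroma, optional alpha */

typedef struct Frame {
    uint8_t  *data[MAX_DATA_POINTERS];
    ptrdiff_t linesize[MAX_DATA_POINTERS];
} Frame;

typedef struct Sps {
    int width, height;
    int nb_coded_planes;      /* hypothetical: 1 for 4:0:0, otherwise 3 */
    int hshift[3], vshift[3]; /* only three entries: no alpha plane here */
} Sps;

/* Fill each coded plane with a constant. Bounding the loop by the SPS
 * plane count, instead of walking data[] until a NULL slot, never touches
 * data[3] or hshift[3]/vshift[3]. */
static void fill_missing_planes(Frame *f, const Sps *sps, uint8_t value)
{
    for (int i = 0; i < sps->nb_coded_planes; i++) {
        int w = sps->width  >> sps->hshift[i];
        int h = sps->height >> sps->vshift[i];
        for (int y = 0; y < h; y++)
            memset(f->data[i] + y * f->linesize[i], value, w);
    }
}
```

The alpha buffer stays untouched even when data[3] is non-NULL.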
Fixes: out of array read
Fixes: 500770604/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-6157374833623040
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
When an SPS uses the multi-layer extension (nuh_layer_id > 0 with
sps_max_sub_layers_minus1 == 7), width and height are taken from the
VPS rep_format without the av_image_check_size() validation that the
direct path performs. HEVC F.7.4.3.1.1 requires rep_format pic
dimensions to satisfy the constraints in 7.4.3.2.1, including
"pic_width_in_luma_samples shall not be equal to 0".
Run the same av_image_check_size() check in the multi-layer-extension
path so the SPS is rejected before it reaches setup_pps().
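The idea of validating dimensions regardless of their source can be sketched as below. `check_pic_size()` is a simplified, hypothetical stand-in for av_image_check_size(); the real function applies its own size limits, which are not reproduced here. `RepFormat` and `parse_sps_dimensions()` are also hypothetical.

```c
#include <limits.h>
#include <stdint.h>

#define ERR_INVALIDDATA (-1) /* hypothetical error code */

typedef struct RepFormat { int pic_width, pic_height; } RepFormat;

/* Simplified stand-in: reject zero/negative sizes and absurdly large ones. */
static int check_pic_size(int width, int height)
{
    if (width <= 0 || height <= 0)
        return ERR_INVALIDDATA;
    if ((uint64_t)width * (uint64_t)height > INT_MAX / 8)
        return ERR_INVALIDDATA;
    return 0;
}

/* Dimensions come either from the VPS rep_format (multi-layer extension)
 * or from the SPS itself; both paths go through the same check. */
static int parse_sps_dimensions(int *w, int *h, int use_rep_format,
                                const RepFormat *rep, int coded_w, int coded_h)
{
    if (use_rep_format) {
        *w = rep->pic_width;
        *h = rep->pic_height;
    } else {
        *w = coded_w;
        *h = coded_h;
    }
    return check_pic_size(*w, *h);
}
```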
Fixes: VS-FF-2026-0003/poc.flv
Fixes: out of array access
Found-by: Vuln Seeker Cyber Security Team
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
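The uhadd + urhadd folding can be checked with a scalar model of the two lane operations. The helper names below are illustrative only. With s = prev + next, the identity holds exactly: for even s both sides reduce to the same expression, and for odd s the dropped half-bit from uhadd never changes the final floor.

```c
#include <stdint.h>

/* Scalar models of the unsigned 8-bit NEON lane operations. */
static uint8_t uhadd(uint8_t a, uint8_t b)  { return (uint8_t)((a + b) >> 1); }     /* floor((a+b)/2) */
static uint8_t urhadd(uint8_t a, uint8_t b) { return (uint8_t)((a + b + 1) >> 1); } /* rounded average */

/* Reference [1,2,1] filter with rounding, computed in int. */
static uint8_t filter_121(uint8_t prev, uint8_t curr, uint8_t next)
{
    return (uint8_t)((prev + 2 * curr + next + 2) >> 2);
}

/* Folded form: stays in 8 bits, so no widen/narrow is needed. */
static uint8_t filter_121_folded(uint8_t prev, uint8_t curr, uint8_t next)
{
    return urhadd(uhadd(prev, next), curr);
}

/* Compare both forms over all 2^24 inputs; returns the mismatch count. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int p = 0; p < 256; p++)
        for (int c = 0; c < 256; c++)
            for (int n = 0; n < 256; n++)
                bad += filter_121(p, c, n) != filter_121_folded(p, c, n);
    return bad;
}
```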
Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
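For reference, a scalar sketch of HEVC strong intra smoothing along one 64-sample edge of a 32x32 luma block. Interior samples are a linear blend of the corner sample and the far end sample. The (63 - i) and (i + 1) factors are the kind of values a preloaded weight table would hold, and the +32, >>6 step is a rounding narrow. The function name and argument layout are hypothetical.

```c
#include <stdint.h>

/* out[i] = ((63 - i) * corner + (i + 1) * ref[63] + 32) >> 6, i = 0..62;
 * the far end sample ref[63] is kept unchanged. */
static void strong_smooth_edge(uint8_t out[64], uint8_t corner,
                               const uint8_t ref[64])
{
    for (int i = 0; i < 63; i++)
        out[i] = (uint8_t)(((63 - i) * corner + (i + 1) * ref[63] + 32) >> 6);
    out[63] = ref[63];
}
```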
checkasm --bench --runs=15 (Apple M4, average of 3 trials):
ref_filter_3tap_8x8_8_neon: 4.1x
ref_filter_3tap_16x16_8_neon: 3.3x
ref_filter_3tap_32x32_8_neon: 2.5x
ref_filter_strong_8_neon: 1.9x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract 3-tap [1,2,1]>>2 and strong intra smoothing from
intra_pred() into HEVCPredContext function pointers, preparing
for arch-specific overrides.
ref_filter_3tap[3] indexed by log2_size - 3 (sizes 8/16/32).
ref_filter_strong for 32x32 luma only.
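A sketch of this dispatch pattern follows. It assumes the reference samples are laid out linearly (left column bottom-to-top, corner, top row), 4 * size + 1 samples in total, so the [1,2,1] filter becomes a plain 1-D pass that keeps both end samples. The signature, struct and function names are hypothetical, not HEVCPredContext's actual ones.

```c
#include <stdint.h>

typedef void (*RefFilter3TapFn)(uint8_t *dst, const uint8_t *src);

typedef struct PredCtxSketch {
    RefFilter3TapFn ref_filter_3tap[3]; /* indexed by log2_size - 3 */
} PredCtxSketch;

/* Shared body: [1,2,1] >> 2 with rounding; end samples are copied. */
static void ref_filter_3tap_c(uint8_t *dst, const uint8_t *src, int n)
{
    dst[0] = src[0];
    for (int i = 1; i < n - 1; i++)
        dst[i] = (uint8_t)((src[i - 1] + 2 * src[i] + src[i + 1] + 2) >> 2);
    dst[n - 1] = src[n - 1];
}

/* Size-specialized entry points wrapping the shared body. */
static void ref_filter_3tap_8_c(uint8_t *d, const uint8_t *s)  { ref_filter_3tap_c(d, s, 4 * 8 + 1); }
static void ref_filter_3tap_16_c(uint8_t *d, const uint8_t *s) { ref_filter_3tap_c(d, s, 4 * 16 + 1); }
static void ref_filter_3tap_32_c(uint8_t *d, const uint8_t *s) { ref_filter_3tap_c(d, s, 4 * 32 + 1); }

static void pred_init_sketch(PredCtxSketch *c)
{
    c->ref_filter_3tap[0] = ref_filter_3tap_8_c;
    c->ref_filter_3tap[1] = ref_filter_3tap_16_c;
    c->ref_filter_3tap[2] = ref_filter_3tap_32_c;
    /* An arch-specific init would overwrite these pointers afterwards. */
}
```

A caller then selects the variant with `c.ref_filter_3tap[log2_size - 3](dst, src)`.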
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Allows the compiler to optimize the aliasing checks away
and saves 5376B here (GCC 15, -O3).
Also, avoid converting the stride into uint16_t units for >8bpp:
stride /= sizeof(pixel) will use an unsigned division
(i.e. a logical right shift)*, which is not what is intended here.
*: If size_t is the corresponding unsigned type to ptrdiff_t
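The effect of the unsigned division can be sketched as follows, assuming size_t and ptrdiff_t have the same width, as the footnote requires. The helper names are hypothetical. Casting the divisor is shown only as one way to keep the division signed; it is not necessarily what this commit does.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t pixel; /* >8 bpp build */

/* sizeof yields size_t, so the usual arithmetic conversions turn
 * stride into an unsigned value before dividing: a negative byte
 * stride becomes a huge positive element stride. */
static ptrdiff_t stride_in_pixels_unsigned(ptrdiff_t stride)
{
    stride /= sizeof(pixel);
    return stride;
}

/* Converting the divisor keeps the division signed. */
static ptrdiff_t stride_in_pixels_signed(ptrdiff_t stride)
{
    stride /= (ptrdiff_t)sizeof(pixel);
    return stride;
}
```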
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Apple VideoToolbox is the dominant producer of hevc-alpha videos, but
early versions generate non-standard VPS extensions that fail to
parse and return AVERROR_INVALIDDATA. Fix this by returning
AVERROR_PATCHWELCOME instead of AVERROR_INVALIDDATA for unsupported
VPS extension configurations. Also set poc_lsb_not_present for the
alpha layer in the fallback path when it has no direct dependency
on the base layer, so that IDR slices on the alpha layer won't
incorrectly read pic_order_cnt_lsb.
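The role of poc_lsb_not_present can be sketched as a predicate, based on our reading of the multi-layer slice segment header syntax. In that syntax, slice_pic_order_cnt_lsb is read for non-IDR slices, and for IDR slices only on a non-base layer whose flag is 0. The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum { NAL_IDR_W_RADL = 19, NAL_IDR_N_LP = 20 };

/* Is slice_pic_order_cnt_lsb present in this slice header? */
static bool slice_has_poc_lsb(int nal_unit_type, int nuh_layer_id,
                              bool poc_lsb_not_present)
{
    bool is_idr = nal_unit_type == NAL_IDR_W_RADL ||
                  nal_unit_type == NAL_IDR_N_LP;
    if (!is_idr)
        return true;
    /* IDR: only a non-base layer without the flag carries the LSBs. */
    return nuh_layer_id > 0 && !poc_lsb_not_present;
}
```

For an alpha layer with poc_lsb_not_present set, IDR slices therefore skip the field, just as on the base layer.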
Fix #22384
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add NEON-optimized implementation for HEVC intra DC prediction at 8-bit
depth, supporting all block sizes (4x4 to 32x32).
DC prediction computes the average of top and left reference samples
using uaddlv, with urshr for rounded division. For luma blocks smaller
than 32x32, edge smoothing is applied: the first row and column are
blended toward the reference using (ref[i] + 3*dc + 2) >> 2 computed
entirely in the NEON domain. Fill stores use pre-computed address
patterns to break dependency chains.
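A scalar sketch of what the NEON code computes, written from the HEVC DC prediction rules. The rounded division (sum + n) >> (log2_size + 1) is the operation urshr performs. Per the spec, the corner sample blends both neighbours, (left[0] + 2*dc + top[0] + 2) >> 2, while the remaining first-row and first-column samples use the (ref[i] + 3*dc + 2) >> 2 form. The function name and argument layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* DC prediction for an n x n block, n = 1 << log2_size (4..32). */
static void pred_dc_sketch(uint8_t *dst, ptrdiff_t stride,
                           const uint8_t *top, const uint8_t *left,
                           int log2_size, int is_luma)
{
    int n = 1 << log2_size;
    int sum = n;                        /* rounding term */
    for (int i = 0; i < n; i++)
        sum += top[i] + left[i];
    int dc = sum >> (log2_size + 1);    /* average of 2n samples */

    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = (uint8_t)dc;

    /* Edge smoothing only for luma blocks smaller than 32x32. */
    if (is_luma && n < 32) {
        dst[0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
        for (int x = 1; x < n; x++)
            dst[x] = (uint8_t)((top[x] + 3 * dc + 2) >> 2);
        for (int y = 1; y < n; y++)
            dst[y * stride] = (uint8_t)((left[y] + 3 * dc + 2) >> 2);
    }
}
```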
Also adds the aarch64 initialization framework (Makefile, pred.c/pred.h
hooks, hevcpred_init_aarch64.c).
Speedup over C on Apple M4 (checkasm --bench):
4x4: 2.28x, 8x8: 3.14x, 16x16: 3.29x, 32x32: 3.02x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
We know that this is a Dolby Vision Enhancement Layer, and while it is not
handled, it is in fact recognized, so we can reduce the log spam
for it.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using it for the C init files just works, because the build
system derives from the file extension whether a file is C or NASM.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows removing the HAVE_X86ASM
checks from these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Previously, s->slice_initialized was set to 0, which prevents other
slice segments from depending on this slice segment, only when
hls_slice_header failed. However, if decode_slice failed for some other
reason before decode_slice_data was called to bring the context back
into a consistent state, later slice segments could depend on this
slice segment while it was in an invalid state. This can cause
segmentation faults and other sorts of nastiness. This patch fixes it
by always keeping s->slice_initialized at 0 while the state is
inconsistent.
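The pattern is: clear the flag before the state is touched, and set it only once the state is consistent again. It can be sketched with a hypothetical context and a single setup stage that may fail. None of these names come from the decoder.

```c
#include <stdbool.h>

typedef struct SliceCtxSketch {
    bool slice_initialized; /* may later slice segments rely on this state? */
    int  header_value;      /* stands in for all per-slice state */
} SliceCtxSketch;

static int decode_slice_sketch(SliceCtxSketch *s, int new_header, int setup_ret)
{
    s->slice_initialized = false;  /* state about to become inconsistent */
    s->header_value = new_header;  /* header parsed into the context */
    if (setup_ret < 0)
        return setup_ret;          /* any early failure leaves the flag at 0 */
    s->slice_initialized = true;   /* consistent again */
    /* ... slice data decoding would follow here ... */
    return 0;
}
```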
Resolves #11652.
For MV-HEVC, the second layer of an IDR frame can be a P slice.
long_term_rps was not reset before this patch, which led
ff_hevc_frame_nb_refs to return an incorrect result.
This fixes decoding failures for samples from Pico VR.
AVCodecParser has several fields which are not really meant
to be accessed by users, but it has no public-private
demarcation line, so these fields are technically public
and therefore cannot simply be made private like
20f9727018 did for AVCodec.*
This commit therefore deprecates these fields and
schedules them to become private. All parsers have already
been switched to FFCodecParser, which (for now) is a union
of AVCodecParser and an unnamed clone of AVCodecParser
(new fields can be added at the end of this clone).
*: This is also the reason why split has never been removed despite
not being set for several years now.
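The union layout described above can be sketched with hypothetical stand-in types. The fields are illustrative, not the real AVCodecParser members, and the code needs C11 for the anonymous struct. Because private fields are only appended after the cloned members, both views agree on the shared fields (common initial sequence).

```c
/* Hypothetical stand-in for the public struct. */
typedef struct PublicParser {
    int codec_ids[7];
    int priv_data_size;
} PublicParser;

/* Hypothetical stand-in for the internal type: the public struct
 * overlaid with an unnamed clone that may grow at its end. */
typedef union InternalParser {
    PublicParser p;
    struct {
        int codec_ids[7];
        int priv_data_size;
        int new_private_field; /* not visible through PublicParser */
    };
} InternalParser;
```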
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The current code relies on AV_CODEC_ID_NONE being zero, so that
unused codec ids are set to their proper value. This commit adds
a macro to set unset ids to AV_CODEC_ID_NONE.
(The actual rationale for this macro is to simplify
the transition to making the private fields that are
currently public in avcodec.h really private.)
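One way such a macro can pad a fixed-size id array is by dispatching on the argument count. This is a sketch only. The enum values, `CODEC_IDS`, `MAX_IDS` and the three-entry array are hypothetical, and ID_NONE is deliberately non-zero to show that nothing relies on zero-initialization.

```c
enum { ID_NONE = -1, ID_H264 = 1, ID_HEVC = 2 }; /* hypothetical values */

#define MAX_IDS 3

#define PAD_IDS_1(a)       a, ID_NONE, ID_NONE
#define PAD_IDS_2(a, b)    a, b, ID_NONE
#define PAD_IDS_3(a, b, c) a, b, c
/* Picks PAD_IDS_n, where n is the number of arguments. */
#define SELECT_PAD(_1, _2, _3, name, ...) name
#define CODEC_IDS(...) \
    { SELECT_PAD(__VA_ARGS__, PAD_IDS_3, PAD_IDS_2, PAD_IDS_1, unused)(__VA_ARGS__) }

typedef struct ParserSketch {
    int codec_ids[MAX_IDS];
} ParserSketch;

static const ParserSketch hevc_parser_sketch  = { .codec_ids = CODEC_IDS(ID_HEVC) };
static const ParserSketch multi_parser_sketch = { .codec_ids = CODEC_IDS(ID_H264, ID_HEVC) };
```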
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: 439711052/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4956250308935680
Fixes: out of array access
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
Right now, the private contexts of every decoder supporting
H.274 film grain synthesis (namely H.264, HEVC and VVC)
contain an H274FilmGrainDatabase; said structure is very large
(700442B before this commit) and takes up the overwhelming
majority of said contexts: Removing it reduces sizeof(H264Context)
by 92.88%, sizeof(HEVCContext) by 97.78% and sizeof(VVCContext)
by 99.86%. This is especially important for H.264 and HEVC
when using frame-threading.
The content of said film grain database does not depend on
any input parameter; it is shareable between all its users and
could be hardcoded in the binary (but isn't, because it is so huge).
This commit adds a database with static storage duration to h274.c
and uses it instead of the elements in the private contexts above.
It is still lazily initialized as-needed; a mutex is used
for the necessary synchronization. An alternative would be to use
an AV_ONCE to initialize the whole database either in the decoders'
init function (which would be wasteful given that most videos
don't use film grain synthesis) or in ff_h274_apply_film_grain().
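The shared, lazily initialized table can be sketched with a pthread mutex. The table dimensions, `shared_db`, `generate_entry()` (a placeholder LCG, not the H.274 generator) and `get_entry()` are all hypothetical. Entries are generated only on first request, under the lock. Once marked initialized, an entry is never written again, so callers may read it after the unlock.

```c
#include <pthread.h>
#include <stdint.h>

#define NB_ENTRIES 13      /* hypothetical table dimensions */
#define ENTRY_SIZE 64

/* One table with static storage duration, shared by all users. */
static struct {
    uint8_t initialized[NB_ENTRIES];
    int8_t  values[NB_ENTRIES][ENTRY_SIZE];
} shared_db;
static pthread_mutex_t shared_db_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the expensive per-entry generation. */
static void generate_entry(int8_t *dst, int idx)
{
    uint32_t state = 0x9E3779B9u ^ (uint32_t)idx;
    for (int i = 0; i < ENTRY_SIZE; i++) {
        state = state * 1664525u + 1013904223u;      /* LCG step */
        dst[i] = (int8_t)((int)(state >> 24) - 128); /* -128..127 */
    }
}

static const int8_t *get_entry(int idx)
{
    pthread_mutex_lock(&shared_db_lock);
    if (!shared_db.initialized[idx]) {
        generate_entry(shared_db.values[idx], idx);
        shared_db.initialized[idx] = 1;
    }
    pthread_mutex_unlock(&shared_db_lock);
    return shared_db.values[idx];
}
```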
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: use of uninitialized memory
Fixes: 378102648/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5896308499480576
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The code uses int, unsigned int and uint16_t to store num_entry_point_offsets,
which limits it to the range of the smallest of the three.
Alternatively, the uint16_t could be changed and a larger limit used;
a check would still be needed.
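One way such a check could look, assuming the limit is the uint16_t range. The helper and error code are hypothetical. Validating before the store keeps every later copy in range, and int arithmetic such as count + 1 can then no longer overflow.

```c
#include <stdint.h>

#define ERR_INVALIDDATA (-1) /* hypothetical error code */

static int store_num_entry_point_offsets(uint16_t *dst, unsigned parsed)
{
    if (parsed > UINT16_MAX)
        return ERR_INVALIDDATA; /* would not survive the uint16_t copy */
    *dst = (uint16_t)parsed;
    return 0;
}
```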
Fixes: 391974932/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5966648879677440
Fixes: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: shift exponent 49 is too large for 32-bit type 'int'
Fixes: 398060145/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5023082406543360
Reviewed-by: James Almer <jamrial@gmail.com>
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Abort as soon as we're done reading the slice header instead of running extra checks
that assume slice data may follow.
Signed-off-by: James Almer <jamrial@gmail.com>
Prevents printing bogus errors about the value being 0, when in fact we
overread the available slice buffer.
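Why an overread shows up as a value of 0 can be sketched with a hypothetical minimal bit reader. Like readers that rely on zero-padded buffers, it yields zero bits past the end. Checking the remaining bit count first lets the caller report the overread instead of a bogus zero. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct BitReaderSketch {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t pos;
} BitReaderSketch;

/* MSB-first; returns 0 for positions past the end of the buffer. */
static unsigned read_bit(BitReaderSketch *br)
{
    unsigned bit = 0;
    if (br->pos < br->size_in_bits)
        bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static long bits_left(const BitReaderSketch *br)
{
    return (long)br->size_in_bits - (long)br->pos;
}

/* Returns -1 on overread rather than a misleading value of 0. */
static int read_field(BitReaderSketch *br, int nbits, unsigned *out)
{
    if (bits_left(br) < nbits)
        return -1;
    unsigned v = 0;
    for (int i = 0; i < nbits; i++)
        v = (v << 1) | read_bit(br);
    *out = v;
    return 0;
}
```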
Signed-off-by: James Almer <jamrial@gmail.com>
All users (namely HEVC) that use ff_progress_frame_alloc()
should just use ff_thread_get_buffer(). Using
ff_progress_frame_get_buffer() is not a must; it is merely
a convenience wrapper.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is simpler, avoids several loops and also makes GCC no longer
emit bogus -Wstringop-overflow= warnings.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>