NV_ENC_CLOCK_TIMESTAMP_SET was changed in SDK 13.1: countingType was
replaced by countingTypeLSB and countingTypeMSB.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.
Mode 10 (Horizontal):
- Broadcasts left[y] to fill each row using ld2r/ld4r for efficiency
- Applies edge smoothing for luma blocks smaller than 32x32
Mode 26 (Vertical):
- Copies top reference row to all output rows
- Applies edge smoothing for luma blocks smaller than 32x32
Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit, avoiding widening to 16-bit intermediates.
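The 8-bit trick can be modeled in scalar C. This is only a sketch: it assumes the smoothing formula base + ((a - b) >> 1) clipped to [0, 255], as in the HEVC specification, and every helper name below is hypothetical rather than taken from the patch.

```c
#include <stdint.h>

/* Reference path: widen to int, floor-halve the difference, clip to 8 bits. */
static uint8_t smooth_ref(uint8_t base, uint8_t a, uint8_t b)
{
    int diff = a - b;
    /* floor(diff / 2) without relying on right-shifting a negative int */
    int half = diff >= 0 ? diff / 2 : -((-diff + 1) / 2);
    int v = base + half;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Model of UHSUB on one byte lane: bits 8..1 of (a - b), i.e. the halved
 * difference as an 8-bit two's-complement value. */
static uint8_t uhsub8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a - (unsigned)b) >> 1);
}

/* Model of USQADD on one byte lane: add a signed byte to an unsigned
 * accumulator and saturate to [0, 255]. */
static uint8_t usqadd8(uint8_t acc, uint8_t s)
{
    int sv = s >= 128 ? (int)s - 256 : (int)s; /* reinterpret as signed */
    int v = acc + sv;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* The two-instruction sequence, never leaving 8 bits. */
static uint8_t smooth_8bit(uint8_t base, uint8_t a, uint8_t b)
{
    return usqadd8(base, uhsub8(a, b));
}
```

The halved difference always fits in a signed byte (-128..127), which is why no 16-bit intermediate is needed in this model.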
The C pred_angular wrappers are made non-static with ff_ prefix to
allow the NEON dispatch to fall back to C for modes not yet optimized.
This will be reverted once all angular modes are implemented.
Note: since pred_angular[] is a per-size function pointer (not
per-mode), checkasm benchmarks will show '_neon' for all 33 modes
even though only modes 10/26 are truly accelerated; unoptimized
modes show ~1.0x speedup as they pass through the NEON wrapper to
the C fallback with negligible overhead.
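The pass-through behaviour described in this note can be illustrated roughly as follows. All function names, signatures and the call counters are hypothetical stand-ins for this sketch; the real wrappers take the full reference-sample arguments and the accelerated kernels are written in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Instrumentation for this sketch only: counts which path each call took. */
static int c_calls, neon_calls;

/* Stand-in for the C implementation that handles every angular mode. */
static void pred_angular_c(uint8_t *dst, ptrdiff_t stride, int mode)
{
    (void)dst; (void)stride; (void)mode;
    c_calls++;
}

/* Stand-ins for the two accelerated kernels. */
static void pred_mode10_neon(uint8_t *dst, ptrdiff_t stride)
{
    (void)dst; (void)stride;
    neon_calls++;
}

static void pred_mode26_neon(uint8_t *dst, ptrdiff_t stride)
{
    (void)dst; (void)stride;
    neon_calls++;
}

/* One entry point per block size: every mode goes through it, so a
 * benchmark labels all of them "_neon" even when the body is C. */
static void pred_angular_neon(uint8_t *dst, ptrdiff_t stride, int mode)
{
    switch (mode) {
    case 10: pred_mode10_neon(dst, stride); break;
    case 26: pred_mode26_neon(dst, stride); break;
    default: pred_angular_c(dst, stride, mode); break;
    }
}

/* Drive all 33 angular modes (2..34) through the wrapper. */
static void run_all_modes(void)
{
    uint8_t block[16] = { 0 };
    for (int mode = 2; mode <= 34; mode++)
        pred_angular_neon(block, 4, mode);
}
```

In this model only two of the 33 calls reach an accelerated kernel; the other 31 pay one extra branch before landing in the C code.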
Speedup over C on Apple M4 (checkasm --bench, 15-run average):
Mode 10 (Horizontal):
4x4: 4.66x 8x8: 5.80x 16x16: 16.86x 32x32: 24.89x
Mode 26 (Vertical):
4x4: 1.16x 8x8: 1.83x 16x16: 2.45x 32x32: 4.50x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Set sps->vui.sar to {0,1} (unspecified) before the VUI parsing
block, matching the pattern used for HEVC in hevc_ps.c. The old
zero-init-to-1 workaround is now unreachable and is removed.
Suggested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Per ITU-T H.264 (ISO/IEC 14496-10) Annex E.2.1 and ITU-T H.265
(ISO/IEC 23008-2) Annex E.3.1, when sar_width or sar_height is zero
the sample aspect ratio shall be considered unspecified. Internally
ffmpeg represents an unspecified SAR as 0/1, while fractions with a
zero denominator are not handled properly (den=0 is silently changed
to den=1 in h264_ps.c, turning an invalid 20480/0 into a "valid" but
impossibly extreme 20480/1); so we bridge the gap by replacing x/0
with 0/1 at the VUI parsing layer.
An av_log warning is added so an invalid SAR in the bitstream is
diagnosed rather than silently overwritten.
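A minimal sketch of the replacement, assuming a plain num/den pair in place of AVRational and fprintf() in place of av_log(); the function name is hypothetical and the actual parser code may be structured differently.

```c
#include <stdio.h>

/* Stand-in for AVRational in this sketch. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Map an invalid x/0 sample aspect ratio to 0/1 (unspecified) and warn.
 * Ratios with a zero numerator already read as unspecified and are kept. */
static Rational normalize_sar(int sar_width, int sar_height)
{
    Rational sar = { sar_width, sar_height };

    if (sar.den == 0) {
        /* fprintf() stands in for an av_log() warning here */
        fprintf(stderr, "Invalid SAR %d/%d, treating as unspecified\n",
                sar.num, sar.den);
        sar.num = 0;
        sar.den = 1;
    }
    return sar;
}
```

With the values from the report, normalize_sar(20480, 0) yields 0/1, while a valid ratio such as 16/11 passes through unchanged.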
This fixes a problem with some video files shipped with the game
OddBallers when run under Wine/Proton; these files report SAR 20480/0.
Based on patch by Giovanni Mascellani <gmascellani@codeweavers.com>.
Fixes: ticket #23321
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
If this were to be checked, it should be checked generically,
not in every single encoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead use CODEC_PIXFMTS. Avoids deprecation warnings
from Clang and simplifies the removal of AVCodec.pix_fmts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some 7.1 DTS files seem to signal Lw/Rw channels that the decoder has been
mapping to SL/SR, despite the macro for the mask being called 7_1_WIDE.
This resulted in said samples reporting the same native layout as actual 7.1
samples with Lsr/Rsr/Lss/Rss (mapped to BL/BR/SL/SR).
If we were to be strict, Lw/Rw would map to WR/WL, but that would result in an
unusual native layout. Instead, let's map them to FLC/FRC, which will result in
the more common 7.1(wide) native layout.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow: 314572800 * 8 cannot be represented in type 'int'
Tighten the guard to INT_MAX/14, which covers the largest expansion
factor used in the function currently.
Found-by: Jiale Yao <19888972804@163.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
fastaudio_decode() computes
subframes = pkt->size / (40 * channels);
frame->nb_samples = subframes * 256;
both in 32-bit signed arithmetic. When pkt->size is large enough to
make subframes exceed INT_MAX / 256 (subframes >= 2^23), the
multiplication overflows the signed int range; once subframes reaches
2^24, frame->nb_samples can wrap to a small non-negative value.
ff_get_buffer() then sizes the audio plane for that wrapped sample
count, while the decoder loop at line 152 still iterates the full
(unwrapped) subframes count, performing a 1024-byte memcpy per
subframe per channel. The 27th iteration (or first iteration with
nb_samples=0) writes one byte past the per-plane allocation,
yielding the ASan heap-buffer-overflow WRITE at
libavcodec/fastaudio.c:171 reported as ANT-2026-03891.
Reject the subframes value whose *256 product would overflow before
performing the multiplication. The bound INT_MAX / 256 (= 8388607)
keeps the existing two's-complement semantics of every reachable
input and rejects only the configurations that would have wrapped.
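The guard can be sketched as below. The helper name is hypothetical and -EINVAL stands in for whatever error code the decoder actually returns; only the INT_MAX / 256 bound and the two expressions are taken from the description above.

```c
#include <errno.h>
#include <limits.h>

/* Compute nb_samples for a packet, refusing inputs whose
 * subframes * 256 product would not fit in a signed int. */
static int fastaudio_nb_samples(int pkt_size, int channels)
{
    int subframes;

    if (pkt_size < 0 || channels <= 0 || channels > INT_MAX / 40)
        return -EINVAL;

    subframes = pkt_size / (40 * channels);

    /* Check before multiplying: INT_MAX / 256 == 8388607 */
    if (subframes > INT_MAX / 256)
        return -EINVAL;

    return subframes * 256;
}
```

For the reproducer size of 671088680 bytes in mono, subframes is 16777217, which exceeds the bound, so this sketch returns an error instead of a wrapped sample count.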
Reproducer: a crafted AVI declaring one mono audio chunk of
671_088_680 bytes (sparse) with the decoder forced via
'ffmpeg -c:a fastaudio -i evil.avi'.
Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It is (arguably) a slightly better place for them and avoids
a forward declaration of enum AVExifHeaderMode which is not possible
in ISO C before C23 (and requires specifying the underlying type
with C23).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The list of codec descriptors is supposed to encompass all codec IDs;
it certainly encompasses all codec IDs used by de/encoders (this is
checked in the avcodec test program which is run via FATE).
So the avcodec_find_decoder()/avcodec_find_encoder() calls are pointless.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reduces sizeof(AACPCEInfo) from 296 to 120 bytes.
This reduces .rodata by 4576B here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Most parsers outright dislike anything being signaled as SIDE, as they expect
layouts to follow the channel ordering pre-defined for non-zero channel_config
values.
Signed-off-by: James Almer <jamrial@gmail.com>
The ADPCM_PSXC block loop in adpcm_decode_frame()
(libavcodec/adpcm.c:2770) iterates 'block < avpkt->size / block_align'
times and, for each block, consumes
channels * (1 + (block_align - 1) / channels)
input bytes via the *unchecked* bytestream2_get_byteu() reader. The
loop divides avpkt->size by block_align, so the loop bound is sound
only when the per-block consumption equals block_align — i.e. when
block_align is an exact multiple of channels. For any other
combination (e.g. block_align=9 with channels=8), each block consumes
more than block_align bytes; iterating avpkt->size/block_align
blocks then walks the input bytestream past avpkt->data +
avpkt->size, producing the heap-buffer-overflow READ at
libavcodec/bytestream.h:99 reported as ANT-2026-04052.
adpcm_decode_init() previously only enforced 'channels > 0' and
'block_align > 0' for PSXC. Tighten the init check to additionally
require 'block_align % channels == 0', which is the precise
invariant the decode loop depends on.
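The arithmetic behind the invariant can be checked with a small sketch. Both helper names are hypothetical; the per-block expression and the init conditions are the ones quoted above.

```c
/* Bytes the decode loop reads for one block. */
static int psxc_block_bytes(int channels, int block_align)
{
    return channels * (1 + (block_align - 1) / channels);
}

/* Init-time check: the loop bound avpkt->size / block_align is only
 * sound when each block consumes exactly block_align bytes. */
static int psxc_params_valid(int channels, int block_align)
{
    return channels > 0 && block_align > 0 &&
           block_align % channels == 0;
}
```

For channels=8 and block_align=9 each block reads 16 bytes while the loop bound assumes 9, so the parameters are rejected; for block_align=16 the two agree and the parameters pass.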
Reproducer: a crafted WAV header declaring channels=8, block_align=9
with the decoder forced via 'ffmpeg -c:a adpcm_psxc -i evil.wav'.
Found-by: Anthropic agents; validated and reported by Ada Logics.
Signed-off-by: David Korczynski <david@adalogics.com>
vc1_loop_filter() is only reached through the six C wrappers. Clang 14
keeps it out of line with plain static inline, adding a 224-byte stack
frame before the tiny bestcase path on rpi 5. gcc 12 already inlines
it.
rpi 5 clang 14:
                                  before   after
vc1_v_loop_filter4_bestcase_c      27.2     8.3  (3.3x)
vc1_h_loop_filter4_bestcase_c      26.4    10.2  (2.6x)
vc1_v_loop_filter8_bestcase_c      32.5    20.3  (1.6x)
vc1_h_loop_filter8_bestcase_c      31.7    19.5  (1.6x)
vc1_v_loop_filter16_bestcase_c     42.1    33.2  (1.3x)
vc1_h_loop_filter16_bestcase_c     41.6    25.3  (1.6x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The C version is faster than the previous asm with clang and gcc > 12 on
rpi5, since the compiler basically performs the same unrolling.
sum64x5_neon:             before          after
Cortex-A76 (gcc 12.4):    72.3 (3.63x)    47.4 (5.56x)
Cortex-A76 (gcc 14.2):    72.3 (0.69x)    47.4 (1.05x)
Apple M1 (clang 16):       0.2 (0.98x)     0.2 (0.99x)
Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>