SVT-AV1 < 3.0.0 requires input dimensions of at least 64x64.
Older versions may otherwise silently accept smaller inputs without
producing output and cause the caller to hang. Reject such inputs
explicitly in config_enc_params() to produce a clear error.
v3.0.0+ supports sub-64px dimensions and validates the
input itself, so the check is gated with SVT_AV1_CHECK_VERSION.
Fix#22817
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON. These polyphase
FIR interpolation filters have an x86 SSE/AVX path but no AArch64
equivalent, falling back to scalar C.
The inner loop computes two dot products per output pair. Precomputing a
reversed LFE sample vector before the inner loop avoids per-iteration
shuffle overhead.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):
lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)
Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench,
3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
NV_ENC_H264_PROFILE_HIGH_10 and NV_ENC_HEVC_PROFILE_MULTIVIEW_MAIN
both equal 3 when their respective NVENC_HAVE_* flags are defined.
The MVHEVC check in nvenc_check_capabilities() matches against
ctx->profile alone, so an H.264 encode with profile=high10 is
rejected as if it were an HEVC multiview request on hardware
without MVHEVC support.
Signed-off-by: Semih Baskan <strst.gs@gmail.com>
mf_encv_input_adjust() currently only validates the pixel format and
otherwise leaves the input IMFMediaType unchanged. The Microsoft
H.264, H.265 and AV1 encoder MFTs tolerate this and internally infer
the missing attributes from the previously-set output type. Other
MediaFoundation encoder MFTs that follow the specification more
strictly reject the input type with MF_E_INVALIDMEDIATYPE (due to
MF_E_ATTRIBUTENOTFOUND on MF_MT_FRAME_SIZE / MF_MT_FRAME_RATE) when
those attributes are absent, which causes IMFTransform::SetInputType
to fail and aborts encoding.
Set MF_MT_FRAME_SIZE, MF_MT_FRAME_RATE and MF_MT_INTERLACE_MODE on
the input media type, mirroring what mf_encv_output_adjust() already
writes to the output type. Behaviour on the Microsoft MFTs is
unchanged (they were already using these values) and encoding now
works with stricter third-party MFTs.
The MF_MT_FRAME_SIZE assignment has been present but commented out
since the original MediaFoundation wrapper was added in 050b72ab5e.
Signed-off-by: Ashrit Shetty <ashritshetty@microsoft.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Monochrome formats (gray, gray10le) have log2_chroma_w == 0 and
log2_chroma_h == 0 because they have no chroma planes — the same
values as YUV444. This caused them to be misclassified as YUV444 by
the is_yuv444 detection introduced in bcea693f75.
After fed6612415 changed cuvid_test_capabilities to use is_yuv444
instead of hardcoding cudaVideoChromaFormat_420, monochrome AV1
streams were rejected with "Codec av1_cuvid is not supported with
this chroma format".
Add an nb_components > 1 guard to exclude single-component formats
from the YUV444 path.
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
AV1CodecConfigurationRecord may contain only the 4-byte header and no
configOBUs. Still skip the header in that case so only configOBUs are
passed to cuvidParseVideoData().
Otherwise the av1C header itself is treated as sequence header data
and AV1 decoding can fail with an unknown error.
Suggested-by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
checkasm --bench --runs=15 (Apple M4, average of 3 trials):
ref_filter_3tap_8x8_8_neon: 4.1x
ref_filter_3tap_16x16_8_neon: 3.3x
ref_filter_3tap_32x32_8_neon: 2.5x
ref_filter_strong_8_neon: 1.9x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract 3-tap [1,2,1]>>2 and strong intra smoothing from
intra_pred() into HEVCPredContext function pointers, preparing
for arch-specific overrides.
ref_filter_3tap[3] indexed by log2_size - 3 (sizes 8/16/32).
ref_filter_strong for 32x32 luma only.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Deprecating MMX w/o performance regression; nearly identical performance
numbers on my Zen 4 (1.99x vs c)
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
This change should improve performance on Skylake and later
Intel CPUs (which have only half the ports for saturated adds/subs
for mmx register compared to xmm register): llvm-mca predicts
a 25% performance improvement on Skylake.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to use pavgb to reduce the amount of instructions used
to calculate the average; processing two rows via movhps allows
to reduce the amount of pxor and pavgb even further and turned
out to be beneficial.
This patch also avoids a load as the constant used here can be easily
generated at runtime.
Old benchmarks:
put_no_rnd_pixels_l2_c: 13.3 ( 1.00x)
put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x)
New benchmarks:
put_no_rnd_pixels_l2_c: 13.4 ( 1.00x)
put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to avoid the stack for the 8 bit simple IDCT;
for the other IDCTs, it avoids storing and restoring two
xmm registers on Win64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We reject inputs that are significantly smaller than the smallest frame.
This check raises the minimum input needed before time consuming computations are performed
it thus improves the computation per input byte and reduces the potential DoS impact
Fixes: Timeout
Fixes: 472769364/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SVQ1_DEC_fuzzer-5519737145851904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add support for parsing and muxing smpte 2094-50 metadata. It will
be stored as an ITUT-T35 message in the BlockAdditional element with
an AddId type of 4 (which is reserved for ITUT-T35 in the matroska
spec).
https://www.matroska.org/technical/codec_specs.html#itu-t35-metadata
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Precompute the SILK NLSF residual weights from the stage-1 codebooks and use the table during LPC decode. This removes the per-coefficient mandated fixed-point weight calculation in silk_decode_lpc() while preserving the same decoded values.
The V-Nova LCEVC pipeline processes frames on internal background
worker threads. LCEVC_ReceiveDecoderPicture returns LCEVC_Again (-1)
when the worker has not yet completed the frame, which is the
documented "not ready, try again" response. The original code treated
any non-zero return as a fatal error (AVERROR_EXTERNAL), causing decode
to abort mid-stream.
Poll until LCEVC_Success or a genuine error is returned.
Signed-off-by: Peter von Kaenel <Peter.vonKaenel@harmonicinc.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Avoids the post_process_opaque_free callback; the only user of
this is already a RefStruct reference and presumably other users
would want to use a pool for this, too, so they would use
RefStruct-objects, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
H.264 only uses these functions with height 2 or 4 and
the aarch64, arm and mips versions of them optimize based
on this. Yet this is not true when these functions are used
by the lowres code in mpegvideo_dec.c. So revert back to
the C versions of these functions for mpegvideo_dec so that
the H.264 decoder can still use fully optimized functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Frame side data unfortunately lacks padding, which CBS needs, so we can't reuse
the existing AVBufferRef.
Signed-off-by: James Almer <jamrial@gmail.com>
Check that the driver supports both BUFFER_OFFSET and BYTES_WRITTEN
encode feedback flags before creating the query pool, failing with
EINVAL if either is missing.
Set these flags explicitly instead of masking off HAS_OVERRIDES with a
bitwise NOT, which could pass unrecognized bits from newer drivers to
vkCreateQueryPool causing validation errors and
crashes.