SVT-AV1 < 3.0.0 requires input dimensions of at least 64x64.
Older versions may otherwise silently accept smaller inputs without
producing output and cause the caller to hang. Reject such inputs
explicitly in config_enc_params() to produce a clear error.
v3.0.0+ supports sub-64px dimensions and validates the
input itself, so the check is gated with SVT_AV1_CHECK_VERSION.
Fix#22817
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON. These polyphase
FIR interpolation filters have an x86 SSE/AVX path but no AArch64
equivalent, falling back to scalar C.
The inner loop computes two dot products per output pair. Precomputing a
reversed LFE sample vector before the inner loop avoids per-iteration
shuffle overhead.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):
lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)
Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench,
3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
NV_ENC_H264_PROFILE_HIGH_10 and NV_ENC_HEVC_PROFILE_MULTIVIEW_MAIN
both equal 3 when their respective NVENC_HAVE_* flags are defined.
The MVHEVC check in nvenc_check_capabilities() matches against
ctx->profile alone, so an H.264 encode with profile=high10 is
rejected as if it were an HEVC multiview request on hardware
without MVHEVC support.
Signed-off-by: Semih Baskan <strst.gs@gmail.com>
mf_encv_input_adjust() currently only validates the pixel format and
otherwise leaves the input IMFMediaType unchanged. The Microsoft
H.264, H.265 and AV1 encoder MFTs tolerate this and internally infer
the missing attributes from the previously-set output type. Other
MediaFoundation encoder MFTs that follow the specification more
strictly reject the input type with MF_E_INVALIDMEDIATYPE (due to
MF_E_ATTRIBUTENOTFOUND on MF_MT_FRAME_SIZE / MF_MT_FRAME_RATE) when
those attributes are absent, which causes IMFTransform::SetInputType
to fail and aborts encoding.
Set MF_MT_FRAME_SIZE, MF_MT_FRAME_RATE and MF_MT_INTERLACE_MODE on
the input media type, mirroring what mf_encv_output_adjust() already
writes to the output type. Behaviour on the Microsoft MFTs is
unchanged (they were already using these values) and encoding now
works with stricter third-party MFTs.
The MF_MT_FRAME_SIZE assignment has been present but commented out
since the original MediaFoundation wrapper was added in 050b72ab5e.
Signed-off-by: Ashrit Shetty <ashritshetty@microsoft.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Monochrome formats (gray, gray10le) have log2_chroma_w == 0 and
log2_chroma_h == 0 because they have no chroma planes — the same
values as YUV444. This caused them to be misclassified as YUV444 by
the is_yuv444 detection introduced in bcea693f75.
After fed6612415 changed cuvid_test_capabilities to use is_yuv444
instead of hardcoding cudaVideoChromaFormat_420, monochrome AV1
streams were rejected with "Codec av1_cuvid is not supported with
this chroma format".
Add an nb_components > 1 guard to exclude single-component formats
from the YUV444 path.
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
AV1CodecConfigurationRecord may contain only the 4-byte header and no
configOBUs. Still skip the header in that case so only configOBUs are
passed to cuvidParseVideoData().
Otherwise the av1C header itself is treated as sequence header data
and AV1 decoding can fail with an unknown error.
Suggested-by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
checkasm --bench --runs=15 (Apple M4, average of 3 trials):
ref_filter_3tap_8x8_8_neon: 4.1x
ref_filter_3tap_16x16_8_neon: 3.3x
ref_filter_3tap_32x32_8_neon: 2.5x
ref_filter_strong_8_neon: 1.9x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract 3-tap [1,2,1]>>2 and strong intra smoothing from
intra_pred() into HEVCPredContext function pointers, preparing
for arch-specific overrides.
ref_filter_3tap[3] indexed by log2_size - 3 (sizes 8/16/32).
ref_filter_strong for 32x32 luma only.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Deprecating MMX w/o performance regression; nearly identical performance
numbers on my Zen 4 (1.99x vs c)
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>