Add support for parsing and muxing SMPTE 2094-50 metadata. It will
be stored as an ITU-T T.35 message in the BlockAdditional element with
an AddId type of 4 (which is reserved for ITU-T T.35 in the Matroska
spec).
https://www.matroska.org/technical/codec_specs.html#itu-t35-metadata
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Precompute the SILK NLSF residual weights from the stage-1 codebooks and
use the table during LPC decode. This removes the mandated per-coefficient
fixed-point weight calculation from silk_decode_lpc() while preserving the
same decoded values.
The V-Nova LCEVC pipeline processes frames on internal background
worker threads. LCEVC_ReceiveDecoderPicture returns LCEVC_Again (-1)
when the worker has not yet completed the frame, which is the
documented "not ready, try again" response. The original code treated
any non-zero return as a fatal error (AVERROR_EXTERNAL), causing decode
to abort mid-stream.
Poll until LCEVC_Success or a genuine error is returned.
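A minimal sketch of the polling pattern, not the actual patch. The decoder
state and demo_receive_picture() are hypothetical stand-ins for
LCEVC_ReceiveDecoderPicture; only "Again is -1" and "success is zero" come
from the description above, and the error value is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in return codes; DEMO_ERROR is an invented value. */
enum { DEMO_SUCCESS = 0, DEMO_AGAIN = -1, DEMO_ERROR = -2 };

/* Hypothetical decoder state used only for this illustration. */
typedef struct DemoDecoder {
    int pending; /* polls left until the worker has finished the frame */
    int fail;    /* nonzero: report a genuine error instead */
    int polls;   /* number of receive calls made so far */
} DemoDecoder;

/* Hypothetical stand-in for LCEVC_ReceiveDecoderPicture(). */
static int demo_receive_picture(DemoDecoder *d)
{
    d->polls++;
    if (d->fail)
        return DEMO_ERROR;
    if (d->pending > 0) {
        d->pending--;
        return DEMO_AGAIN; /* worker not done yet, try again */
    }
    return DEMO_SUCCESS;
}

/* Poll until success or a genuine error; "again" is not fatal. */
static int demo_poll_picture(DemoDecoder *d)
{
    int res;

    assert(d != NULL);
    do {
        res = demo_receive_picture(d);
    } while (res == DEMO_AGAIN);

    return res;
}
```

A real caller would probably want to bound or yield inside the loop; the
sketch leaves that out.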
Signed-off-by: Peter von Kaenel <Peter.vonKaenel@harmonicinc.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Avoids the post_process_opaque_free callback; the only user of
this is already a RefStruct reference and presumably other users
would want to use a pool for this, too, so they would use
RefStruct-objects, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
H.264 only uses these functions with height 2 or 4 and
the aarch64, arm and mips versions of them optimize based
on this. Yet this is not true when these functions are used
by the lowres code in mpegvideo_dec.c. So revert to
the C versions of these functions for mpegvideo_dec so that
the H.264 decoder can still use fully optimized functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Frame side data unfortunately lacks padding, which CBS needs, so we can't reuse
the existing AVBufferRef.
Signed-off-by: James Almer <jamrial@gmail.com>
Check that the driver supports both BUFFER_OFFSET and BYTES_WRITTEN
encode feedback flags before creating the query pool, failing with
EINVAL if either is missing.
Set these flags explicitly instead of masking off HAS_OVERRIDES with a
bitwise NOT, which could pass unrecognized bits from newer drivers to
vkCreateQueryPool, causing validation errors and crashes.
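The difference between the two approaches can be sketched as follows. The
DEMO_* bit values are illustrative stand-ins and not the real Vulkan
enumerants, both helper names are hypothetical, and -EINVAL stands in for
the error code used by the actual code.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values only; not the real Vulkan feedback flags. */
#define DEMO_FEEDBACK_BUFFER_OFFSET (1u << 0)
#define DEMO_FEEDBACK_BYTES_WRITTEN (1u << 1)
#define DEMO_FEEDBACK_HAS_OVERRIDES (1u << 2)

/* New behaviour: require both flags and request exactly those two. */
static int demo_pick_feedback_flags(uint32_t supported, uint32_t *out)
{
    const uint32_t required = DEMO_FEEDBACK_BUFFER_OFFSET |
                              DEMO_FEEDBACK_BYTES_WRITTEN;

    if ((supported & required) != required)
        return -EINVAL; /* driver lacks a flag we depend on */

    *out = required;    /* never forward bits we do not know about */
    return 0;
}

/* Old behaviour: mask off one bit, which keeps every unknown bit. */
static uint32_t demo_old_mask(uint32_t supported)
{
    return supported & ~DEMO_FEEDBACK_HAS_OVERRIDES;
}
```

With a driver that reports an extra, unknown bit, demo_old_mask() passes
that bit through, while demo_pick_feedback_flags() yields only the two
known flags.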
Forward-declaring an enum is not legal C (the underlying type of
the enum may depend upon the enum constants, so this may cause
ABI issues with -fshort-enums); compilers warn about this
with -pedantic.
This essentially reverts 7e84865cff.
Note that almost* all files that include codec_internal.h also
need to include avcodec.h, so this does not lead to unnecessary
rebuilds.
This addresses part of #22684.
*: The only file I am aware of that defines an FFCodec and does not
need AVCodecContext as complete type is null.c (but even it already
includes it implicitly); the avcodec.c test tool seems to be the only
file where this commit actually leads to an unnecessary avcodec.h
inclusion.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They have been superseded by SSSE3; the SSE2 version was even disabled
(and segfaults if enabled).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Compared to the MMX version, this version benefits from wider
registers and pmaddubsw. It also has fewer unnecessary loads
and stores: on x64, the MMX version has 12 unnecessary GPR loads
and 6 stores per line when the width is 8; for width 16,
there are 17 unnecessary GPR loads and 6 stores per line.
Even the 32-bit SSSE3 version only has 6 loads and 0 stores
per line more than the x64 version. Furthermore, in contrast
to the MMX version, the SSSE3 version also does not clobber
the array of block pointers given to it.
Benchmarks:
inner_add_yblock_2_c:        29.2 ( 1.00x)
inner_add_yblock_2_mmx:      32.5 ( 0.90x)
inner_add_yblock_2_ssse3:    28.6 ( 1.02x)
inner_add_yblock_4_c:        85.2 ( 1.00x)
inner_add_yblock_4_mmx:      89.2 ( 0.96x)
inner_add_yblock_4_ssse3:    84.5 ( 1.01x)
inner_add_yblock_8_c:       302.0 ( 1.00x)
inner_add_yblock_8_mmx:      77.0 ( 3.92x)
inner_add_yblock_8_ssse3:    30.6 ( 9.85x)
inner_add_yblock_16_c:     1164.7 ( 1.00x)
inner_add_yblock_16_mmx:    260.4 ( 4.47x)
inner_add_yblock_16_ssse3:   82.3 (14.15x)
Both the MMX and SSSE3 versions leave the size 2 and 4 cases
to ff_snow_inner_add_yblock_c() (but the MMX version has
a prologue at the beginning that it needs to undo before
the call, leading to the higher overhead for these sizes).
I don't know why the SSSE3 version is marginally faster than
the C version in these cases.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary and avoids the src_y parameter;
it also makes this function more ASM-friendly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The input lines used in ff_snow_inner_add_yblock()
must always be set (because their values are used).
The MMX assembly always relied on this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This was done in 561a18d3ba
in order to avoid shifts, yet this rationale no longer applies
since d593e32983. So shift them back;
this is in preparation for using these coefficients together with
pmaddubsw.
Note: 561a18d3ba also added a block
guarded by "if(LOG2_OBMC_MAX == 8". I changed the condition to remove
this check (i.e. kept the block) which should not change the output
at all. Yet all FATE tests pass if the block is completely
removed. I don't know if this block is necessary at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in 6a551f1405.
Also fix the comment claiming that there are MMXEXT functions
in this file.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Write the 24-bit vpcC flags field at the current cursor position after
the version byte. The previous code wrote to p+1 instead of p, leaving
one byte uninitialized between version and flags and shifting all
subsequent fields (profile, level, bitdepth, etc.) by one byte.
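A minimal sketch of the corrected write order, assuming a simplified header
of version, flags, profile and level only. demo_put_be24() and
demo_write_vpcc_head() are hypothetical helpers, not the functions touched
by this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 24-bit big-endian value and return the advanced cursor. */
static uint8_t *demo_put_be24(uint8_t *p, uint32_t v)
{
    p[0] = (v >> 16) & 0xff;
    p[1] = (v >>  8) & 0xff;
    p[2] =  v        & 0xff;
    return p + 3;
}

/* Write version, flags, profile and level back to back.
 * Returns the number of bytes written. */
static size_t demo_write_vpcc_head(uint8_t *buf, uint8_t version,
                                   uint32_t flags, uint8_t profile,
                                   uint8_t level)
{
    uint8_t *p = buf;

    *p++ = version;
    p    = demo_put_be24(p, flags); /* at p, not p + 1: no gap byte */
    *p++ = profile;
    *p++ = level;

    return (size_t)(p - buf);
}
```

Writing the flags at p + 1 instead would leave buf[1] untouched and push
profile and level one byte further out.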
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Return the actual find_sei_end() error when SEI appending fails instead of
reusing the previous status code. This preserves the real parse failure for
callers instead of reporting malformed SEI handling as success.
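The error propagation can be sketched as below. demo_find_end() is a
hypothetical parser with an invented validity rule (the last byte must be
0x80); it is not find_sei_end() and does not share its signature.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical parser: returns the offset of the last byte, or a
 * negative errno value if the buffer fails the invented check. */
static int demo_find_end(const unsigned char *buf, size_t size)
{
    if (!buf || size == 0 || buf[size - 1] != 0x80)
        return -EINVAL;
    return (int)(size - 1);
}

/* Append step: report the parser's own error, not a stale status. */
static int demo_append(const unsigned char *buf, size_t size, int *end_out)
{
    int ret = 0; /* status left over from an earlier, successful step */
    int end = demo_find_end(buf, size);

    if (end < 0)
        return end; /* returning `ret` here would claim success */

    *end_out = end;
    return ret;
}
```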
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This is more about deprecating MMX than about any performance gain: the
performance numbers are nearly identical on my Zen 4 (1.36x vs. C), but
llvm-mca predicts a >60% performance gain on Intel CPUs newer than Skylake.
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext
fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext
assigns const void * to void *, triggering the same warning at that line. The right
fix is a (void *) cast at the call site, same as done for buf_pnext.
Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in
try_export_flags to the DRM modifier path only: when has_mods is false the log
always printed mod[0]=0x0, which is misleading since no DRM modifier is involved.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
When mapping Vulkan Video frames to DMA-BUF, synchronize using an exportable
binary semaphore and sync_fd where supported. Submit a lightweight exec that
waits on each plane's timeline semaphore at the current value and signals a
SYNC_FD-exportable binary semaphore, then export it with vkGetSemaphoreFdKHR.
Store that binary semaphore in AVVkFrameInternal and reuse it across maps
instead of creating and destroying each time: for
VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT, copy transference means a
successful vkGetSemaphoreFdKHR unsignals the semaphore like a wait, so it can
be signaled again on the next map submit. If export is unavailable, fall back
to vkWaitSemaphores.
Move the drm_sync_sem destruction to vulkan_free_internal.
Export dma-buf fds with GetMemoryFdKHR for each populated f->mem[i], iterating
up to the sw_format plane count instead of stopping at the image count, so
multi-memory bindings are not skipped. Describe DRM layers using
max(sw planes, image count) and query subresource layout with the correct
aspect and image index when one VkImage backs multiple planes. Reference the
source hw_frames_ctx on the mapped frame and close dma-buf fds on failure paths.
For DMA-BUF-capable pools, honor VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT
from format export queries when binding memory. With DRM modifiers and a
video profile in create_pnext, preserve caller usage and image flags instead of
overwriting them from generic supported_usage probing; use the modifier list
create info when probing export flags for modifier tiling.
Include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR from the output frames
context's usage together with DST (fixes
VUID-VkVideoBeginCodingInfoKHR-slotIndex-07245) instead of adding DPB usage
only when !is_current.
In ff_vk_decode_add_slice, pass VkVideoProfileListInfoKHR (from the output
frames context's create_pnext) as the pNext argument to
ff_vk_get_pooled_buffer instead of the full create_pnext chain. In
ff_vk_frame_params, set tiling to OPTIMAL only when it is not already
DRM_FORMAT_MODIFIER_EXT. In ff_vk_decode_init, when the output pool's
create_pnext includes VkImageDrmFormatModifierListCreateInfoEXT, initialize the
DPB pool with that modifier-list pNext and DRM_FORMAT_MODIFIER_EXT tiling;
otherwise use VkVideoProfileListInfoKHR and OPTIMAL as before. When
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is unset, the output
and DPB pools cannot use different layouts or tiling, so the DPB pool must
match the output pool.
Also fix av_hwframe_map ioctl sync_fd export, multi-planar semaphore handling,
and related failure-path cleanup.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
Add a native encoder for the Playdate PDV format.
Supports monob (1-bit) video, producing zlib-compressed intra frames
and XOR-based delta frames.
Includes bounds checking, overflow guards, correct linesize handling
using ptrdiff_t, and proper buffer allocation ordering.
Mark the encoder as experimental by setting AV_CODEC_CAP_EXPERIMENTAL,
since it has not been validated against Panic's official Playdate
player or SDK.
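A minimal sketch of the XOR delta step on packed 1-bit rows, assuming the
delta is taken against the previous frame and stored tightly packed. The
zlib compression step is omitted, and demo_xor_delta() is a hypothetical
helper rather than the encoder's actual function.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR the current frame against the previous one, row by row.
 * Input rows are addressed through their linesize, which may be larger
 * than the packed row width; the output is tightly packed. */
static void demo_xor_delta(uint8_t *dst,
                           const uint8_t *cur,  ptrdiff_t cur_linesize,
                           const uint8_t *prev, ptrdiff_t prev_linesize,
                           int width, int height)
{
    const size_t row_bytes = ((size_t)width + 7) / 8; /* 8 pixels per byte */

    for (int y = 0; y < height; y++) {
        const uint8_t *c = cur  + y * cur_linesize;
        const uint8_t *p = prev + y * prev_linesize;
        uint8_t       *d = dst  + (size_t)y * row_bytes;

        for (size_t x = 0; x < row_bytes; x++)
            d[x] = c[x] ^ p[x]; /* unchanged pixels become zero bits */
    }
}
```

Unchanged regions turn into runs of zero bytes, which is what makes the
delta compress well; XORing the delta with the previous frame again
restores the current one.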
It always returns zero, which none of the callers check,
so just return nothing instead.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
With this commit, the RV30 and RV40 decoders no longer clobber
the fpu state for normal decoding (only error resilience can
still do so).
rv34_idct_add_c: 58.1 ( 1.00x)
rv34_idct_add_mmxext: 16.5 ( 3.52x)
rv34_idct_add_ssse3: 12.2 ( 4.76x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>