ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-04-18 16:40:23 +00:00

Author	SHA1	Message	Date
Tymur Boiko	f7ca6f7481	vulkan: fix -Wdiscarded-qualifiers warning and misleading DRM modifier log ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext assigns const void * to void *, triggering the same warning at that line. The right fix is a (void *) cast at the call site, same as done for buf_pnext. Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in try_export_flags to the DRM modifier path only: when has_mods is false the log always printed mod[0]=0x0, which is misleading since no DRM modifier is involved. Signed-off-by: Tymur Boiko <tboiko@nvidia.com>	2026-04-11 12:50:07 +00:00
Tymur Boiko	25e187f849	vulkan: fix DRM map, decode barriers, and video frame setup for modifier output When mapping Vulkan Video frames to DMA-BUF, synchronize using an exportable binary semaphore and sync_fd where supported. Submit a lightweight exec that waits on each plane's timeline semaphore at the current value, signals a SYNC_FD-exportable binary semaphore, then export with vkGetSemaphoreFdKHR. Store that binary semaphore in AVVkFrameInternal and reuse it across maps instead of creating and destroying each time: for VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT, copy transference means a successful vkGetSemaphoreFdKHR unsignals the semaphore like a wait, so it can be signaled again on the next map submit. If export is unavailable, fall back to vkWaitSemaphores. Moved drm_sync_sem destroy to vulkan_free_internal Export dma-buf fds with GetMemoryFdKHR for each populated f->mem[i], iterating up to the sw_format plane count instead of stopping at the image count, so multi-memory bindings are not skipped. Describe DRM layers using max(sw planes, image count) and query subresource layout with the correct aspect and image index when one VkImage backs multiple planes. Reference the source hw_frames_ctx on the mapped frame and close dma-buf fds on failure paths. For DMA-BUF-capable pools, honor VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT from format export queries when binding memory. With DRM modifiers and a video profile in create_pnext, preserve caller usage and image flags instead of overwriting them from generic supported_usage probing; use the modifier list create info when probing export flags for modifier tiling. Include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR from the output frames context's usage together with DST (fixes VUID-VkVideoBeginCodingInfoKHR-slotIndex-07245) instead of adding DPB usage only when !is_current. 
In ff_vk_decode_add_slice, pass VkVideoProfileListInfoKHR (from the output frames context's create_pnext) as the pNext argument to ff_vk_get_pooled_buffer instead of the full create_pnext chain. In ff_vk_frame_params, set tiling to OPTIMAL only when it is not already DRM_FORMAT_MODIFIER_EXT. In ff_vk_decode_init, when the output pool's create_pnext includes VkImageDrmFormatModifierListCreateInfoEXT, initialize the DPB pool with that modifier-list pNext and DRM_FORMAT_MODIFIER_EXT tiling; otherwise use VkVideoProfileListInfoKHR and OPTIMAL as before. When VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is unset, the output and DPB pools cannot use different layouts or tiling, so the DPB pool must match the output pool. Also fix av_hwframe_map ioctl sync_fd export, multi-planar semaphore handling, and related failure-path cleanup. Signed-off-by: Tymur Boiko <tboiko@nvidia.com>	2026-04-10 11:39:40 +00:00
Lynne	713e3c4f91	vulkan_decode: do not align single-plane images to subsampling Unlike multiplane images, single-plane images do not need to be aligned to chroma width. Saves a bit of memory.	2026-01-19 16:37:16 +01:00
Kacper Michajłow	0a8b915b04	avcodec/vulkan_decode: fix logic error when checking for encode support Both FF_VK_EXT_VIDEO_ENCODE_QUEUE and FF_VK_EXT_VIDEO_MAINTENANCE_1 are required, not only one of them. Found by VVL. Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-12-31 10:30:36 +00:00
Benjamin Cheng	531e184944	lavc/vulkan_video: Drop sampler With the limiting of video usages on the image views, we no longer need a yuv sampler, since no multi-plane image will be created with the SAMPLED usage bit.	2025-12-30 14:39:08 -05:00
Benjamin Cheng	24db09a881	lavc/vulkan_video: Restrict usages for image views These image views are used only internally for video coding, so they do not need all the usages of the image it's created from.	2025-12-30 14:15:46 -05:00
Lynne	5bb9cd23b7	vulkan_dpx: fix GRAY16BE and big-endian marked 8-bit samples	2025-12-13 21:35:56 +01:00
Lynne	72e83b42d1	vulkan_decode: clean up decoder initialization Now that we don't reset on every seek, we can simplify it.	2025-12-13 19:12:24 +01:00
averne	c384b1e803	vulkan/prores: use vkCmdClearColorImage The VK spec forbids using clear commands on YUV images, so we need to allocate separate per-plane images. This removes the need for a separate reset shader.	2025-12-07 18:17:36 +00:00
Lynne	531ce713a0	dpxdec: add a Vulkan hwaccel	2025-11-26 15:16:43 +01:00
Lynne	22cc958c58	Revert "hwcontext_vulkan: fix grayscale 10 and 12-bit formats using the new MSB formats" This reverts commit `471acedec2`.	2025-11-06 21:44:13 +01:00
Lynne	2c7732a676	Revert "hwcontext_vulkan: fix 3-plane 444 10 and 12-bit formats using the new MSB formats" This reverts commit `41ecb203c5`.	2025-11-06 21:44:13 +01:00
Lynne	38df9ba71b	Revert "hwcontext_vulkan: fix planar 10 and 12-bit RGB formats using the new MSB formats" This reverts commit `98ee3f6718`.	2025-11-06 21:44:13 +01:00
Lynne	3cd678506c	vulkan_decode: align images to the subsampling Normally, the Vulkan drivers handle this. But Vulkan decided "nah". This requires API users to crop out odd-numbered images with subsampling.	2025-10-28 07:11:26 +01:00
Lynne	98ee3f6718	hwcontext_vulkan: fix planar 10 and 12-bit RGB formats using the new MSB formats	2025-10-27 22:59:41 -03:00
Lynne	41ecb203c5	hwcontext_vulkan: fix 3-plane 444 10 and 12-bit formats using the new MSB formats We previously tried to fudge this somehow, but the pixel formats are simply broken and we cannot use them without declaring them as MSB.	2025-10-27 22:59:41 -03:00
Lynne	471acedec2	hwcontext_vulkan: fix grayscale 10 and 12-bit formats using the new MSB formats	2025-10-27 22:59:41 -03:00
averne	98412edfed	lavc: add a ProRes Vulkan hwaccel Add a shader-based Apple ProRes decoder. It supports all codec features for profiles up to the 4444 XQ profile, i.e.: - 4:2:2 and 4:4:4 chroma subsampling - 10- and 12-bit component depth - Interlacing - Alpha The implementation consists of two shaders: the VLD kernel does entropy decoding for color/alpha, and the IDCT kernel performs the inverse transform on color components. Benchmarks for a 4k yuv422p10 sample: - AMD Radeon 6700XT: 178 fps - Intel i7 Tiger Lake: 37 fps - NVidia Orin Nano: 70 fps	2025-10-25 19:54:13 +00:00
Lynne	eb9e000584	vulkan_decode: add ifdefs around VP9 definitions and privatize profile struct The struct is not referenced anywhere else.	2025-08-08 15:07:33 +00:00
Benjamin Cheng	f7a5128109	vulkan_av1: Fix frame threading Basically do the same thing that was done for VP9, and remove the vestigial frame_id_alloc_mask in the context.	2025-08-08 14:45:58 +00:00
Lynne	75aeffb1c6	lavc: add a ProRes RAW Vulkan hwaccel This commit adds a ProRes RAW hardware implementation written in Vulkan. Both version 0 and version 1 streams are supported. The implementation is highly parallelized, with 512 invocations dispatched per every tile, with generally 4k tiles on a 5.8k stream. Thanks to unlord for the 8-point iDCT. Benchmark for a generic 5.8k RAW HQ file: 6900XT: 63fps 7900XTX: 84fps 6000 Ada: 120fps Intel: 9fps	2025-08-08 18:29:41 +09:00
Lynne	2caf23e7c4	vp9: add Vulkan VP9 hwaccel	2025-08-08 18:29:40 +09:00
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
Lynne	7b45d9c5fd	vulkan_ffv1: pipe through slice decoding status	2025-05-20 19:53:02 +09:00
Lynne	ec3f3457fd	vulkan_decode: add STORAGE flag to output images In filtering, and SDR encoding, we use storage images. This fixes using Vulkan filters on Intel. Tested not to break anything on the three major vendors.	2025-04-19 10:59:16 +02:00
Lynne	193610d9ba	vulkan_decode: allow using NULL offsets/nb_slices in ff_vk_decode_add_slice() For codecs like VP9 which use a single slice.	2025-03-27 17:22:11 +01:00
Lynne	5fc4acae9c	vulkan_decode: allow using NULL sequence_params when decoding The function had some checks to allow for this, but as it always tried to dereference a bufferref, it wasn't fully ready.	2025-03-27 17:22:11 +01:00
Lynne	6bad55eb17	ffv1: add a Vulkan-based decoder This patch adds a fully-featured level 3 and 4 decoder for FFv1, supporting Golomb and all Range coding variants, all pixel formats, and all features, except for the newly added floating-point formats. On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop recording), it is able to do 400fps. An Alder Lake with 24 threads can barely do 100fps.	2025-03-17 08:51:23 +01:00
Lynne	31176b16ac	vulkan_decode: use VK_KHR_video_maintenance2 if available	2025-03-17 08:49:12 +01:00
Lynne	e15e85b869	vulkan_decode: adjust number of async contexts created This caps the number of contexts we create based on thread count. This saves VRAM and filters out cases where more async is of lesser benefit.	2025-03-17 08:49:11 +01:00
Lynne	4495802bdb	vulkan_decode: support multiple image views Enables non-monochrome video decoding using all our existing functions in the context of an SDR decoder.	2025-03-17 08:49:11 +01:00
Lynne	491b65e343	vulkan_decode: support software-defined decoders	2025-03-17 08:49:11 +01:00
Lynne	551041e384	vulkan_decode: remove informative queries We queried the decoder whether it was able to decode successfully, but since we operated asynchronously, we weren't able to do anything with this information but let the user know decoding failed for the previous frame(s). Since we parse the slice headers ourselves and we're reasonably sure we can decode before actually starting to decode, this was rarely triggered on corrupt data, and hardware's understanding of whether there was an error or not is vague. There's also a semantic problem with our use of the queries - if there's a seek, we flush, but what happens to the queries is vague according to the spec. Most hardware dealt fine, since queries are nothing more than GPU memory with integers stored. But with Intel, they seem to be more of a register to which a driver must keep track of, leading to issues if there's been a reset (seek) and we query the previous submission before the seek. Just get rid of them. The query code is still used in encoding. This fixes seeking with HEVC and AV1 on Intel.	2025-01-03 14:53:41 +09:00
Lynne	8fbecfd1a0	vulkan_decode: add queue_flags field to specify queue used	2024-12-23 04:25:09 +09:00
Lynne	2e06b84e27	vulkan: do not reinvent a queue context struct We recently introduced a public field which was a superset of the queue context we used to have. Switch to using it entirely. This also allows us to get rid of the NIH function which was valid only for video queues.	2024-12-23 04:25:09 +09:00
Lynne	7239be07be	vulkan_decode: use a single execution pool Originally, the decoder had a single execution pool, with one execution context per thread. Execution pools were always intended to be thread-safe, as long as there were enough execution contexts in the pool to satisfy all threads. Due to synchronization issues, the threading part was removed at some point, and, for decoding, each thread had its own execution pool. Having a single execution pool per context is hacky, not to mention wasteful. Most importantly, we cannot associate single shaders across multiple execution pools for a single application. This means that we cannot use shaders to either apply film grain, or use this framework for software-defined decoders. The recent commits added threading capabilities back to the execution pool, and the number of contexts in each pool was increased. This was done with the assumption that the execution pool was singular, which it was not. This led to increased parallelism and number of frames in flight, which is taxing on memory. This commit finally restores proper threading behaviour. The validation layer has issues that are reported and addressed in the earlier commit.	2024-12-23 04:25:08 +09:00
Anton Khirnov	56ba57b672	lavc/refstruct: move to lavu and make public It is highly versatile and generally useful.	2024-12-15 14:03:47 +01:00
Lynne	41f65b7326	vulkan_decode: ensure there's at least one context per decode thread Otherwise, what may happen is that 2 threads will both write into the same context.	2024-11-28 01:29:21 +09:00
Lynne	a5e6860a89	vulkan_decode: fix counting for parallelism ff_vk_exec_pool_init used to multiply the number by the number of queues, but that got changed, yet this use of the function was not updated.	2024-11-28 01:29:15 +09:00
Lynne	37d5cb84e8	vulkan: check if current buffer has finished execution before picking another This saves resources, as dependencies are freed/reclaimed with a lower latency, and provides a speedup.	2024-10-04 10:10:42 +02:00
Lynne	5e9845f11e	vulkan(_decode): fix, simplify and improve queries The old query code never worked properly, and did some hideous heuristics to read the status bit, and work that into a return code. This is all best left to callers to do, which simplifies our code a lot. This also fixes minor validation errors regarding calling queries which are not in their active state.	2024-09-09 07:05:46 +02:00
Lynne	9c65325819	vulkan_decode: use ff_vk_init This solves the issue of an av_log function being called with a context with invalid class. Co-authored-by: Anton Khirnov <anton@khirnov.net>	2024-09-09 07:05:45 +02:00
Lynne	66e950fcac	vulkan_video: move imageview creation and DPB fields to common context Shared between decoders and encoders.	2024-09-09 07:05:44 +02:00
Lynne	18d964fc2c	vulkan: enable encoding of images if video_maintenance1 is enabled Vulkan encoding was designed in a very... consolidated way. You had to know the exact codec and profile that the image was going to eventually be encoded as at... image creation time. Unfortunately, as good as our code is, glimpsing into the exact future isn't what it's capable of. video_maintenance1 removed that requirement, which only then made encoding images practically possible.	2024-08-16 01:22:16 +02:00
Lynne	869f4aec48	vulkan_decode: use the correct queue family for decoding ops In `680d969a30`, the new API was used to find a queue family for dispatch, but the found queue family was not used for decoding, just for dispatching.	2024-08-16 01:22:08 +02:00
Lynne	680d969a30	vulkan_decode: port to the new queue family API	2024-08-11 05:13:16 +02:00
Lynne	1c05661ec4	vulkan_decode: add \n to error message	2024-08-11 05:13:15 +02:00
Lynne	ca591e6b50	vulkan_decode: force layered_dpb to 0 when dedicated_dpb is 0 layered_dpb only makes sense when dedicated_dpb is set to 1. For some mysterious reason, some Nvidia drivers stopped indicating SEPARATE_REFRENCES, but kept the COINCIDE flag, which broke the code.	2024-08-11 05:13:14 +02:00
Lynne	6757cdb535	vulkan_video: remove NIH pooled buffer implementation The code predates ff_vk_get_pooled_buffer().	2024-08-11 05:13:10 +02:00
Lynne	db09f1a5d8	vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS The first release of the CTS for AV1 decoding had incorrect offsets for the OrderHints values. The CTS will be fixed, and eventually, the drivers will be updated to the proper spec-conforming behaviour, but we still need to add a workaround as this will take months. Only NVIDIA use these values at all, so limit the workaround to only NVIDIA. Also, other vendors don't tend to provide accurate CTS information.	2024-04-15 02:40:02 +02:00

84 commits