Commit graph

24 commits

Author SHA1 Message Date
Lynne
56dea1a9e8
vulkan_ffv1: initialize only the necessary shaders on init
The decoder will reinit the hwaccel upon pixfmt/dimension changes,
so we can remove the f->use32bit and is_rgb variants of all shaders.

This speeds up init time.
2025-11-26 15:16:40 +01:00
Lynne
be9998674a
vulkan_ffv1/prores: remove unnecessary slice buffer unref
The slice buffer is already unref'd by ff_vk_decode_free_frame().
2025-11-26 15:16:40 +01:00
Lynne
615b26f1b1
vulkan_ffv1: fix swapped colors for x2bgr10
2025-11-26 15:16:40 +01:00
Lynne
9860017495
vulkan/ffv1: use u32vec2 for slice offsets
Simplifies calculations slightly.
2025-11-12 00:37:24 +01:00
Lynne
38df9ba71b
Revert "hwcontext_vulkan: fix planar 10 and 12-bit RGB formats using the new MSB formats"
This reverts commit 98ee3f6718.
2025-11-06 21:44:13 +01:00
Lynne
15e82dc452
Revert "hwcontext_vulkan: remove unsupported/broken pixel formats"
This reverts commit 5b388f2838.
2025-11-06 21:44:13 +01:00
Lynne
5b388f2838
hwcontext_vulkan: remove unsupported/broken pixel formats
We have no use for 14-bit pixel formats for now, so remove support for gray14,
which was broken due to the LSB padding issue.

Similarly, YUVA at 10/12 bits was broken for the same reason.
2025-10-27 22:59:41 -03:00
Lynne
98ee3f6718
hwcontext_vulkan: fix planar 10 and 12-bit RGB formats using the new MSB formats
2025-10-27 22:59:41 -03:00
averne
9e93163268
vulkan/ffv1dec: fix FFVkSPIRVCompiler leak
2025-06-11 13:30:07 +09:00
averne
f604d1093f
vulkan/ffv1dec: fix leak in FFVulkanDecodeShared
2025-06-11 13:30:07 +09:00
Lynne
7b45d9c5fd
vulkan_ffv1: pipe through slice decoding status
2025-05-20 19:53:02 +09:00
Lynne
bd41838b60
ffv1enc_vulkan: switch to 2-line cache, unify prediction code
2025-05-20 19:53:01 +09:00
Lynne
29b85cd4b8
vulkan_ffv1: add cached symbol reader for AMD
Speeds up everything on AMD by 3x.
This uses 32 local invocations to load state into cache, as well
as to do the RCT faster.
2025-04-14 06:10:43 +02:00
Lynne
4d561e6a1e
vulkan_ffv1: remove need for scratch data during setup
This saves on some VRAM, but mainly allows for a more unified path.
2025-04-14 06:10:43 +02:00
Lynne
8ceabb677c
vulkan_ffv1: externalize extended lookup check
8% speedup on NVIDIA at 4K.
2025-04-14 06:10:43 +02:00
Lynne
77f777d925
ffv1/vulkan: redo context count tracking and quant_table_idx management
This commit also makes it possible for the encoder to choose a different
quantization table on a per-slice basis, as well as adding this capability
to the decoder.

Also, this commit fully fixes decoding of context=1 encoded files.
2025-04-14 06:10:42 +02:00
Lynne
66b8c92df2
vulkan_ffv1: cache only 2 lines when decoding RGB
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.

This is equivalent to what the software decoder does, but with fewer pointers.
2025-04-14 06:10:42 +02:00
Lynne
694ebe890c
vulkan_ffv1: improve buffer barrier correctness for slice state
This is likely a nano-optimization, but it's more correct.
2025-04-14 06:10:42 +02:00
Lynne
6111aef533
vulkan_ffv1: fix reset shader dependencies
Without a barrier upfront, the reset shader may read data fields not
yet set by the setup shader.
2025-04-14 06:10:42 +02:00
Lynne
b72ada0a96
vulkan_ffv1: fallback to upload if mapping packet fails, fix fallback
The commit which added support for host mapping accidentally broke the
original upload route.
Fix it for drivers without host mapping (very few).
2025-04-14 06:10:42 +02:00
Lynne
1f09b55c94
vulkan_ffv1: allocate just as much memory for slice state as needed
Rather than always using the maximum allowed slices, just use the number
of slices present in this frame.
2025-04-14 06:10:41 +02:00
Lynne
d7772da728
vulkan_ffv1: remove unused define
Leftover debug macro.
2025-04-14 06:10:41 +02:00
Lynne
d077e00f3e
vulkan_ffv1: enable acceleration on Intel
Fixed by previous commit.
2025-04-14 06:10:41 +02:00
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00