Commit graph

66 commits

Author SHA1 Message Date
Lynne
eb9e000584 vulkan_decode: add ifdefs around VP9 definitions and privatize profile struct
The struct is not referenced anywhere else.
2025-08-08 15:07:33 +00:00
Benjamin Cheng
f7a5128109 vulkan_av1: Fix frame threading
Basically do the same thing that was done for VP9, and remove the
vestigial frame_id_alloc_mask in the context.
2025-08-08 14:45:58 +00:00
Lynne
75aeffb1c6
lavc: add a ProRes RAW Vulkan hwaccel
This commit adds a ProRes RAW hardware implementation written in Vulkan.
Both version 0 and version 1 streams are supported.
The implementation is highly parallelized, with 512 invocations dispatched
per every tile, with generally 4k tiles on a 5.8k stream.

Thanks to unlord for the 8-point iDCT.

Benchmark for a generic 5.8k RAW HQ file:
6900XT: 63fps
7900XTX: 84fps
6000 Ada: 120fps
Intel: 9fps
2025-08-08 18:29:41 +09:00
Lynne
2caf23e7c4
vp9: add Vulkan VP9 hwaccel 2025-08-08 18:29:40 +09:00
Timo Rothenpieler
262d41c804 all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
Lynne
7b45d9c5fd
vulkan_ffv1: pipe through slice decoding status 2025-05-20 19:53:02 +09:00
Lynne
ec3f3457fd
vulkan_decode: add STORAGE flag to output images
In filtering, and SDR encoding, we use storage images.
This fixes using Vulkan filters on Intel.
Tested not to break anything on the three major vendors.
2025-04-19 10:59:16 +02:00
Lynne
193610d9ba
vulkan_decode: allow using NULL offsets/nb_slices in ff_vk_decode_add_slice()
For codecs like VP9 which use a single slice.
2025-03-27 17:22:11 +01:00
Lynne
5fc4acae9c
vulkan_decode: allow using NULL sequence_params when decoding
The function had some checks to allow for this, but as it always tried
to dereference a bufferref, it wasn't fully ready.
2025-03-27 17:22:11 +01:00
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00
Lynne
31176b16ac
vulkan_decode: use VK_KHR_video_maintenance2 if available 2025-03-17 08:49:12 +01:00
Lynne
e15e85b869
vulkan_decode: adjust number of async contexts created
This caps the number of contexts we create based on thread count.
This saves VRAM and filters out cases where more async is of lesser
benefit.
2025-03-17 08:49:11 +01:00
Lynne
4495802bdb
vulkan_decode: support multiple image views
Enables non-monochrome video decoding using all our existing functions
in the context of an SDR decoder.
2025-03-17 08:49:11 +01:00
Lynne
491b65e343
vulkan_decode: support software-defined decoders 2025-03-17 08:49:11 +01:00
Lynne
551041e384
vulkan_decode: remove informative queries
We queried the decoder whether it was able to decode sucessfully, but
since we operated asynchronously, we weren't able to do anything with
this information but let the user know decoding failed for the previous
frame(s).

Since we parse the slice headers ourselves and we're reasonably sure we
can decode before actually starting to decode, this was rarely triggered
on corrupt data, and hardware's understanding of whether there was an error
or not is vague.

There's also a semantic problem with our use of the queries - if there's
a seek, we flush, but what happens to the queries is vague according to
the spec. Most hardware dealt fine, since queries are nothing more than
GPU memory with integers stored. But with Intel, they seem to be more of
a register to which a driver must keep track of, leading to issues if there's
been a reset (seek) and we query the previous submission before the seek.

Just get rid of them. The query code is still used in encoding.

This fixes seeking with HEVC and AV1 on Intel.
2025-01-03 14:53:41 +09:00
Lynne
8fbecfd1a0
vulkan_decode: add queue_flags field to specify queue used 2024-12-23 04:25:09 +09:00
Lynne
2e06b84e27
vulkan: do not reinvent a queue context struct
We recently introduced a public field which was a superset
of the queue context we used to have.

Switch to using it entirely.

This also allows us to get rid of the NIH function which was
valid only for video queues.
2024-12-23 04:25:09 +09:00
Lynne
7239be07be
vulkan_decode: use a single execution pool
Originally, the decoder had a single execution pool, with one
execution context per thread. Execution pools were always intended
to be thread-safe, as long as there were enough execution contexts
in the pool to satisfy all threads.

Due to synchronization issues, the threading part was removed at some
point, and, for decoding, each thread had its own execution pool.
Having a single execution pool per context is hacky, not to mention
wasteful.
Most importantly, we *cannot* associate single shaders across multiple
execution pools for a single application. This means that we cannot
use shaders to either apply film grain, or use this framework for
software-defined decoders.

The recent commits added threading capabilities back to the execution
pool, and the number of contexts in each pool was increased. This was
done with the assumption that the execution pool was singular, which
it was not. This led to increased parallelism and number of frames
in flight, which is taxing on memory.

This commit finally restores proper threading behaviour.
The validation layer has isses that are reported and addressed in the
earlier commit.
2024-12-23 04:25:08 +09:00
Anton Khirnov
56ba57b672 lavc/refstruct: move to lavu and make public
It is highly versatile and generally useful.
2024-12-15 14:03:47 +01:00
Lynne
41f65b7326
vulkan_decode: ensure there's at least one context per decode thread
Otherwise, what may happen is that 2 threads will both write
into the same context.
2024-11-28 01:29:21 +09:00
Lynne
a5e6860a89
vulkan_decode: fix counting for parallelism
ff_vk_exec_pool_init used to multiply the number by
the number of queues, but that got changed, yet this use
of the function was not updated.
2024-11-28 01:29:15 +09:00
Lynne
37d5cb84e8
vulkan: check if current buffer has finished execution before picking another
This saves resources, as dependencies are freed/reclaimed with a lower latency,
and provies a speedup.
2024-10-04 10:10:42 +02:00
Lynne
5e9845f11e
vulkan(_decode): fix, simplify and improve queries
The old query code never worked properly, and did some hideous
heuristics to read the status bit, and work that into a return
code.
This is all best left to callers to do, which simplifies
our code a lot.

This also fixes minor validation errors regarding calling queries
which are not in their active state.
2024-09-09 07:05:46 +02:00
Lynne
9c65325819
vulkan_decode: use ff_vk_init
This solves the issue of an av_log function being called
with a context with invalid class.

Co-authored-by: Anton Khirnov <anton@khirnov.net>
2024-09-09 07:05:45 +02:00
Lynne
66e950fcac
vulkan_video: move imageview creation and DPB fields to common context
Shared between decoders and encoders.
2024-09-09 07:05:44 +02:00
Lynne
18d964fc2c
vulkan: enable encoding of images if video_maintenance1 is enabled
Vulkan encoding was designed in a very... consolidated way.
You had to know the exact codec and profile that the image was going to
eventually be encoded as at... image creation time. Unfortunately, as good
as our code is, glimpsing into the exact future isn't what its capable of.

video_maintenance1 removed that requirement, which only then made encoding
images practically possible.
2024-08-16 01:22:16 +02:00
Lynne
869f4aec48
vulkan_decode: use the correct queue family for decoding ops
In 680d969a30, the new API was
used to find a queue family for dispatch, but the found queue
family was not used for decoding, just for dispatching.
2024-08-16 01:22:08 +02:00
Lynne
680d969a30
vulkan_decode: port to the new queue family API 2024-08-11 05:13:16 +02:00
Lynne
1c05661ec4
vulkan_decode: add \n to error message 2024-08-11 05:13:15 +02:00
Lynne
ca591e6b50
vulkan_decode: force layered_dpb to 0 when dedicated_dpb is 0
layered_dpb only makes sense when dedicated_dpb is set to 1.
For some mysterious reason, some Nvidia drivers stopped indicating
SEPARATE_REFRENCES, but kept the COINCIDE flag, which broke
the code.
2024-08-11 05:13:14 +02:00
Lynne
6757cdb535
vulkan_video: remove NIH pooled buffer implementation
The code predates ff_vk_get_pooled_buffer().
2024-08-11 05:13:10 +02:00
Lynne
db09f1a5d8
vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS
The first release of the CTS for AV1 decoding had incorrect
offsets for the OrderHints values.
The CTS will be fixed, and eventually, the drivers will be
updated to the proper spec-conforming behaviour, but we still
need to add a workaround as this will take months.

Only NVIDIA use these values at all, so limit the workaround
to only NVIDIA. Also, other vendors don't tend to provide accurate
CTS information.
2024-04-15 02:40:02 +02:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Lynne
ecdc94b97f
vulkan_av1: port to the new stable API
Co-Authored-by: Dave Airlie <airlied@redhat.com>
2024-03-25 08:54:40 +01:00
Andreas Rheinhardt
ccb432c1fe avcodec/vulkan_decode: Remove always-false check
These fields are set for all Vulkan decoding hwaccels;
they would be useless if it were different.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-07 09:00:47 +01:00
Andreas Rheinhardt
f9d35e78fe avcodec/vulkan_decode: Un-sparse extensions table
Only three of the 226 (== AV_CODEC_ID_AV1) entries
have been used. Unsparsing this table is especially
important given that this array lives in .data.rel.ro.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-07 09:00:39 +01:00
Andreas Rheinhardt
f7b227bec3 avcodec/vulkan_video: Merge dec part of FFVkCodecMap and extension props
All the fields of FFVkCodecMap are either decoder-only
or encoder-only (with the latter being unused and unset for now).
Yet there is already a per-decoder struct containing
static information about these decoders, namely
VkExtensionProperties.

This commit merges the decoder-parts of FFVkCodecMap
with the VkExtensionProperties into a common structure.

Given that FFVkCodecMap is now unused, it is removed.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-07 09:00:30 +01:00
Andreas Rheinhardt
e429b0fdb7 avutil/vulkan: Don't autoinclude vulkan_loader.h
Only include it where necessary.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-03 22:55:26 +01:00
Andreas Rheinhardt
cb15b7b29e avcodec/vulkan_video: Don't use sparse table
ff_vk_codec_map currently is an array indexed by AVCodecID;
it has AV_CODEC_ID_FIRST_AUDIO (= 65536) entries, but uses
only three of them; only 24B of 1MiB were actually used

This commit fixes this by adding an AVCodecID field to the table
and making it non-sparse.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-03 17:17:13 +01:00
Sam James
2f24f10d9c libavcodec: fix -Wint-conversion in vulkan
FIx warnings (soon to be errors in GCC 14, already so in Clang 15):
```
src/libavcodec/vulkan_av1.c: In function ‘vk_av1_create_params’:
src/libavcodec/vulkan_av1.c:183:43: error: initialization of ‘long long unsigned int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
  183 |         .videoSessionParametersTemplate = NULL,
      |                                           ^~~~
src/libavcodec/vulkan_av1.c:183:43: note: (near initialization for ‘(anonymous).videoSessionParametersTemplate’)
```

Use Vulkan's VK_NULL_HANDLE instead of bare NULL.

Fix Trac ticket #10724.

Was reported downstream in Gentoo at https://bugs.gentoo.org/919067.

Signed-off-by: Sam James <sam@gentoo.org>
2024-01-06 22:38:55 +01:00
Lynne
70864e6adb
vulkan_decode: correct flipped condition in image layout
Changed by the previous commit.
Caused validation issues on hardware with !reuse_dpb_dst but not layered_dpb.
2023-10-25 22:01:21 +02:00
Lynne
0b3616231d
vulkan_decode: fix another validation issue
Surprising no one, the insane usage rule has a catch.
2023-10-25 20:51:55 +02:00
Lynne
467e411839
vulkan_decode: fix pedantic validation issue
"Validation Error: [ VUID-VkImageViewCreateInfo-imageViewType-04974 ] Object 0: handle = 0x9f9b41000000003c, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0xc120e150 | vkCreateImageView():
Using pCreateInfo->viewType VK_IMAGE_VIEW_TYPE_2D and the subresourceRange.layerCount VK_REMAINING_ARRAY_LAYERS=(17) and must 1 (try looking into VK_IMAGE_VIEW_TYPE_*_ARRAY).
The Vulkan spec states: If viewType is VK_IMAGE_VIEW_TYPE_1D, VK_IMAGE_VIEW_TYPE_2D, or VK_IMAGE_VIEW_TYPE_3D; and subresourceRange.layerCount is VK_REMAINING_ARRAY_LAYERS,
then the remaining number of layers must be 1"
2023-10-25 20:51:54 +02:00
Lynne
9ee4f47c94
vulkan_decode: use coded_width/height instead of the non-coded width and height
Partially fixes https://streams.videolan.org/issues/19938/20000_20180305-15.04.59.ts
The is coded as 1920x1080, meant to be rendered at 1440x1080 with cropping,
or 1680x1080 before cropping. Currently, the created DPB is 1440x1080, which results
in the image being decoded incorrectly, as the decoder overwrites output memory.
This commit fixes this.
2023-10-25 20:51:05 +02:00
Andreas Rheinhardt
6695c0af0e avcodec/vulkan_decode: Use RefStruct API for shared_ref
Avoids allocations, error checks and indirections.
Also increases type-safety.

Reviewed-by: Lynne <dev@lynne.ee>
Tested-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-07 22:35:50 +02:00
Lynne
9310ffc809
vulkan_decode: don't call get_proc_addr on every frame's destruction
The issue is that we cannot rely on any context existing when we free
frames. The Vulkan functions are loaded in each context separately,
so until now, we've just been loading them on every frame's destruction.

Rather than do this, just save the function pointers we need in each
frame. The function pointers are guaranteed to not change and exist.
2023-09-15 17:35:22 +02:00
Lynne
552a5fa496
vulkan_hevc: switch from a buffer pool to a malloc and simplify
Simpler and more robust now that contexts are not shared between threads.
2023-09-15 17:35:19 +02:00
Andreas Rheinhardt
c1b6235d41 avcodec/vulkan_decode: Factor creating session params out, fix leak
All Vulkan HWAccels share the same boilerplate code for creating
session params and this includes a common bug: In case actually
creating the video session parameters fails, the buffer destined
to hold them leaks; in case of HEVC this is also true if
get_data_set_buf() fails.

This commit factors this code out and fixes the leak.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-15 02:38:22 +02:00
Lynne
398467f519
vulkan_decode: convert max level from vulkan to av for comparisons 2023-09-08 06:56:43 +02:00
Andreas Rheinhardt
8238bc0b5e avcodec/defs: Add AV_PROFILE_* defines, deprecate FF_PROFILE_* defines
These defines are also used in other contexts than just AVCodecContext
ones, e.g. in libavformat. Furthermore, given that these defines are
public, the AV-prefix is the right one, so deprecate (and not just move)
the FF-macros.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-07 00:39:02 +02:00