This commit adds a ProRes RAW hardware implementation written in Vulkan.
Both version 0 and version 1 streams are supported.
The implementation is highly parallelized, with 512 invocations dispatched
for every tile, and typically 4k tiles on a 5.8k stream.
Thanks to unlord for the 8-point iDCT.
Benchmark for a generic 5.8k RAW HQ file:
6900XT: 63fps
7900XTX: 84fps
6000 Ada: 120fps
Intel: 9fps
In filtering, and SDR encoding, we use storage images.
This fixes using Vulkan filters on Intel.
Tested not to break anything on the three major vendors.
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.
On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
We queried the decoder whether it was able to decode successfully, but
since we operated asynchronously, we weren't able to do anything with
this information but let the user know decoding failed for the previous
frame(s).
Since we parse the slice headers ourselves and we're reasonably sure we
can decode before actually starting to decode, this was rarely triggered
on corrupt data, and hardware's understanding of whether there was an error
or not is vague.
There's also a semantic problem with our use of the queries - if there's
a seek, we flush, but what happens to the queries is vague according to
the spec. Most hardware dealt fine, since queries are nothing more than
GPU memory with integers stored. But with Intel, they seem to be more of
a register which the driver must keep track of, leading to issues if there's
been a reset (seek) and we query the previous submission before the seek.
Just get rid of them. The query code is still used in encoding.
This fixes seeking with HEVC and AV1 on Intel.
We recently introduced a public field which was a superset
of the queue context we used to have.
Switch to using it entirely.
This also allows us to get rid of the NIH function which was
valid only for video queues.
Originally, the decoder had a single execution pool, with one
execution context per thread. Execution pools were always intended
to be thread-safe, as long as there were enough execution contexts
in the pool to satisfy all threads.
Due to synchronization issues, the threading part was removed at some
point, and, for decoding, each thread had its own execution pool.
Having a single execution pool per context is hacky, not to mention
wasteful.
Most importantly, we *cannot* associate single shaders across multiple
execution pools for a single application. This means that we cannot
use shaders to either apply film grain, or use this framework for
software-defined decoders.
The recent commits added threading capabilities back to the execution
pool, and the number of contexts in each pool was increased. This was
done with the assumption that the execution pool was singular, which
it was not. This led to increased parallelism and number of frames
in flight, which is taxing on memory.
This commit finally restores proper threading behaviour.
The validation layer has issues that are reported and addressed in the
earlier commit.
The old query code never worked properly, and did some hideous
heuristics to read the status bit, and work that into a return
code.
This is all best left to callers to do, which simplifies
our code a lot.
This also fixes minor validation errors regarding calling queries
which are not in their active state.
Vulkan encoding was designed in a very... consolidated way.
You had to know the exact codec and profile that the image was going to
eventually be encoded as at... image creation time. Unfortunately, as good
as our code is, glimpsing into the exact future isn't what it's capable of.
video_maintenance1 removed that requirement, which only then made encoding
images practically possible.
layered_dpb only makes sense when dedicated_dpb is set to 1.
For some mysterious reason, some Nvidia drivers stopped indicating
SEPARATE_REFERENCES, but kept the COINCIDE flag, which broke
the code.
The first release of the CTS for AV1 decoding had incorrect
offsets for the OrderHints values.
The CTS will be fixed, and eventually, the drivers will be
updated to the proper spec-conforming behaviour, but we still
need to add a workaround as this will take months.
Only NVIDIA uses these values at all, so limit the workaround
to only NVIDIA. Also, other vendors don't tend to provide accurate
CTS information.
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These fields are set for all Vulkan decoding hwaccels;
they would be useless if it were different.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only three of the 226 (== AV_CODEC_ID_AV1) entries
have been used. Unsparsing this table is especially
important given that this array lives in .data.rel.ro.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
All the fields of FFVkCodecMap are either decoder-only
or encoder-only (with the latter being unused and unset for now).
Yet there is already a per-decoder struct containing
static information about these decoders, namely
VkExtensionProperties.
This commit merges the decoder-parts of FFVkCodecMap
with the VkExtensionProperties into a common structure.
Given that FFVkCodecMap is now unused, it is removed.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_vk_codec_map currently is an array indexed by AVCodecID;
it has AV_CODEC_ID_FIRST_AUDIO (= 65536) entries, but uses
only three of them; only 24B of 1MiB were actually used.
This commit fixes this by adding an AVCodecID field to the table
and making it non-sparse.
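
A minimal sketch of the idea, not the actual table: the codec ID becomes a
field of each entry and lookup is a short linear search. All names, ID values
and the payload field below are hypothetical and chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-ins: these IDs and the struct are illustrative only. */
enum ExampleCodecID {
    EXAMPLE_CODEC_A = 27,
    EXAMPLE_CODEC_B = 173,
    EXAMPLE_CODEC_C = 225,
};

typedef struct ExampleCodecMap {
    enum ExampleCodecID id; /* key stored in the entry, not used as an index */
    unsigned decode_op;     /* placeholder payload */
} ExampleCodecMap;

/* Dense table: one slot per supported codec, not one per possible ID. */
static const ExampleCodecMap example_codec_map[] = {
    { EXAMPLE_CODEC_A, 1 },
    { EXAMPLE_CODEC_B, 2 },
    { EXAMPLE_CODEC_C, 4 },
};

/* Linear search; returns NULL for codecs without an entry. */
static const ExampleCodecMap *example_get_codec_map(enum ExampleCodecID id)
{
    size_t n = sizeof(example_codec_map) / sizeof(example_codec_map[0]);
    for (size_t i = 0; i < n; i++)
        if (example_codec_map[i].id == id)
            return &example_codec_map[i];
    return NULL;
}
```

With only a handful of entries, the linear search costs next to nothing
compared to the memory saved by dropping the empty slots.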
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fix warnings (soon to be errors in GCC 14, already so in Clang 15):
```
src/libavcodec/vulkan_av1.c: In function ‘vk_av1_create_params’:
src/libavcodec/vulkan_av1.c:183:43: error: initialization of ‘long long unsigned int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
183 | .videoSessionParametersTemplate = NULL,
| ^~~~
src/libavcodec/vulkan_av1.c:183:43: note: (near initialization for ‘(anonymous).videoSessionParametersTemplate’)
```
Use Vulkan's VK_NULL_HANDLE instead of bare NULL.
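
The diagnostic above shows the handle being an integer type
(`long long unsigned int`) on the affected target, so NULL, a pointer
constant, does not fit. A small sketch of the same situation follows; it
uses stand-in types rather than the real Vulkan headers, and every name in
it is hypothetical.

```c
#include <stdint.h>

/* Stand-ins for illustration; the real definitions live in the Vulkan headers. */
typedef uint64_t ExampleHandle;   /* integer handle, as in the error above */
#define EXAMPLE_NULL_HANDLE 0ULL  /* plays the role of VK_NULL_HANDLE */

typedef struct ExampleCreateInfo {
    ExampleHandle params_template;
} ExampleCreateInfo;

static ExampleCreateInfo example_make_info(void)
{
    /* Writing `.params_template = NULL` here would convert a pointer to an
     * integer, which is what -Wint-conversion complains about. */
    ExampleCreateInfo info = {
        .params_template = EXAMPLE_NULL_HANDLE,
    };
    return info;
}
```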
Fix Trac ticket #10724.
Was reported downstream in Gentoo at https://bugs.gentoo.org/919067.
Signed-off-by: Sam James <sam@gentoo.org>
"Validation Error: [ VUID-VkImageViewCreateInfo-imageViewType-04974 ] Object 0: handle = 0x9f9b41000000003c, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0xc120e150 | vkCreateImageView():
Using pCreateInfo->viewType VK_IMAGE_VIEW_TYPE_2D and the subresourceRange.layerCount VK_REMAINING_ARRAY_LAYERS=(17) and must 1 (try looking into VK_IMAGE_VIEW_TYPE_*_ARRAY).
The Vulkan spec states: If viewType is VK_IMAGE_VIEW_TYPE_1D, VK_IMAGE_VIEW_TYPE_2D, or VK_IMAGE_VIEW_TYPE_3D; and subresourceRange.layerCount is VK_REMAINING_ARRAY_LAYERS,
then the remaining number of layers must be 1"
Partially fixes https://streams.videolan.org/issues/19938/20000_20180305-15.04.59.ts
The stream is coded as 1920x1080, meant to be rendered at 1440x1080 with cropping,
or 1680x1080 before cropping. Currently, the created DPB is 1440x1080, which results
in the image being decoded incorrectly, as the decoder overwrites output memory.
This commit fixes this.
The issue is that we cannot rely on any context existing when we free
frames. The Vulkan functions are loaded in each context separately,
so until now, we've just been loading them on every frame's destruction.
Rather than do this, just save the function pointers we need in each
frame. The function pointers are guaranteed to exist and to not change.
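
Roughly, the pattern looks like the sketch below. It is an illustration
under assumed names, not the actual frame structure: `ExampleFrame`,
`example_destroy_fn` and the counting stub are all hypothetical, with the
stub standing in for a loaded Vulkan entry point.

```c
#include <stdlib.h>

/* Hypothetical signature of a destroy entry point. */
typedef void (*example_destroy_fn)(void *device, void *image);

typedef struct ExampleFrame {
    void *device;
    void *image;
    example_destroy_fn destroy_image; /* saved when the frame is created */
} ExampleFrame;

/* Counting stub standing in for a real, loaded function. */
static int example_destroy_calls;
static void example_destroy_stub(void *device, void *image)
{
    (void)device;
    (void)image;
    example_destroy_calls++;
}

static ExampleFrame *example_frame_alloc(void *device, void *image,
                                         example_destroy_fn destroy_image)
{
    ExampleFrame *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    f->device        = device;
    f->image         = image;
    f->destroy_image = destroy_image;
    return f;
}

/* Needs no context: everything required for teardown is in the frame. */
static void example_frame_free(ExampleFrame **fp)
{
    ExampleFrame *f = *fp;
    if (!f)
        return;
    if (f->image)
        f->destroy_image(f->device, f->image);
    free(f);
    *fp = NULL;
}
```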
All Vulkan HWAccels share the same boilerplate code for creating
session params and this includes a common bug: In case actually
creating the video session parameters fails, the buffer destined
to hold them leaks; in case of HEVC this is also true if
get_data_set_buf() fails.
This commit factors this code out and fixes the leak.
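
The shape of the fix can be sketched as follows. This is a simplified
illustration with hypothetical helpers, not the factored-out function
itself: a live-buffer counter stands in for leak detection, and
`example_create_params()` stands in for the call that may fail.

```c
#include <stdlib.h>

static int example_live_buffers; /* counts buffers not yet freed */

static void *example_buf_alloc(void)
{
    void *buf = malloc(16);
    if (buf)
        example_live_buffers++;
    return buf;
}

static void example_buf_free(void *buf)
{
    if (buf) {
        example_live_buffers--;
        free(buf);
    }
}

/* Placeholder for the session parameter creation that may fail. */
static int example_create_params(void *buf, int simulate_failure)
{
    (void)buf;
    return simulate_failure ? -1 : 0;
}

/* Shared helper: on failure the buffer is released before returning. */
static int example_create_params_buf(void **out, int simulate_failure)
{
    void *buf = example_buf_alloc();
    int ret;

    if (!buf)
        return -1;

    ret = example_create_params(buf, simulate_failure);
    if (ret < 0) {
        example_buf_free(buf); /* the step the duplicated code lacked */
        return ret;
    }

    *out = buf;
    return 0;
}
```

Keeping the cleanup in one helper means each hwaccel no longer has to
repeat it, which is how the same leak ended up in all of them.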
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These defines are also used in other contexts than just AVCodecContext
ones, e.g. in libavformat. Furthermore, given that these defines are
public, the AV-prefix is the right one, so deprecate (and not just move)
the FF-macros.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>