Commit graph

17 commits

Author SHA1 Message Date
Lynne
f2a55af9a4
vulkan_dpx: switch to compile-time SPIR-V generation 2026-01-12 17:28:43 +01:00
Lynne
e27b510da8
vulkan_prores: generate SPIR-V at compile-time 2026-01-12 17:28:42 +01:00
Lynne
026e94e339
vulkan_prores_raw: use compile-time SPIR-V generation 2026-01-12 17:28:42 +01:00
Lynne
f2affdfafb
configure/make: support compile-time SPIR-V generation 2026-01-12 17:28:40 +01:00
Lynne
6eced88188
vulkan: merge ProRes and ProRes RAW iDCTs
This cleans up the code a bit, and reduces binary size.
2025-12-22 19:46:26 +01:00
Lynne
9e8e34d475
vulkan_ffv1: remove unused RCT shader files
The 2 files were made redundant when the RCT was merged into encode/decode.
2025-12-13 22:12:26 +01:00
averne
c384b1e803
vulkan/prores: use vkCmdClearColorImage
The VK spec forbids using clear commands on YUV images,
so we need to allocate separate per-plane images.
This removes the need for a separate reset shader.
2025-12-07 18:17:36 +00:00
Lynne
531ce713a0
dpxdec: add a Vulkan hwaccel 2025-11-26 15:16:43 +01:00
Lynne
bb30a0d0d8
vulkan_prores_raw: split up decoding and DCT
This commit optimizes the Vulkan decoder by splitting up decoding
from iDCT, and merging the few tables needed directly into the shader.

The speedup on Intel is 10x.
2025-11-26 15:16:41 +01:00
averne
98412edfed
lavc: add a ProRes Vulkan hwaccel
Add a shader-based Apple ProRes decoder.
It supports all codec features for profiles up to
the 4444 XQ profile, i.e.:
- 4:2:2 and 4:4:4 chroma subsampling
- 10- and 12-bit component depth
- Interlacing
- Alpha

The implementation consists of two shaders: the
VLD kernel does entropy decoding for color/alpha,
and the IDCT kernel performs the inverse transform
on color components.

Benchmarks for a 4k yuv422p10 sample:
- AMD Radeon 6700XT:   178 fps
- Intel i7 Tiger Lake: 37 fps
- NVidia Orin Nano:    70 fps
2025-10-25 19:54:13 +00:00
Lynne
75aeffb1c6
lavc: add a ProRes RAW Vulkan hwaccel
This commit adds a ProRes RAW hardware implementation written in Vulkan.
Both version 0 and version 1 streams are supported.
The implementation is highly parallelized, with 512 invocations dispatched
per tile, and generally 4k tiles on a 5.8k stream.

Thanks to unlord for the 8-point iDCT.

Benchmark for a generic 5.8k RAW HQ file:
6900XT: 63fps
7900XTX: 84fps
6000 Ada: 120fps
Intel: 9fps
2025-08-08 18:29:41 +09:00
Lynne
7576410af7
ffv1enc_vulkan: implement RCT search for level >= 4 2025-05-20 19:53:01 +09:00
Lynne
ebbc7ff650
ffv1enc_vulkan: merge all encoder variants into one file
Makes it easier to work with, despite the heavy ifdeffery.
2025-05-20 19:52:55 +09:00
Lynne
66b8c92df2
vulkan_ffv1: cache only 2 lines when decoding RGB
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.

This is equivalent to what the software decoder does, but with fewer pointers.
2025-04-14 06:10:42 +02:00
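The two-line caching described in commit 66b8c92df2 above can be illustrated with a small CPU-side sketch. This is not the shader code from the commit: the function names (`median3`, `reconstruct_plane`) are hypothetical, the residuals are assumed to be already entropy-decoded, and neighbours outside the picture are simply treated as 0, which is a simplification rather than FFv1's actual border rule. It only shows why a median-predicted plane can be reconstructed with two cached lines instead of a full intermediate frame.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Median of three integers, as used by median prediction. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) b = c;                        /* b = min(b, c) */
    return a > b ? a : b;                    /* max(a, min(b, c)) */
}

/*
 * Hypothetical helper: rebuild a w*h plane from prediction residuals while
 * keeping only two lines (previous and current) as working storage.
 * ASSUMPTION: out-of-picture neighbours are 0 (not FFv1's real border rule).
 * Returns 0 on success, -1 on invalid size or allocation failure.
 */
static int reconstruct_plane(const int *residual, int *out, int w, int h)
{
    assert(residual && out);
    if (w <= 0 || h <= 0)
        return -1;

    int *lines = calloc(2 * (size_t)w, sizeof(*lines));
    if (!lines)
        return -1;

    int *prev = lines;      /* line y - 1, all zeros before the first row */
    int *cur  = lines + w;  /* line y */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int L  = x ? cur[x - 1]  : 0;  /* left neighbour */
            int T  = prev[x];              /* top neighbour */
            int TL = x ? prev[x - 1] : 0;  /* top-left neighbour */
            cur[x] = residual[(size_t)y * w + x] + median3(L, T, L + T - TL);
        }
        /* The finished line leaves the cache; only two lines stay resident. */
        memcpy(out + (size_t)y * w, cur, (size_t)w * sizeof(*out));

        int *tmp = prev;    /* swap roles: current becomes previous */
        prev = cur;
        cur = tmp;
    }

    free(lines);
    return 0;
}
```

The working set here is 2*w samples regardless of the frame height, which is the property the commit message refers to; the real decoder additionally handles context modeling and the colour transform, which this sketch leaves out.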
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00
Lynne
ed2391d341
ffv1enc: add a Vulkan encoder
This commit implements a standard, compliant, version 3 and version 4
FFv1 encoder, entirely in Vulkan. The encoder is written in standard
GLSL and requires a GPU supporting Vulkan 1.3 with the BDA extension.

The encoder can use any number of slices, but nominally, should use
32x32 slices (1024 in total) to maximize parallelism.

All features are supported, as well as all pixel formats.
This includes:
 - Rice
 - Range coding with a custom quantization table
 - PCM encoding

CRC calculation is also massively parallelized on the GPU.

Encoding of unaligned dimensions on subsampled data requires
version 4, or requires oversizing the image to 64-pixel alignment
and cropping out the padding via container flags.

Performance-wise, this makes 1080p real-time screen capture possible
at 60fps even on modest GPUs.
2024-11-18 07:54:22 +01:00
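The oversizing workaround mentioned in commit ed2391d341 above (pad the image to 64-pixel alignment, then crop the padding via container flags) comes down to simple arithmetic, sketched below. The names `align_up`, `PaddedSize`, and `pad_to_64` are hypothetical and not taken from the encoder; the sketch assumes both dimensions are padded on the right and bottom edges, and it does not show how the crop values are written to any particular container.

```c
#include <assert.h>

/* Hypothetical helper: round v up to the next multiple of a.
 * a must be a non-zero power of two. */
static unsigned align_up(unsigned v, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Coded (padded) size plus the amount a container would need to crop. */
typedef struct PaddedSize {
    unsigned coded_w, coded_h;        /* dimensions handed to the encoder */
    unsigned crop_right, crop_bottom; /* padding to hide at display time */
} PaddedSize;

/* ASSUMPTION: padding is added on the right and bottom edges only. */
static PaddedSize pad_to_64(unsigned w, unsigned h)
{
    PaddedSize p;
    p.coded_w     = align_up(w, 64);
    p.coded_h     = align_up(h, 64);
    p.crop_right  = p.coded_w - w;
    p.crop_bottom = p.coded_h - h;
    return p;
}
```

For example, a 1918x1078 capture would be encoded as 1920x1088, with 2 columns and 10 rows cropped on playback.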
Lynne
4e861ad8e0
libavcodec/Makefile: add a makefile for Vulkan shaders 2024-10-15 17:45:19 +02:00