Commit graph

17 commits

Author SHA1 Message Date
Lynne
fc960dafef
vulkan_ffv1: optimize symbol reader
This was the fastest variant tested.
2025-04-14 06:10:41 +02:00
Lynne
defebd74c0
vulkan_ffv1: slightly optimize the range decoder
GPUs have cmovs as standard.
2025-04-14 06:10:41 +02:00
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00
Lynne
f2a0bdd6b1
vulkan: unify handling of BGR and simplify ffv1_rct
2025-03-17 08:49:15 +01:00
Lynne
b2ebe9884e
ffv1enc_vulkan: refactor code to support sharing with decoder
The shaders were written to support sharing, but needed slight
tweaking.
2025-03-17 08:49:14 +01:00
Lynne
89704f07bb
lavc/vulkan: add a u8vec2buf buffer type
Useful, since it doesn't have alignment limitations.
2025-02-21 03:19:20 +01:00
IndecisiveTurtle
351fd8460a
vulkan/common: Add put_bytes_count
2024-11-28 10:03:01 +09:00
IndecisiveTurtle
e3ac63b213
vulkan/common: Use u32vec2 buffer type instead of u64
According to the GL_EXT_buffer_reference spec, the alignment
"must be a power of two and be greater than or equal to the largest scalar/component type in the block."

This means that by using u32vec2 we can drop the alignment requirement from 8 bytes to 4 bytes
and save a pack64 call in reverse8 (though I assume that on most ISAs it compiles to nothing).

This allows the vc2 Vulkan encoder to function without setting PB_UNALIGNED.
2024-11-28 09:31:43 +09:00
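The reverse8 change described in the commit above (working on two 32-bit halves instead of one packed 64-bit value) can be illustrated on the CPU. This is a minimal Python sketch, not the shader code: the names `swap32_sketch` and `reverse8_sketch` are hypothetical, and the (low, high) ordering of the halves is an assumption made for the example.

```python
def swap32_sketch(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )


def reverse8_sketch(low: int, high: int) -> tuple[int, int]:
    """Byte-reverse a 64-bit value held as two 32-bit halves.

    Assumption: the value is (high << 32) | low. Reversing all eight bytes
    is the same as byte-swapping each half and exchanging the halves, so no
    64-bit packing step is needed.
    """
    return swap32_sketch(high), swap32_sketch(low)
```

For example, the halves of 0x1122334455667788 come back as the halves of 0x8877665544332211.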
IndecisiveTurtle
f794ed48c0
vulkan/common: Fix off-by-one error in flush_put_bits
If the caller wrote a number of bits divisible by eight, it would write an extra byte.
Also increment by to_write instead of BUF_BYTES, which overly pads the bitstream.
2024-11-28 09:31:43 +09:00
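The kind of off-by-one described in the commit above typically shows up when a bit count is rounded up to whole bytes. The following Python sketch shows one way such a bug can arise and the usual fix; it is an illustration under assumptions, not the actual flush_put_bits code, and both function names are hypothetical.

```python
def bytes_to_flush_buggy(bit_count: int) -> int:
    """Hypothetical buggy rounding: always adds a byte, even when
    bit_count is already a multiple of eight."""
    return bit_count // 8 + 1


def bytes_to_flush(bit_count: int) -> int:
    """Ceiling division: the number of whole bytes needed to hold bit_count bits."""
    return (bit_count + 7) // 8
```

With 16 pending bits the buggy variant reports 3 bytes while the ceiling division reports 2; for 17 bits both agree on 3.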
Lynne
f65e51293a
hwcontext_vulkan: add support for AV_PIX_FMT_GBRAP10/12/14
2024-11-26 14:14:13 +01:00
Lynne
7c52dda55f
hwcontext_vulkan: add support for AV_PIX_FMT_GBRP12/14/16
2024-11-26 14:14:12 +01:00
Lynne
4d3e96c90c
lavc/vulkan/common: fix reverse4's incorrect swizzle
The function is responsible for converting little to big endian.
It had an incorrect swizzle for the last 2 bytes.
2024-11-20 05:23:36 +01:00
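The little-to-big-endian conversion that the commit above attributes to reverse4 amounts to reversing the four bytes of a 32-bit word. A minimal Python sketch of that operation follows; it is not the GLSL swizzle from the shader, and the name `reverse4_sketch` is hypothetical.

```python
def reverse4_sketch(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer (hypothetical helper)."""
    value &= 0xFFFFFFFF
    byte0 = value & 0xFF          # least significant byte
    byte1 = (value >> 8) & 0xFF
    byte2 = (value >> 16) & 0xFF
    byte3 = (value >> 24) & 0xFF  # most significant byte
    # Each byte moves to the mirrored position; getting the last two
    # positions wrong is the kind of mistake the commit message describes.
    return (byte0 << 24) | (byte1 << 16) | (byte2 << 8) | byte3
```

Applying it twice returns the original value, which makes a convenient sanity check.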
Lynne
9691ac6af2
ffv1enc_vulkan: increase max outstanding byte count to 16bit
The issue is that at higher resolutions, the outstanding byte counter
overflowed in case the image had a lot of blank areas.
2024-11-20 05:23:35 +01:00
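The overflow described in the commit above can be pictured as a fixed-width counter wrapping around during a long run of pending bytes. The sketch below is a simplified Python model, not the encoder's range coder: the function name is hypothetical, and the 8-bit width used for comparison is an assumption, since the log does not state the previous counter width.

```python
def count_outstanding(run_length: int, counter_bits: int) -> int:
    """Count run_length pending bytes in a counter that is counter_bits wide.

    The counter wraps on overflow, as a fixed-width integer would.
    """
    mask = (1 << counter_bits) - 1
    count = 0
    for _ in range(run_length):
        count = (count + 1) & mask
    return count
```

A run of 300 pending bytes reads back as 44 in an 8-bit counter but as 300 in a 16-bit one.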
Lynne
ebf5264c93
ffv1enc_vulkan: fix PCM encoding
This line was mysteriously deleted.
2024-11-20 05:23:35 +01:00
Lynne
eb536d97a0
ffv1enc_vulkan: support buffers larger than 4GiB
Unlike the software FFv1 encoder, none of our buffers are allocated by
FFmpeg, which supports allocations of at most 4GiB.

For really large sizes, the maximum size of the buffer can exceed 4GiB,
which the software encoder optimistically tries to allocate as 4GiB
in the hopes that the encoder will compress to under that amount.

We can just let Vulkan allocate us a larger buffer, and switch to
64-bit offsets.
2024-11-20 05:23:05 +01:00
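The need for 64-bit offsets mentioned in the commit above follows from simple arithmetic: a 32-bit offset cannot address anything at or beyond 4GiB. A small Python sketch of the wrap-around, with a hypothetical helper name and illustrative sizes:

```python
GIB = 1 << 30  # bytes in one GiB


def wrap_offset(offset: int, offset_bits: int) -> int:
    """Return the offset as it would be stored in an unsigned integer
    of offset_bits bits (higher bits are silently lost)."""
    return offset & ((1 << offset_bits) - 1)
```

An offset of 5GiB stored in 32 bits comes back as 1GiB, while 64 bits preserve it.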
Lynne
ed2391d341
ffv1enc: add a Vulkan encoder
This commit implements a standard, compliant, version 3 and version 4
FFv1 encoder, entirely in Vulkan. The encoder is written in standard
GLSL and requires a Vulkan 1.3 supporting GPU with the BDA extension.

The encoder can use any number of slices, but nominally should use
32x32 slices (1024 in total) to maximize parallelism.

All features are supported, as well as all pixel formats.
This includes:
 - Rice
 - Range coding with a custom quantization table
 - PCM encoding

CRC calculation is also massively parallelized on the GPU.

Encoding of unaligned dimensions on subsampled data requires
version 4, or requires oversizing the image to 64-pixel alignment
and cropping out the padding via container flags.

Performance-wise, this makes 1080p real-time screen capture possible
at 60fps on even modest GPUs.
2024-11-18 07:54:22 +01:00
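Two pieces of arithmetic from the commit above can be checked directly: the slice count of a 32x32 grid, and the padding needed to reach 64-pixel alignment. This is a Python sketch with hypothetical helper names; the 1920x1080 frame size is only an example and the code is not taken from the encoder.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def slice_count(horizontal_slices: int, vertical_slices: int) -> int:
    """Total number of slices in a rectangular slice grid."""
    return horizontal_slices * vertical_slices


def padding_to_crop(size: int, alignment: int = 64) -> int:
    """Rows or columns added by oversizing, which would later be cropped out."""
    return align_up(size, alignment) - size
```

For a 1920x1080 frame, the width is already a multiple of 64, while the height is padded from 1080 to 1088, leaving 8 rows to crop.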
Lynne
4e861ad8e0
libavcodec/Makefile: add a makefile for Vulkan shaders
2024-10-15 17:45:19 +02:00