Commit graph

43 commits

Author SHA1 Message Date
Lynne
3cbe3418b2
vulkan_ffv1: fix golomb coding for non-RGB streams
The run_index is reset on each plane, unlike with RGB, where
it is reset once per slice.
2025-05-27 06:40:33 +09:00
Lynne
c395ad7c2c
vulkan_ffv1: small cleanup for golomb
Split up computation of the offset in the same way that
the range coder version does it.
2025-05-27 06:40:29 +09:00
Lynne
977d1a24bc
vulkan/ffv1: fix sync issue in cached bitstream reader/writer
The issue is that there is an explicit lack of synchronization as only the very
first invocation writes symbols and updates the state, which other invocations
then store.
2025-05-23 05:23:44 +09:00
Lynne
7b45d9c5fd
vulkan_ffv1: pipe through slice decoding status 2025-05-20 19:53:02 +09:00
Lynne
cb8f4b675d
vulkan/ffv1: unify encode and decode get/put primitives
This simply makes a get_rac/put_rac_internal variant that can be
reused.
2025-05-20 19:53:02 +09:00
Lynne
7576410af7
ffv1enc_vulkan: implement RCT search for level >= 4 2025-05-20 19:53:01 +09:00
Lynne
0156680f09
ffv1enc_vulkan: implement the cached EC writer from the decoder
This gives a 35% speedup on AMD and 50% on Nvidia.
2025-05-20 19:53:01 +09:00
Lynne
a24ea37228
vulkan_ffv1: fix PCM + cached symbol reader
writeout_rgb requires that all subgroups are active.
2025-05-20 19:53:01 +09:00
Lynne
8a2d921627
ffv1_common: minor RGB optimization 2025-05-20 19:53:01 +09:00
Lynne
bd41838b60
ffv1enc_vulkan: switch to 2-line cache, unify prediction code 2025-05-20 19:53:01 +09:00
Lynne
52595025c5
ffv1enc_vulkan: minor EC optimizations 2025-05-20 19:53:01 +09:00
Lynne
7c0a8c07ce
ffv1enc_vulkan: unify EC code between setup and encode 2025-05-20 19:53:00 +09:00
Lynne
69f83bafd1
ffv1enc_vulkan: get rid of temporary data for the setup shader 2025-05-20 19:53:00 +09:00
Lynne
a4078abd73
vulkan/ffv1: synchronize get_pred implementations between encoder and decoder 2025-05-20 19:53:00 +09:00
Lynne
ebbc7ff650
ffv1enc_vulkan: merge all encoder variants into one file
Makes it easier to work with, despite the heavy ifdeffery.
2025-05-20 19:52:55 +09:00
Lynne
707c04fe06
ffv1enc_vulkan: support 8 and 16-bit 2-plane YUV formats
This adds support for all 8-bit and 16-bit 2-plane formats.
P010 and others require more work, as their data is LSB-padded.
2025-05-01 09:34:44 +02:00
Lynne
36c6c66deb
vulkan/rangecoder: minor cleanup 2025-04-16 23:38:16 +02:00
Lynne
29b85cd4b8
vulkan_ffv1: add cached symbol reader for AMD
Speeds up everything on AMD by 3x.
This uses 32 local invocations to load state into cache, as well
as to do the RCT faster.
2025-04-14 06:10:43 +02:00
Lynne
e040c087c7
vulkan: add support for expect/assume
This commit adds support for compiler hints.
While AMD does not use or need them, Nvidia benefits from them, with
a sizeable 10% speedup at 4K.
2025-04-14 06:10:43 +02:00
Lynne
985a26be28
vulkan_ffv1: shortcut +-1 coeffs in symbol reading
Slightly faster, and allows for further optimizations.
2025-04-14 06:10:43 +02:00
Lynne
4d561e6a1e
vulkan_ffv1: remove need for scratch data during setup
This saves on some VRAM, but mainly allows for a more unified path.
2025-04-14 06:10:43 +02:00
Lynne
8ceabb677c
vulkan_ffv1: externalize extended lookup check
8% speedup on Nvidia at 4K.
2025-04-14 06:10:43 +02:00
Lynne
77f777d925
ffv1/vulkan: redo context count tracking and quant_table_idx management
This commit also makes it possible for the encoder to choose a different
quantization table on a per-slice basis, as well as adding this capability
to the decoder.

Also, this commit fully fixes decoding of context=1 encoded files.
2025-04-14 06:10:42 +02:00
Lynne
66b8c92df2
vulkan_ffv1: cache only 2 lines when decoding RGB
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6K video.
This also speeds the decoder up by 16% for 4K RGB24 and by 31% for 6K video.

This is equivalent to what the software decoder does, but with fewer pointers.
2025-04-14 06:10:42 +02:00
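The 2-line cache idea can be sketched in C. Median prediction only looks at the current line and the line directly above it, so a ring of two line buffers indexed by y & 1 is enough to hold reconstructed samples. This is a minimal sketch and not the Vulkan shader code. It assumes the median predictor over left, top and left + top - topleft, and it treats out-of-frame neighbours as zero, which simplifies FFv1's real edge rules. The names mid_pred, residuals_2line and MAX_W are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_W 64 /* hypothetical maximum line width for this sketch */

/* Median of three values: max(min(a, b), min(max(a, b), c)). */
static int mid_pred(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                        /* b = min(max(a, b), c) */
    return a > b ? a : b;
}

/* Compute prediction residuals for a w x h plane while keeping only two
 * cached lines: line y lives in cache[y & 1], the line above it in
 * cache[(y + 1) & 1]. Out-of-frame neighbours are treated as 0
 * (a simplification, not FFv1's exact edge handling). */
static void residuals_2line(const int *plane, int w, int h, int *res)
{
    int cache[2][MAX_W];
    assert(w > 0 && w <= MAX_W);
    memset(cache, 0, sizeof(cache));

    for (int y = 0; y < h; y++) {
        int *cur = cache[y & 1];
        const int *top = cache[(y + 1) & 1]; /* all zeros when y == 0 */
        for (int x = 0; x < w; x++) {
            int L  = x > 0 ? cur[x - 1] : 0;
            int T  = top[x];
            int TL = x > 0 ? top[x - 1] : 0;
            int s  = plane[y * w + x];
            res[y * w + x] = s - mid_pred(L, T, L + T - TL);
            cur[x] = s; /* overwrite the line from two rows up */
        }
    }
}
```

Memory use is 2 * w samples no matter how tall the plane is. That is the property the commit relies on to shrink intermediate VRAM for large frames.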
Lynne
72953477a4
vulkan_ffv1: fix left-2 sample addressing
Typo.
Not enough to fix context=1, but it's a start.
2025-04-14 06:10:42 +02:00
Lynne
45d7abf6d9
vulkan_ffv1: init overread/corrupt fields
Forgotten.
2025-04-14 06:10:42 +02:00
Lynne
fc960dafef
vulkan_ffv1: optimize symbol reader
This was the fastest variant tested.
2025-04-14 06:10:41 +02:00
Lynne
defebd74c0
vulkan_ffv1: slightly optimize the range decoder
GPUs have cmovs as standard.
2025-04-14 06:10:41 +02:00
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00
Lynne
f2a0bdd6b1
vulkan: unify handling of BGR and simplify ffv1_rct 2025-03-17 08:49:15 +01:00
Lynne
b2ebe9884e
ffv1enc_vulkan: refactor code to support sharing with decoder
The shaders were written to support sharing, but needed slight
tweaking.
2025-03-17 08:49:14 +01:00
Lynne
89704f07bb
lavc/vulkan: add a u8vec2buf buffer type
Useful, since it doesn't have alignment limitations.
2025-02-21 03:19:20 +01:00
IndecisiveTurtle
351fd8460a
vulkan/common: Add put_bytes_count 2024-11-28 10:03:01 +09:00
IndecisiveTurtle
e3ac63b213
vulkan/common: Use u32vec2 buffer type instead of u64
According to the GL_EXT_buffer_reference spec, alignment
"must be a power of two and be greater than or equal to the largest scalar/component type in the block."

This means that by using u32vec2, we can drop the alignment requirement from 8 bytes to 4 bytes
and save a pack64 call in reverse8 (though I assume in most ISAs that compiles to nothing).

Allows the vc2 vulkan encoder to function without setting PB_UNALIGNED.
2024-11-28 09:31:43 +09:00
IndecisiveTurtle
f794ed48c0
vulkan/common: Fix off-by-one error in flush_put_bits
If the caller wrote a number of bits divisible by eight, it would write an extra byte.
Also increment by to_write instead of BUF_BYTES, which over-pads the bitstream.
2024-11-28 09:31:43 +09:00
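A typical form of this off-by-one is to add one byte unconditionally instead of rounding the bit count up. The C sketch below contrasts the two. It illustrates the bug class only and is not the actual GLSL in vulkan/common. It assumes a flush that must emit every pending bit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bytes emitted for `bits` pending bits by a flush that
 * always adds one byte. Correct for partial bytes, but writes an extra
 * byte whenever bits is a multiple of eight. */
static uint32_t flush_bytes_buggy(uint32_t bits)
{
    return bits / 8 + 1;
}

/* Hypothetical fix: round up, i.e. ceil(bits / 8). A trailing partial
 * byte still gets written, but whole bytes are not padded further. */
static uint32_t flush_bytes_fixed(uint32_t bits)
{
    return (bits + 7) / 8;
}
```

For 17 pending bits both versions emit 3 bytes. For 16 bits the buggy version emits 3 bytes and the fixed one emits 2.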
Lynne
f65e51293a
hwcontext_vulkan: add support for AV_PIX_FMT_GBRAP10/12/14 2024-11-26 14:14:13 +01:00
Lynne
7c52dda55f
hwcontext_vulkan: add support for AV_PIX_FMT_GBRP12/14/16 2024-11-26 14:14:12 +01:00
Lynne
4d3e96c90c
lavc/vulkan/common: fix reverse4's incorrect swizzle
The function is responsible for converting little to big endian.
It had an incorrect swizzle for the last 2 bytes.
2024-11-20 05:23:36 +01:00
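Converting a 4-byte little-endian group to big endian means output byte i must come from input byte 3 - i. On a 4-component byte vector that is a .wzyx swizzle, and a wrong mapping for the last two bytes gives exactly the bug described. The C sketch below shows both the per-byte and the packed 32-bit form. It is an illustration of the conversion, not the shader's reverse4, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-byte reversal, the C analogue of a .wzyx swizzle. */
static void reverse4_bytes(const uint8_t in[4], uint8_t out[4])
{
    out[0] = in[3];
    out[1] = in[2];
    out[2] = in[1];
    out[3] = in[0];
}

/* The same reversal applied to a packed 32-bit word. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

For example, bswap32(0x11223344) yields 0x44332211.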
Lynne
9691ac6af2
ffv1enc_vulkan: increase max outstanding byte count to 16bit
The issue is that at higher resolutions, the outstanding byte counter
overflowed in case the image had a lot of blank areas.
2024-11-20 05:23:35 +01:00
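The overflow comes from unsigned wraparound: a counter of width n silently restarts from zero after 2^n - 1 increments. The commit only gives the new width (16 bits). The sketch below assumes, purely for illustration, that the old counter was 8 bits wide. The function names are hypothetical, and this is not the encoder's range coder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: count n deferred ("outstanding") bytes in an 8-bit
 * counter. Conversion back to uint8_t wraps modulo 256. */
static uint8_t count_pending_u8(unsigned n)
{
    uint8_t c = 0;
    for (unsigned i = 0; i < n; i++)
        c++;
    return c;
}

/* Same count in a 16-bit counter, which wraps only past 65535. */
static uint16_t count_pending_u16(unsigned n)
{
    uint16_t c = 0;
    for (unsigned i = 0; i < n; i++)
        c++;
    return c;
}
```

With 300 pending bytes, the 8-bit counter reports 44 (300 mod 256), while the 16-bit counter reports the correct 300.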
Lynne
ebf5264c93
ffv1enc_vulkan: fix PCM encoding
This line was mysteriously deleted.
2024-11-20 05:23:35 +01:00
Lynne
eb536d97a0
ffv1enc_vulkan: support buffers larger than 4GiB
Unlike the software FFv1 encoder, none of our buffers are allocated by
FFmpeg, which supports allocations of at most 4GiB.

For really large sizes, the maximum size of the buffer can exceed 4GiB,
which the software encoder optimistically tries to allocate as 4GiB
in the hopes that the encoder will compress to under that amount.

We can just let Vulkan allocate us a larger buffer, and switch to
64-bit offsets.
2024-11-20 05:23:05 +01:00
Lynne
ed2391d341
ffv1enc: add a Vulkan encoder
This commit implements a standard, compliant, version 3 and version 4
FFv1 encoder, entirely in Vulkan. The encoder is written in standard
GLSL and requires a Vulkan 1.3 supporting GPU with the BDA extension.

The encoder can use any number of slices, but nominally, should use
32x32 slices (1024 in total) to maximize parallelism.

All features are supported, as well as all pixel formats.
This includes:
 - Rice
 - Range coding with a custom quantization table
 - PCM encoding

CRC calculation is also massively parallelized on the GPU.

Encoding of unaligned dimensions on subsampled data requires
version 4, or requires oversizing the image to 64-pixel alignment
and cropping out the padding via container flags.

Performance-wise, this makes 1080p real-time screen capture possible
at 60fps even on modest GPUs.
2024-11-18 07:54:22 +01:00
Lynne
4e861ad8e0
libavcodec/Makefile: add a makefile for Vulkan shaders 2024-10-15 17:45:19 +02:00