Commit graph

17 commits

Author SHA1 Message Date
Lynne
615b26f1b1
vulkan_ffv1: fix swapped colors for x2bgr10 2025-11-26 15:16:40 +01:00
Lynne
3cbe3418b2
vulkan_ffv1: fix golomb coding for non-RGB streams
The run_index is reset on each plane, unlike with RGB, where
it's reset once per slice.
2025-05-27 06:40:33 +09:00
Lynne
c395ad7c2c
vulkan_ffv1: small cleanup for golomb
Split up computation of the offset in the same way that
the range coder version does it.
2025-05-27 06:40:29 +09:00
Lynne
977d1a24bc
vulkan/ffv1: fix sync issue in cached bitstream reader/writer
The issue is that there is an explicit lack of synchronization as only the very
first invocation writes symbols and updates the state, which other invocations
then store.
2025-05-23 05:23:44 +09:00
Lynne
7b45d9c5fd
vulkan_ffv1: pipe through slice decoding status 2025-05-20 19:53:02 +09:00
Lynne
a24ea37228
vulkan_ffv1: fix PCM + cached symbol reader
writeout_rgb requires that all subgroups are active.
2025-05-20 19:53:01 +09:00
Lynne
bd41838b60
ffv1enc_vulkan: switch to 2-line cache, unify prediction code 2025-05-20 19:53:01 +09:00
Lynne
a4078abd73
vulkan/ffv1: synchronize get_pred implementations between encoder and decoder 2025-05-20 19:53:00 +09:00
Lynne
29b85cd4b8
vulkan_ffv1: add cached symbol reader for AMD
Speeds up everything on AMD by 3x.
This uses 32 local invocations to load state into cache, as well
as to do the RCT faster.
2025-04-14 06:10:43 +02:00
Lynne
e040c087c7
vulkan: add support for expect/assume
This commit adds support for compiler hints.
While on AMD these are not used/needed, Nvidia benefits from them, with
a sizeable 10% speedup on 4k.
2025-04-14 06:10:43 +02:00
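
For readers unfamiliar with expect/assume hints, here is a rough CPU-side analogue in C. It is not the shader code from this commit: the macro names and the example function are hypothetical, and it relies on the GCC/Clang builtins `__builtin_expect` and `__builtin_unreachable`, falling back to no-ops on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names, for illustration only. */
#if defined(__GNUC__) || defined(__clang__)
#define HINT_LIKELY(x)   __builtin_expect(!!(x), 1)
#define HINT_UNLIKELY(x) __builtin_expect(!!(x), 0)
/* Tells the optimizer the condition always holds (undefined behavior if it does not). */
#define HINT_ASSUME(x)   do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define HINT_LIKELY(x)   (x)
#define HINT_UNLIKELY(x) (x)
#define HINT_ASSUME(x)   ((void)0)
#endif

/* Sums only the values in {-1, 0, 1}, hinting that larger values are rare. */
static int sum_small_coeffs(const int *coeffs, size_t n)
{
    int sum = 0;

    assert(n > 0);      /* checked in debug builds */
    HINT_ASSUME(n > 0); /* optimizer hint in all builds */

    for (size_t i = 0; i < n; i++) {
        if (HINT_UNLIKELY(coeffs[i] > 1 || coeffs[i] < -1))
            continue;   /* rare path: skip large values */
        sum += coeffs[i];
    }
    return sum;
}
```

The hints never change the result, only how the compiler lays out and optimizes the code, which is why a speedup on one vendor and no effect on another is plausible.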
Lynne
985a26be28
vulkan_ffv1: shortcut +-1 coeffs in symbol reading
Slightly faster, and allows for further optimizations.
2025-04-14 06:10:43 +02:00
Lynne
8ceabb677c
vulkan_ffv1: externalize extended lookup check
8% speedup on Nvidia on 4k.
2025-04-14 06:10:43 +02:00
Lynne
77f777d925
ffv1/vulkan: redo context count tracking and quant_table_idx management
This commit also makes it possible for the encoder to choose a different
quantization table on a per-slice basis, as well as adding this capability
to the decoder.

Also, this commit fully fixes decoding of context=1 encoded files.
2025-04-14 06:10:42 +02:00
Lynne
66b8c92df2
vulkan_ffv1: cache only 2 lines when decoding RGB
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.

This is equivalent to what the software decoder does, but with fewer pointers.
2025-04-14 06:10:42 +02:00
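
The idea of keeping only two lines can be sketched in plain C. This is a simplified illustration, not the shader from this commit: the function names, `MAX_W`, and the zero-valued borders are assumptions made for the sketch, and only the median predictor (left, top, left + top - top-left) is taken from FFV1.

```c
#include <assert.h>
#include <string.h>

#define MAX_W 64 /* hypothetical width limit for this sketch */

/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Reference encoder: residuals against the median predictor,
 * with out-of-picture neighbours treated as 0 (a simplification). */
static void encode_plane(const int *img, int *resid, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int L  = x ? img[y * w + x - 1] : 0;
            int T  = y ? img[(y - 1) * w + x] : 0;
            int TL = (x && y) ? img[(y - 1) * w + x - 1] : 0;
            resid[y * w + x] = img[y * w + x] - median3(L, T, L + T - TL);
        }
    }
}

/* Decoder that keeps only two lines of reconstructed samples:
 * the current line and the previous one, alternating roles each row. */
static void decode_plane(const int *resid, int *out, int w, int h)
{
    int lines[2][MAX_W + 1]; /* index 0 of each line is the left border */

    assert(w > 0 && w <= MAX_W);
    memset(lines, 0, sizeof(lines));

    for (int y = 0; y < h; y++) {
        int *cur  = lines[y & 1];
        int *prev = lines[(y & 1) ^ 1];

        for (int x = 1; x <= w; x++) {
            int L = cur[x - 1], T = prev[x], TL = prev[x - 1];
            cur[x] = median3(L, T, L + T - TL) + resid[y * w + x - 1];
            out[y * w + x - 1] = cur[x];
        }
    }
}
```

Because the predictor only looks at the current and previous rows, the working set is two lines regardless of picture height, which is where the memory saving comes from.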
Lynne
72953477a4
vulkan_ffv1: fix left-2 sample addressing
Typo.
Not enough to fix context=1, but it's a start.
2025-04-14 06:10:42 +02:00
Lynne
fc960dafef
vulkan_ffv1: optimize symbol reader
This was the fastest variant tested.
2025-04-14 06:10:41 +02:00
Lynne
6bad55eb17
ffv1: add a Vulkan-based decoder
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.

On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.
2025-03-17 08:51:23 +01:00