Commit graph

156 commits

Author SHA1 Message Date
Lynne
4d63e3dd4c
vulkan_ffv1: add Bayer encoder
Sponsored-by: Sovereign Tech Fund
2026-06-03 14:12:50 +09:00
Lynne
713f191c24
vulkan_ffv1: add Bayer decoder
Sponsored-by: Sovereign Tech Fund
2026-06-03 14:12:50 +09:00
Lynne
d66552e676
vulkan/ffv1: add 32-bit float RGB encoding and a rice + remap path
This implements 32-bit float RGB encoding and makes the Vulkan implementation
on-par with the C implementation.

Sponsored-by: Sovereign Tech Fund
2026-05-30 12:10:01 +09:00
Lynne
9a6b5ca197
vulkan/ffv1_enc_rct_search: fix slice dimension iterations
This was a mess, we were using incorrect pixels outside of the image boundaries as
valid, the iteration had undefined behaviour since it was non-uniform across the workgroup.

Calculate the per-invoc iterations from the slice dimensions instead, making all of
them identical. And add a valid flag to decide whether to use them or not. And fix the
synchronization.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
9cabb12f74
vulkan/ffv1_enc_rct_search: write slice_rct_coef directly by main invoc
The issue is that SliceContext was passed as an inout, which caused all
invocs to locally copy and modify it.
When the main invoc wrote it, only the very last written value was used,
choosing the wrong coeffs.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
5fc56fbf96
vulkan/ffv1_enc_rct_search: barrier before reading score_mode
There was a race condition where the main invocation would race ahead and use
values not yet written by other invocs.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
2806afd28f
vulkan/ffv1: read raw 16-bit float images via R16_UINT view to preserve denormals
GPUs filter out denormals when reading floats via imageLoad. Denormals shouldn't
be present in general, but if they are, this is a lossless codec, and we have to
preserve them. This allows reading the exact values.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
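Note on 2806afd28f: the sketch below illustrates why a raw integer view preserves denormals. It is a minimal Python model, not the shader code. The helper names (`flush_denormal_f16`, `read_via_float_view`, `read_via_uint_view`) are hypothetical, and the flush-to-zero behaviour is an assumption based on the commit message, not on any particular GPU.

```python
import struct

def flush_denormal_f16(bits: int) -> int:
    """Model a float read path that flushes half-precision denormals to signed zero."""
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0 and mantissa != 0:
        return bits & 0x8000  # keep only the sign bit
    return bits

def read_via_float_view(bits: int) -> int:
    # Hypothetical stand-in for loading the texel as a float (assumed to flush denormals).
    return flush_denormal_f16(bits)

def read_via_uint_view(bits: int) -> int:
    # Hypothetical stand-in for loading the same texel through an integer view:
    # the 16 bits come back untouched.
    return bits

# 0x0001 is the smallest positive half-precision denormal, equal to 2**-24.
denormal_bits = 0x0001
denormal_value = struct.unpack("<e", struct.pack("<H", denormal_bits))[0]

lossy = read_via_float_view(denormal_bits)   # the denormal is lost
exact = read_via_uint_view(denormal_bits)    # the exact bit pattern survives
```

For a lossless codec only the second path is acceptable, since the decoder must reproduce the input bit pattern exactly.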
Lynne
50e6668c83
vulkan/ffv1_enc: skip GOLOMB encode_line when !bits for FLOAT formats
Same as the arithmetic coded path. I skipped out on adding this here.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
e14e43aeaa
vulkan/ffv1_enc: pass the correct base and offset to OFFBUF in init_golomb
Ugh, my previous fix on this was only right in some cases, this is a general fix.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:03 +09:00
Lynne
d1e0a292ce
vulkan/ffv1_enc_remap: clear the full 65536-entry fltmap
Float pixfmts are meant to be normalized between [0, 1], but in case they
were not, and negative numbers were present, then the top bits would be
filled with garbage.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:03 +09:00
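Note on d1e0a292ce: one possible reading of this fix is sketched below. It assumes the map is indexed by the raw 16-bit float pattern, which the commit message does not state. Under that assumption, negative values have the sign bit set and land in the upper half of the table. `f16_bits` and `used_entries` are hypothetical helpers, and the "stale" contents are made up for illustration.

```python
import struct

FLTMAP_SIZE = 65536

def f16_bits(x: float) -> int:
    """Return the raw 16-bit pattern of x as a half-precision float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def used_entries(samples, cleared_entries: int) -> int:
    """Count entries marked as used after clearing only the first `cleared_entries`."""
    fltmap = [1] * FLTMAP_SIZE                    # assumed stale data from earlier use
    fltmap[:cleared_entries] = [0] * cleared_entries
    for s in samples:
        fltmap[f16_bits(s)] = 1                   # mark this bit pattern as present
    return sum(fltmap)

samples = [0.5, -0.5]
full_clear = used_entries(samples, FLTMAP_SIZE)          # only the two real values
half_clear = used_entries(samples, FLTMAP_SIZE // 2)     # upper half still holds garbage
```

With a full clear only the two real values are counted. With half of the table cleared, every stale entry above 0x7FFF is counted as well.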
Lynne
4675271e7a
vulkan/rangecoder: fix encoding issue when -1 != 0xFF
This was an oversight while microoptimizing. The outstanding_byte can
reach 0xFF in some situations, which was causing errors when encoding,
particularly with 32-bit floats.

Sponsored-by: Sovereign Tech Fund
2026-05-26 17:46:59 +09:00
Lynne
5ad8c67e6c
apv_decode: add a Vulkan hwaccel
2026-05-19 17:43:53 +09:00
Lynne
9c40552965
prores_raw: synchronize decoder with reference implementation
This completes the reverse engineering of the decoder.
The commit applies the linearization curve from the previous patch.
2026-05-17 12:17:16 +09:00
Lynne
d8cb567171
prores_raw: fix tile alignment issues
Reverse engineered the decoder a bit more. All tiles are always 16x1.
The issue is that at the edges, tiles don't have the same width.
Instead, the first tile that starts to clip is half, and then the
next tile after that is also half the previous tile's width.
2026-05-17 12:02:52 +09:00
Lynne
eb24fb0c7f
vulkan/common: fix LOAD64 again
duh, gb.buf is incremented in the loop and I missed that. ugh.
2026-05-17 12:02:52 +09:00
Lynne
2d826f18fb
vulkan/prores_raw: don't load the quantization matrix on every invocation
2026-05-14 02:55:53 +09:00
Lynne
13aabf726b
vulkan/prores_raw: specify format on image
Unlike other decoders or encoders, prores_raw only has a single
Vulkan format to worry about.
This is a 20% speedup on AMD, since AMD apparently has optimizations
for this.
2026-05-14 02:55:53 +09:00
Lynne
a2737497de
vulkan/prores_raw: add skip_bits_unchecked and use it
show_bits(gb, 32) is called immediately above. It guarantees that
the following skip_bits call will not need to reload.
2026-05-14 02:55:53 +09:00
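Note on a2737497de: the reasoning in this message can be sketched with a toy bit reader. This is a simplified Python model with a hypothetical `BitReader` class, not the shader's actual `get_bits` code. The point is that once `show_bits(32)` has made sure at least 32 bits are cached, a following skip of up to 32 bits can drop the refill check.

```python
class BitReader:
    """Toy MSB-first bit reader with a small cache (illustrative only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0      # next byte to load
        self.cache = 0    # the low `bits` bits are valid
        self.bits = 0

    def _refill(self) -> None:
        # Load whole bytes while there is room for them in a 64-bit cache.
        while self.bits <= 56 and self.pos < len(self.data):
            self.cache = (self.cache << 8) | self.data[self.pos]
            self.pos += 1
            self.bits += 8

    def show_bits(self, n: int) -> int:
        if self.bits < n:
            self._refill()
        return (self.cache >> (self.bits - n)) & ((1 << n) - 1)

    def skip_bits(self, n: int) -> None:
        if self.bits < n:          # checked variant: may need to reload
            self._refill()
        self.skip_bits_unchecked(n)

    def skip_bits_unchecked(self, n: int) -> None:
        # Caller guarantees n <= self.bits, e.g. by a preceding show_bits(32).
        self.bits -= n
        self.cache &= (1 << self.bits) - 1


reader = BitReader(bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x12, 0x34, 0x56, 0x78]))
peeked = reader.show_bits(32)      # guarantees at least 32 cached bits
reader.skip_bits_unchecked(8)      # safe without a reload check
next_byte = reader.show_bits(8)
```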
Lynne
5dc567a28e
vulkan/prores_raw: remove redundant fast golomb parsing path
2026-05-14 02:55:52 +09:00
Lynne
64f848890c
vulkan/prores_raw: use 16-bit/32-bit uints where needed
16-bit ints can overflow.
2026-05-14 02:55:52 +09:00
Lynne
74e3d63fb6
vulkan/prores_raw: use get_bits shared memory cache
50% speedup on AMD.
2026-05-14 02:55:52 +09:00
Lynne
67811c2754
vulkan/common: fix get_bit() with SMEM caching
First of all, it uses the wrong data pointer. Second, gb.bits wouldn't
get set if LOAD64 was called after the start of the stream.
2026-05-14 02:55:48 +09:00
Lynne
fd25b35dd2
vulkan_ffv1: support decoding 32-bit float video
Sponsored-by: Sovereign Tech Fund
2026-05-11 05:32:41 +09:00
Lynne
162ad61486
vulkan/ffv1: fix second-line linecache initialization for Golomb
This was a difficult problem to find.

Sponsored-by: Sovereign Tech Fund
2026-04-22 23:24:04 +02:00
Lynne
9c04a40136
vulkan/ffv1: implement floating-point decoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
f5054f726d
ffv1enc_vulkan: implement floating-point encoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
29b8614e62
vulkan/ffv1: fix bitstream initialization for Golomb
Was broken when we switched to descriptors.

Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
IndecisiveTurtle
cebe0b577e
lavc: implement a Vulkan-based prores encoder
Adds a vulkan implementation of the reference prores kostya encoder. Provides about 3-4x speedup over the CPU code
2026-03-05 14:02:39 +00:00
Lynne
13e063ceec
vulkan/ffv1: properly initialize the linecache
2026-02-22 03:39:23 +01:00
Lynne
c91634dfe6
vulkan/ffv1: add current linecache for encode/decode
This avoids needing expensive roundtrips when reading/writing to images,
mainly in the decoder.
2026-02-19 19:42:35 +01:00
Lynne
3f91ff8aa6
ffv1enc_vulkan: perform non-RGB prediction in 16-bits
2026-02-19 19:42:35 +01:00
Lynne
9d5421ad92
vulkan/ffv1: keep track of RCT Ry/By coeffs using vector suffixes
This makes it far easier to read, particularly because when reading
or writing, their order is swapped.
2026-02-19 19:42:35 +01:00
Lynne
b9c19c9073
ffv1enc_vulkan: use direct values rather than reading from struct
This saves indirection and allows compilers to eliminate more
code during compilation.
2026-02-19 19:42:35 +01:00
Lynne
fc10cc4a52
vulkan/ffv1: optimize get_isymbol
2026-02-19 19:42:34 +01:00
Lynne
f32111f3f7
vulkan/ffv1: improve compiler hints
Don't unroll unless needed, don't use const in function arguments,
don't use expect unless actually needed.
2026-02-19 19:42:34 +01:00
Lynne
e9645930dd
vulkan/ffv1_dec: synchronize image writes when decoding
2026-02-19 19:42:34 +01:00
Lynne
fb7700636c
vulkan/ffv1: synchronize before/after RCT transform/preload
2026-02-19 19:42:34 +01:00
Lynne
5ac9376763
vulkan/ffv1_dec_setup: roll a put_rac inside a loop
This saves 16KiB of memory.
Yeah, things go large when all compilers inline everything.
2026-02-19 19:42:33 +01:00
Lynne
33525cb6e7
vulkan/rangecoder: don't store pointers in the context
2026-02-19 19:42:33 +01:00
Lynne
5c1b2947a4
ffv1enc_vulkan: only return the encoded size, not its offset
The encoded offset is just a multiple of the index by the max slice size.
2026-02-19 19:42:33 +01:00
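Note on 5c1b2947a4: the relationship stated in this message can be sketched as follows. The function names and the buffer layout (one fixed-size region per slice) are assumptions made for illustration, not the encoder's actual code.

```python
def slice_offset(slice_index: int, max_slice_size: int) -> int:
    """Offset of a slice's region: the slice index times the maximum slice size."""
    return slice_index * max_slice_size

def gather_slices(buffer: bytes, encoded_sizes, max_slice_size: int) -> bytes:
    """Concatenate the used part of each fixed-size slice region.

    Only the per-slice encoded size needs to be reported; the offset is derived.
    """
    out = bytearray()
    for index, size in enumerate(encoded_sizes):
        start = slice_offset(index, max_slice_size)
        out += buffer[start:start + size]
    return bytes(out)

# Three slice regions of 8 bytes each, of which 3, 8 and 1 bytes are used.
MAX_SLICE_SIZE = 8
regions = b"AAA" + bytes(5) + b"BBBBBBBB" + b"C" + bytes(7)
packet = gather_slices(regions, [3, 8, 1], MAX_SLICE_SIZE)
```

Because the offset is a pure function of the slice index, storing it next to the size would be redundant.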
Lynne
2c138e2df5
vulkan/ffv1: use loops to encode planes
Every function in SPIR-V gets inlined, always. So use loops.
2026-02-19 19:42:33 +01:00
Lynne
10a26974cd
vulkan/ffv1: finalize and initialize slices only in invocation == 0
2026-02-19 19:42:33 +01:00
Lynne
bc968bc8b4
vulkan/ffv1_enc: cache state probabilities
4x speedup on AMD.
2026-02-19 19:42:33 +01:00
Lynne
826b72d12f
vulkan/ffv1: mark buffers as uniform/readonly when needed
Should be a speedup in most cases.
2026-02-19 19:42:32 +01:00
Lynne
3d74e0e63a
ffv1enc_vulkan: fix Golomb encoding
The issue is that the PB buffer address for Golomb may not be aligned
to mod 4.
2026-02-19 19:42:32 +01:00
Lynne
fdd0f21f5d
vulkan/ffv1_common: use scalar alignment for the base slice structure
Scalar is the fastest for modern GPUs to use.
2026-02-19 19:42:32 +01:00
Lynne
10407de110
vulkan/rangecoder: clean up unused functions and redundant fields
2026-02-19 19:42:32 +01:00
Lynne
dbc6fa5248
ffv1enc: use local RangeCoder struct
2026-02-19 19:42:31 +01:00
Lynne
b756d83e24
vulkan_ffv1: use local RangeCoder struct, refactor overread checking
2026-02-19 19:42:31 +01:00
Lynne
06eb98bc97
ffv1enc_vulkan: remove dead code
2026-02-19 19:42:31 +01:00