Lynne
4d63e3dd4c
vulkan_ffv1: add Bayer encoder
...
Sponsored-by: Sovereign Tech Fund
2026-06-03 14:12:50 +09:00
Lynne
713f191c24
vulkan_ffv1: add Bayer decoder
...
Sponsored-by: Sovereign Tech Fund
2026-06-03 14:12:50 +09:00
Lynne
d66552e676
vulkan/ffv1: add 32-bit float RGB encoding and a rice + remap path
...
This implements 32-bit float RGB encoding and brings the Vulkan implementation
on par with the C implementation.
Sponsored-by: Sovereign Tech Fund
2026-05-30 12:10:01 +09:00
Lynne
9a6b5ca197
vulkan/ffv1_enc_rct_search: fix slice dimension iterations
...
This was a mess: we were treating pixels outside of the image boundaries as valid, and
the iteration had undefined behaviour since it was non-uniform across the workgroup.
Calculate the per-invoc iterations from the slice dimensions instead, making them
identical for all invocs, and add a valid flag to decide whether the pixels are used.
Also fix the synchronization.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
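The uniform-iteration idea in the commit above (9a6b5ca197) can be sketched on the CPU. This is only an illustration, not the shader code: the names `uniform_iters` and `run_invocation` are hypothetical, and the slice is flattened to a 1D pixel count for brevity, whereas the commit works from the slice dimensions.

```c
#include <stdint.h>

/* Number of loop iterations every invocation runs. It depends only on the
 * slice size and the workgroup size, never on the invocation ID, so the
 * loop is uniform across the workgroup. */
static uint32_t uniform_iters(uint32_t slice_pixels, uint32_t wg_size)
{
    return (slice_pixels + wg_size - 1) / wg_size;
}

/* Simulates a single invocation and returns how many in-bounds pixels it
 * actually used. Out-of-bounds indices are still iterated over, but are
 * masked out by the valid flag. */
static uint32_t run_invocation(uint32_t invoc, uint32_t slice_pixels,
                               uint32_t wg_size)
{
    uint32_t used = 0;
    uint32_t iters = uniform_iters(slice_pixels, wg_size);

    for (uint32_t i = 0; i < iters; i++) {
        uint32_t idx = i * wg_size + invoc;
        int valid = idx < slice_pixels;

        if (valid)
            used++;
        /* In a shader, a workgroup barrier could sit here, since every
         * invocation reaches this point the same number of times. */
    }
    return used;
}
```

With 10 pixels and a workgroup of 4, every invocation runs 3 iterations, and the valid flag drops the two indices past the end.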
Lynne
9cabb12f74
vulkan/ffv1_enc_rct_search: write slice_rct_coef directly by main invoc
...
The issue is that SliceContext was passed as an inout, which caused all
invocs to locally copy and modify it.
When the main invoc wrote it, only the very last written value was used,
choosing the wrong coeffs.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
5fc56fbf96
vulkan/ffv1_enc_rct_search: barrier before reading score_mode
...
There was a race condition where the main invocation would race ahead and use
values not yet written by other invocs.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
2806afd28f
vulkan/ffv1: read raw 16-bit float images via R16_UINT view to preserve denormals
...
GPUs filter out denormals when reading floats via imageLoad. Denormals shouldn't
be present in general, but if they are, this is a lossless codec, and we have to
preserve them. This allows reading the exact values.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
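Why the integer view in the commit above (2806afd28f) matters can be sketched in plain C. The functions below are hypothetical models, not FFmpeg or Vulkan API: `load_as_float_flushing` assumes a float load that flushes denormals to signed zero, and `load_as_uint` stands in for a read through an R16_UINT view.

```c
#include <stdint.h>

/* IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
 * A denormal has a zero exponent field and a non-zero mantissa. */
static int half_is_denormal(uint16_t bits)
{
    return (bits & 0x7C00) == 0 && (bits & 0x03FF) != 0;
}

/* Model of a float load that flushes denormals to signed zero
 * (assumed behaviour, for illustration only). */
static uint16_t load_as_float_flushing(uint16_t bits)
{
    return half_is_denormal(bits) ? (uint16_t)(bits & 0x8000) : bits;
}

/* Model of a load through an integer view: the bit pattern is
 * returned untouched, so a lossless round trip stays possible. */
static uint16_t load_as_uint(uint16_t bits)
{
    return bits;
}
```

Under these assumptions, the smallest positive denormal 0x0001 survives the integer path but becomes 0x0000 on the float path, which would break bit-exact reconstruction.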
Lynne
50e6668c83
vulkan/ffv1_enc: skip GOLOMB encode_line when !bits for FLOAT formats
...
Same as the arithmetic-coded path; I had skipped adding it here.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:04 +09:00
Lynne
e14e43aeaa
vulkan/ffv1_enc: pass the correct base and offset to OFFBUF in init_golomb
...
Ugh, my previous fix for this was only right in some cases; this is a general fix.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:03 +09:00
Lynne
d1e0a292ce
vulkan/ffv1_enc_remap: clear the full 65536-entry fltmap
...
Float pixfmts are meant to be normalized to [0, 1], but if they
are not, and negative numbers are present, the top bits would be
filled with garbage.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:47:03 +09:00
Lynne
4675271e7a
vulkan/rangecoder: fix encoding issue when -1 != 0xFF
...
This was an oversight while microoptimizing. The outstanding_byte can
reach 0xFF in some situations, which was causing errors when encoding,
particularly with 32-bit floats.
Sponsored-by: Sovereign Tech Fund
2026-05-26 17:46:59 +09:00
Lynne
5ad8c67e6c
apv_decode: add a Vulkan hwaccel
2026-05-19 17:43:53 +09:00
Lynne
9c40552965
prores_raw: synchronize decoder with reference implementation
...
This completes the reverse engineering of the decoder.
The commit applies the linearization curve from the previous patch.
2026-05-17 12:17:16 +09:00
Lynne
d8cb567171
prores_raw: fix tile alignment issues
...
Reverse engineered the decoder a bit more. All tiles are always 16x1.
The issue is that at the edges, tiles don't have the same width.
Instead, the first tile that starts to clip is half-width, and the
next tile after that is in turn half the previous tile's width.
2026-05-17 12:02:52 +09:00
Lynne
eb24fb0c7f
vulkan/common: fix LOAD64 again
...
Duh, gb.buf is incremented in the loop and I missed that. Ugh.
2026-05-17 12:02:52 +09:00
Lynne
2d826f18fb
vulkan/prores_raw: don't load the quantization matrix on every invocation
2026-05-14 02:55:53 +09:00
Lynne
13aabf726b
vulkan/prores_raw: specify format on image
...
Unlike other decoders or encoders, prores_raw only has a single
Vulkan format to worry about.
This is a 20% speedup on AMD, since AMD apparently has optimizations
for this.
2026-05-14 02:55:53 +09:00
Lynne
a2737497de
vulkan/prores_raw: add skip_bits_unchecked and use it
...
show_bits(gb, 32) is called immediately above. It guarantees that
the following skip_bits call will not need to reload.
2026-05-14 02:55:53 +09:00
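The reasoning in the commit above (a2737497de) can be sketched with a small CPU-side bit reader. This is a hypothetical model, not the shader's reader: the `BitReader` struct, its fields and `refill` are assumptions. The point is that once `show_bits(br, 32)` has run, at least 32 bits are cached, so a following skip of up to 32 bits needs no reload check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BitReader {
    const uint8_t *buf; /* next byte to load */
    const uint8_t *end; /* one past the last byte */
    uint64_t cache;     /* valid bits are left-aligned, MSB first */
    int bits;           /* number of valid bits in the cache */
} BitReader;

/* Top up the cache byte by byte; reads past the end yield zero bits. */
static void refill(BitReader *br)
{
    while (br->bits <= 56) {
        uint64_t byte = br->buf < br->end ? *br->buf++ : 0;
        br->cache |= byte << (56 - br->bits);
        br->bits += 8;
    }
}

/* Peek at the next n bits (1 <= n <= 32), reloading if needed. */
static uint32_t show_bits(BitReader *br, int n)
{
    if (br->bits < n)
        refill(br);
    return (uint32_t)(br->cache >> (64 - n));
}

/* Regular skip: has to check whether a reload is required. */
static void skip_bits(BitReader *br, int n)
{
    if (br->bits < n)
        refill(br);
    br->cache <<= n;
    br->bits -= n;
}

/* Unchecked skip: valid only when the caller knows n bits are cached,
 * for example right after show_bits(br, 32) with n <= 32. */
static void skip_bits_unchecked(BitReader *br, int n)
{
    assert(n <= br->bits);
    br->cache <<= n;
    br->bits -= n;
}
```

The unchecked variant drops one branch per call, which is the kind of saving that adds up in a hot parsing loop.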
Lynne
5dc567a28e
vulkan/prores_raw: remove redundant fast golomb parsing path
2026-05-14 02:55:52 +09:00
Lynne
64f848890c
vulkan/prores_raw: use 16-bit/32-bit uints where needed
...
16-bit ints can overflow.
2026-05-14 02:55:52 +09:00
Lynne
74e3d63fb6
vulkan/prores_raw: use get_bits shared memory cache
...
50% speedup on AMD.
2026-05-14 02:55:52 +09:00
Lynne
67811c2754
vulkan/common: fix get_bit() with SMEM caching
...
First of all, it used the wrong data pointer. Second, gb.bits wouldn't
get set if LOAD64 was called after the start of the stream.
2026-05-14 02:55:48 +09:00
Lynne
fd25b35dd2
vulkan_ffv1: support decoding 32-bit float video
...
Sponsored-by: Sovereign Tech Fund
2026-05-11 05:32:41 +09:00
Lynne
162ad61486
vulkan/ffv1: fix second-line linecache initialization for Golomb
...
This was a difficult problem to find.
Sponsored-by: Sovereign Tech Fund
2026-04-22 23:24:04 +02:00
Lynne
9c04a40136
vulkan/ffv1: implement floating-point decoding
...
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
f5054f726d
ffv1enc_vulkan: implement floating-point encoding
...
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
29b8614e62
vulkan/ffv1: fix bitstream initialization for Golomb
...
This broke when we switched to descriptors.
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
IndecisiveTurtle
cebe0b577e
lavc: implement a Vulkan-based prores encoder
...
Adds a Vulkan implementation of the reference prores kostya encoder. Provides about a 3-4x speedup over the CPU code.
2026-03-05 14:02:39 +00:00
Lynne
13e063ceec
vulkan/ffv1: properly initialize the linecache
2026-02-22 03:39:23 +01:00
Lynne
c91634dfe6
vulkan/ffv1: add current linecache for encode/decode
...
This avoids needing expensive roundtrips when reading/writing to images,
mainly in the decoder.
2026-02-19 19:42:35 +01:00
Lynne
3f91ff8aa6
ffv1enc_vulkan: perform non-RGB prediction in 16-bits
2026-02-19 19:42:35 +01:00
Lynne
9d5421ad92
vulkan/ffv1: keep track of RCT Ry/By coeffs using vector suffixes
...
This makes it far easier to read, particularly because when reading
or writing, their order is swapped.
2026-02-19 19:42:35 +01:00
Lynne
b9c19c9073
ffv1enc_vulkan: use direct values rather than reading from struct
...
This saves indirection and allows compilers to eliminate more
code during compilation.
2026-02-19 19:42:35 +01:00
Lynne
fc10cc4a52
vulkan/ffv1: optimize get_isymbol
2026-02-19 19:42:34 +01:00
Lynne
f32111f3f7
vulkan/ffv1: improve compiler hints
...
Don't unroll unless needed, don't use const in function arguments,
don't use expect unless actually needed.
2026-02-19 19:42:34 +01:00
Lynne
e9645930dd
vulkan/ffv1_dec: synchronize image writes when decoding
2026-02-19 19:42:34 +01:00
Lynne
fb7700636c
vulkan/ffv1: synchronize before/after RCT transform/preload
2026-02-19 19:42:34 +01:00
Lynne
5ac9376763
vulkan/ffv1_dec_setup: roll a put_rac inside a loop
...
This saves 16KiB of memory.
Yeah, things get large when all compilers inline everything.
2026-02-19 19:42:33 +01:00
Lynne
33525cb6e7
vulkan/rangecoder: don't store pointers in the context
2026-02-19 19:42:33 +01:00
Lynne
5c1b2947a4
ffv1enc_vulkan: only return the encoded size, not its offset
...
The encoded offset is just the slice index multiplied by the max slice size.
2026-02-19 19:42:33 +01:00
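The observation in the commit above (5c1b2947a4) amounts to one multiplication on the host side. A minimal sketch, assuming each slice is given a fixed-size region of the output buffer; the function name `slice_offset` and its parameters are hypothetical:

```c
#include <stdint.h>

/* Byte offset of a slice inside the output buffer, assuming every slice
 * owns a region of max_slice_size bytes. Widened to 64 bits so that the
 * product cannot wrap for large buffers. */
static uint64_t slice_offset(uint32_t slice_index, uint32_t max_slice_size)
{
    return (uint64_t)slice_index * max_slice_size;
}
```

Since the offset can be recomputed from the index, only the encoded size has to be returned per slice.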
Lynne
2c138e2df5
vulkan/ffv1: use loops to encode planes
...
Every function in SPIR-V gets inlined, always. So use loops.
2026-02-19 19:42:33 +01:00
Lynne
10a26974cd
vulkan/ffv1: finalize and initialize slices only in invocation == 0
2026-02-19 19:42:33 +01:00
Lynne
bc968bc8b4
vulkan/ffv1_enc: cache state probabilities
...
4x speedup on AMD.
2026-02-19 19:42:33 +01:00
Lynne
826b72d12f
vulkan/ffv1: mark buffers as uniform/readonly when needed
...
Should be a speedup in most cases.
2026-02-19 19:42:32 +01:00
Lynne
3d74e0e63a
ffv1enc_vulkan: fix Golomb encoding
...
The issue is that the PB buffer address for Golomb may not be 4-byte aligned.
2026-02-19 19:42:32 +01:00
Lynne
fdd0f21f5d
vulkan/ffv1_common: use scalar alignment for the base slice structure
...
Scalar layout is the fastest for modern GPUs to use.
2026-02-19 19:42:32 +01:00
Lynne
10407de110
vulkan/rangecoder: clean up unused functions and redundant fields
2026-02-19 19:42:32 +01:00
Lynne
dbc6fa5248
ffv1enc: use local RangeCoder struct
2026-02-19 19:42:31 +01:00
Lynne
b756d83e24
vulkan_ffv1: use local RangeCoder struct, refactor overread checking
2026-02-19 19:42:31 +01:00
Lynne
06eb98bc97
ffv1enc_vulkan: remove dead code
2026-02-19 19:42:31 +01:00