Commit graph

145 commits

Author SHA1 Message Date
Lynne
5ad8c67e6c
apv_decode: add a Vulkan hwaccel 2026-05-19 17:43:53 +09:00
Lynne
9c40552965
prores_raw: synchronize decoder with reference implementation
This completes the reverse engineering of the decoder.
The commit applies the linearization curve from the previous patch.
2026-05-17 12:17:16 +09:00
Lynne
d8cb567171
prores_raw: fix tile alignment issues
Reverse engineered the decoder a bit more. All tiles are always 16x1.
The issue is that at the edges, tiles don't have the same width.
Instead, the first tile that starts to clip is half, and then the
next tile after that is also half the previous tile's width.
2026-05-17 12:02:52 +09:00
Lynne
eb24fb0c7f
vulkan/common: fix LOAD64 again
duh, gb.buf is incremented in the loop and I missed that. ugh.
2026-05-17 12:02:52 +09:00
Lynne
2d826f18fb
vulkan/prores_raw: don't load the quantization matrix on every invocation 2026-05-14 02:55:53 +09:00
Lynne
13aabf726b
vulkan/prores_raw: specify format on image
Unlike other decoders or encoders, prores_raw only has a single
Vulkan format to worry about.
This is a 20% speedup on AMD, since AMD apparently has optimizations
for this.
2026-05-14 02:55:53 +09:00
Lynne
a2737497de
vulkan/prores_raw: add skip_bits_unchecked and use it
show_bits(gb, 32) is called immediately above. It guarantees that
the following skip_bits call will not need to reload.
2026-05-14 02:55:53 +09:00
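The reasoning in the commit above can be sketched on the CPU side. This is a minimal illustration only, assuming an MSB-first 64-bit cache. The `BitReader` struct, `br_init` and `br_refill` are hypothetical names, and the real shader code may be organized differently. The point is that once `show_bits(br, 32)` has run, at least 32 bits are cached, so a following skip of up to 32 bits can drop its reload check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side bit reader; not the shader's actual code. */
typedef struct BitReader {
    const uint8_t *buf; /* next byte to load */
    const uint8_t *end; /* one past the last byte */
    uint64_t cache;     /* unread bits, MSB-aligned */
    int bits;           /* number of valid bits in cache */
} BitReader;

static BitReader br_init(const uint8_t *data, size_t size)
{
    BitReader br = { data, data + size, 0, 0 };
    return br;
}

/* Top up the cache to at least 57 bits; reads past the end yield zeros. */
static void br_refill(BitReader *br)
{
    while (br->bits <= 56) {
        uint64_t byte = br->buf < br->end ? *br->buf++ : 0;
        br->cache |= byte << (56 - br->bits);
        br->bits += 8;
    }
}

/* Peek at the next n bits (1 <= n <= 32) without consuming them. */
static uint32_t show_bits(BitReader *br, int n)
{
    if (br->bits < n)
        br_refill(br);
    return (uint32_t)(br->cache >> (64 - n));
}

/* Consume n bits, reloading first if the cache is short. */
static void skip_bits(BitReader *br, int n)
{
    if (br->bits < n)
        br_refill(br);
    br->cache <<= n;
    br->bits -= n;
}

/* Consume n bits with no reload check. Only valid when the caller has
 * just ensured at least n bits are cached, e.g. via show_bits(br, 32). */
static void skip_bits_unchecked(BitReader *br, int n)
{
    br->cache <<= n;
    br->bits -= n;
}
```

In this sketch the unchecked variant differs from `skip_bits` by a single branch, which matters when it sits in a hot per-symbol loop.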
Lynne
5dc567a28e
vulkan/prores_raw: remove redundant fast golomb parsing path 2026-05-14 02:55:52 +09:00
Lynne
64f848890c
vulkan/prores_raw: use 16-bit/32-bit uints where needed
16-bit ints can overflow.
2026-05-14 02:55:52 +09:00
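As a small illustration of the overflow the commit above refers to, the following sketch multiplies a 12-bit coefficient by a scale factor and stores the result in 16 and 32 bits. The function names and the example values are assumptions chosen for illustration and are not taken from the decoder.

```c
#include <stdint.h>

/* Product stored in 16 bits: silently wraps modulo 65536. */
static uint16_t scale_u16(uint16_t coeff, uint16_t qscale)
{
    return (uint16_t)((uint32_t)coeff * qscale);
}

/* Product stored in 32 bits: any 16-bit x 16-bit product fits. */
static uint32_t scale_u32(uint16_t coeff, uint16_t qscale)
{
    return (uint32_t)coeff * qscale;
}
```

With `coeff = 4095` and `qscale = 32`, the true product is 131040. `scale_u32` returns that value, while `scale_u16` returns 65504 because the upper bits are lost.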
Lynne
74e3d63fb6
vulkan/prores_raw: use get_bits shared memory cache
50% speedup on AMD.
2026-05-14 02:55:52 +09:00
Lynne
67811c2754
vulkan/common: fix get_bit() with SMEM caching
First of all, it uses the wrong data pointer. Second, gb.bits wouldn't
get set if LOAD64 was called after the start of the stream.
2026-05-14 02:55:48 +09:00
Lynne
fd25b35dd2
vulkan_ffv1: support decoding 32-bit float video
Sponsored-by: Sovereign Tech Fund
2026-05-11 05:32:41 +09:00
Lynne
162ad61486
vulkan/ffv1: fix second-line linecache initialization for Golomb
This was a difficult problem to find.

Sponsored-by: Sovereign Tech Fund
2026-04-22 23:24:04 +02:00
Lynne
9c04a40136
vulkan/ffv1: implement floating-point decoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
f5054f726d
ffv1enc_vulkan: implement floating-point encoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
29b8614e62
vulkan/ffv1: fix bitstream initialization for Golomb
Was broken when we switched to descriptors.

Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
IndecisiveTurtle
cebe0b577e
lavc: implement a Vulkan-based prores encoder
Adds a Vulkan implementation of the reference prores kostya encoder. Provides about a 3-4x speedup over the CPU code.
2026-03-05 14:02:39 +00:00
Lynne
13e063ceec
vulkan/ffv1: properly initialize the linecache 2026-02-22 03:39:23 +01:00
Lynne
c91634dfe6
vulkan/ffv1: add current linecache for encode/decode
This avoids needing expensive roundtrips when reading/writing to images,
mainly in the decoder.
2026-02-19 19:42:35 +01:00
Lynne
3f91ff8aa6
ffv1enc_vulkan: perform non-RGB prediction in 16-bits 2026-02-19 19:42:35 +01:00
Lynne
9d5421ad92
vulkan/ffv1: keep track of RCT Ry/By coeffs using vector suffixes
This makes it far easier to read, particularly because when reading
or writing, their order is swapped.
2026-02-19 19:42:35 +01:00
Lynne
b9c19c9073
ffv1enc_vulkan: use direct values rather than reading from struct
This saves indirection and allows compilers to eliminate more
code during compilation.
2026-02-19 19:42:35 +01:00
Lynne
fc10cc4a52
vulkan/ffv1: optimize get_isymbol 2026-02-19 19:42:34 +01:00
Lynne
f32111f3f7
vulkan/ffv1: improve compiler hints
Don't unroll unless needed, don't use const in function arguments,
don't use expect unless actually needed.
2026-02-19 19:42:34 +01:00
Lynne
e9645930dd
vulkan/ffv1_dec: synchronize image writes when decoding 2026-02-19 19:42:34 +01:00
Lynne
fb7700636c
vulkan/ffv1: synchronize before/after RCT transform/preload 2026-02-19 19:42:34 +01:00
Lynne
5ac9376763
vulkan/ffv1_dec_setup: roll a put_rac inside a loop
This saves 16KiB of memory.
Yeah, things get large when all compilers inline everything.
2026-02-19 19:42:33 +01:00
Lynne
33525cb6e7
vulkan/rangecoder: don't store pointers in the context 2026-02-19 19:42:33 +01:00
Lynne
5c1b2947a4
ffv1enc_vulkan: only return the encoded size, not its offset
The encoded offset is just the slice index multiplied by the max slice size.
2026-02-19 19:42:33 +01:00
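The relationship described in the commit above can be sketched as follows, assuming each slice is written into its own fixed-size region of one output buffer. `slice_offset` and `gather_slices` are hypothetical helper names and do not come from the encoder. Since the offset is derivable from the index, only the per-slice size needs to be reported back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start of a slice's region: slice index times the per-slice capacity. */
static size_t slice_offset(size_t slice_index, size_t max_slice_size)
{
    return slice_index * max_slice_size;
}

/* Copy the used bytes of every slice from the strided buffer `src` into
 * `dst`, back to back. Returns the total number of bytes written.
 * `dst` must have room for the sum of all sizes. */
static size_t gather_slices(uint8_t *dst, const uint8_t *src,
                            const uint32_t *sizes, size_t nb_slices,
                            size_t max_slice_size)
{
    size_t out = 0;
    for (size_t i = 0; i < nb_slices; i++) {
        memcpy(dst + out, src + slice_offset(i, max_slice_size), sizes[i]);
        out += sizes[i];
    }
    return out;
}
```

For example, with a capacity of 4 bytes per slice, slice 2 starts at byte 8 regardless of how many bytes slices 0 and 1 actually used.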
Lynne
2c138e2df5
vulkan/ffv1: use loops to encode planes
Every function in SPIR-V gets inlined, always. So use loops.
2026-02-19 19:42:33 +01:00
Lynne
10a26974cd
vulkan/ffv1: finalize and initialize slices only in invocation == 0 2026-02-19 19:42:33 +01:00
Lynne
bc968bc8b4
vulkan/ffv1_enc: cache state probabilities
4x speedup on AMD.
2026-02-19 19:42:33 +01:00
Lynne
826b72d12f
vulkan/ffv1: mark buffers as uniform/readonly when needed
Should be a speedup in most cases.
2026-02-19 19:42:32 +01:00
Lynne
3d74e0e63a
ffv1enc_vulkan: fix Golomb encoding
The issue is that the PB buffer address for Golomb may not be
a multiple of 4.
2026-02-19 19:42:32 +01:00
Lynne
fdd0f21f5d
vulkan/ffv1_common: use scalar alignment for the base slice structure
Scalar is the fastest for modern GPUs to use.
2026-02-19 19:42:32 +01:00
Lynne
10407de110
vulkan/rangecoder: clean up unused functions and redundant fields 2026-02-19 19:42:32 +01:00
Lynne
dbc6fa5248
ffv1enc: use local RangeCoder struct 2026-02-19 19:42:31 +01:00
Lynne
b756d83e24
vulkan_ffv1: use local RangeCoder struct, refactor overread checking 2026-02-19 19:42:31 +01:00
Lynne
06eb98bc97
ffv1enc_vulkan: remove dead code 2026-02-19 19:42:31 +01:00
Lynne
b230ba4db9
ffv1enc_vulkan: use regular descriptors for slice state 2026-02-19 19:42:30 +01:00
Lynne
c0a697a1bc
vulkan_ffv1: use regular descriptors for slice state
HUGE speedup on AMD, HUGE speedup everywhere.
2026-02-19 19:42:30 +01:00
Lynne
da99d3f209
vulkan_ffv1: implement parallel probability adaptation 2026-02-19 19:42:30 +01:00
Lynne
25e8d3d89c
vulkan/rangecoder: clean up the type mess slightly 2026-02-19 19:42:30 +01:00
Lynne
3bc265d484
ffv1enc_vulkan: make reset shader independent from the setup shader
Allows them to run in parallel.
2026-02-19 19:42:29 +01:00
Lynne
7234f1b167
ffv1enc_vulkan: use a loop to write slice header symbols
Same as with the decoder.
2026-02-19 19:42:29 +01:00
Lynne
fb5d3cf15e
vulkan_ffv1: use a loop to decode slice header symbols
All known drivers and implementations inline every single function.
This ends up being faster.
2026-02-19 19:42:29 +01:00
Lynne
eff3dad6b7
avcodec: remove support for runtime SPIR-V compilation
Begone.
2026-02-19 19:42:29 +01:00
Lynne
c4879dbbda
avcodec/vulkan: standardize on .glsl extension
None of the files are strictly compute now.
2026-02-19 19:42:29 +01:00
Lynne
b736d1c73e
ffv1enc_vulkan: convert encode shader to compile-time SPIR-V generation 2026-02-19 19:42:29 +01:00
Lynne
4038af3da8
ffv1enc_vulkan: convert setup shader to compile-time SPIR-V generation 2026-02-19 19:42:29 +01:00