ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-14 11:20:49 +00:00

Author	SHA1	Message	Date
averne	c384b1e803	vulkan/prores: use vkCmdClearColorImage The VK spec forbids using clear commands on YUV images, so we need to allocate separate per-plane images. This removes the need for a separate reset shader.	2025-12-07 18:17:36 +00:00
Lynne	f80addbb07	ffv1enc_vulkan: fix encoding with large contexts When RGB_LINECACHE == 2, then top2 is not the current line.	2025-12-04 16:53:58 +01:00
Lynne	9b14ea0aa1	vulkan_dpx: fix alignment issue 12-bit images apparently require mod-32 alignment for each line. Go figure.	2025-12-04 15:08:46 +01:00
averne	fd2fd3828c	libavcodec/vulkan: remove unnecessary member in GetBitContext The number of remaining bits can be calculated using existing state. This simplifies calculations and frees up one register.	2025-11-30 19:21:08 +01:00
averne	ef7354d471	libavcodec/vulkan: introduce cached bitstream reader This stores a small buffer in shared memory per decode thread (16 bytes), which helps reduce the number of memory accesses. The bitstream buffer is first aligned to a 4 byte boundary, so that the buffer can be filled with a single memory request.	2025-11-30 19:21:04 +01:00
averne	1c5bb1b12d	vulkan/prores: normalize coefficients during IDCT This allows increased internal precision. In addition, we can introduce an offset to the DC coefficient during the second IDCT step, to remove a per-element addition in the output codepath. Finally, by processing columns first we can remove the barrier after loading coefficients. Signed-off-by: averne <averne381@gmail.com>	2025-11-29 17:56:28 +01:00
averne	1982add485	vulkan/prores: fix dequantization for 4:2:2 subsampling Bug introduced in `d00f41f` due to an oversight.	2025-11-29 17:27:21 +01:00
Lynne	531ce713a0	dpxdec: add a Vulkan hwaccel	2025-11-26 15:16:43 +01:00
Lynne	7af5b5cec3	vulkan_prores_raw: use the native image representation It allows us to easily synchronize the software and hardware decoders, by removing the abstraction the Vulkan layer added by changing the values written.	2025-11-26 15:16:42 +01:00
Lynne	a811a6885a	vulkan_prores_raw: read the header length rather than assuming it's 8 In all known samples, it is equal to 8.	2025-11-26 15:16:42 +01:00
Lynne	0db891366d	vulkan_prores_raw: fix dynamically non-uniform accesses to pushconsts The Vulkan spec requires that all accesses to push data are uniform for all invocations (e.g. can't be based on gl_WorkGroupID or gl_LocalInvocationID).	2025-11-26 15:16:41 +01:00
Lynne	bb30a0d0d8	vulkan_prores_raw: split up decoding and DCT This commit optimizes the Vulkan decoder by splitting up decoding from iDCT, and merging the few tables needed directly into the shader. The speedup on Intel is 10x.	2025-11-26 15:16:41 +01:00
Lynne	615b26f1b1	vulkan_ffv1: fix swapped colors for x2bgr10	2025-11-26 15:16:40 +01:00
Lynne	d36d88dcbb	vulkan/common: add reverse2 endian reversal macro	2025-11-26 15:16:39 +01:00
averne	1d84ab331c	vulkan/prores: Adopt the same IDCT routine as the prores-raw hwaccel The added rounding at the final output conforms to the SMPTE document and reduces the deviation against the software decoder.	2025-11-25 17:54:56 +00:00
Lynne	9860017495	vulkan/ffv1: use u32vec2 for slice offsets Simplifies calculations slightly.	2025-11-12 00:37:24 +01:00
averne	d00f41f213	vulkan/prores: forward quantization parameter to the IDCT shader The qScale syntax element has a maximum value of 512, which would overflow the 16-bit store from the VLD shader in extreme cases. This fixes that edge case by forwarding the element in a storage buffer, and applying the inverse quantization fully in the IDCT shader.	2025-11-08 22:31:21 +00:00
Lynne	6720f71247	Revert "vulkan/prores: output LSB-padded data" This reverts commit `909d71322a`. The issue was elsewhere, not in our code.	2025-11-06 21:46:43 +01:00
averne	909d71322a	vulkan/prores: output LSB-padded data For consistency with existing Vulkan-based hwaccels	2025-10-28 06:12:14 +00:00
Lynne	51843adfe5	vulkan/rangecoder: ifdef out encode and decode chunks There's little code sharing between them.	2025-10-28 07:11:26 +01:00
averne	98412edfed	lavc: add a ProRes Vulkan hwaccel Add a shader-based Apple ProRes decoder. It supports all codec features for profiles up to the 4444 XQ profile, i.e.: 4:2:2 and 4:4:4 chroma subsampling; 10- and 12-bit component depth; interlacing; alpha. The implementation consists of two shaders: the VLD kernel does entropy decoding for color/alpha, and the IDCT kernel performs the inverse transform on color components. Benchmarks for a 4k yuv422p10 sample: AMD Radeon 6700XT: 178 fps; Intel i7 Tiger Lake: 37 fps; NVidia Orin Nano: 70 fps.	2025-10-25 19:54:13 +00:00
Lynne	75aeffb1c6	lavc: add a ProRes RAW Vulkan hwaccel This commit adds a ProRes RAW hardware implementation written in Vulkan. Both version 0 and version 1 streams are supported. The implementation is highly parallelized, with 512 invocations dispatched per tile, with generally 4k tiles on a 5.8k stream. Thanks to unlord for the 8-point iDCT. Benchmark for a generic 5.8k RAW HQ file: 6900XT: 63 fps; 7900XTX: 84 fps; 6000 Ada: 120 fps; Intel: 9 fps.	2025-08-08 18:29:41 +09:00
Lynne	2c3315b04c	lavc/vulkan/common: sign-ify lengths This makes left_bits return useful data rather than overflowing, and also saves some 64-bit integer operations, which is still always a plus sadly.	2025-08-05 23:51:21 +09:00
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
Lynne	3cbe3418b2	vulkan_ffv1: fix golomb coding for non-RGB streams The run_index is reset on each plane, unlike with RGB, where it's reset once per slice.	2025-05-27 06:40:33 +09:00
Lynne	c395ad7c2c	vulkan_ffv1: small cleanup for golomb Split up computation of the offset in the same way that the range coder version does it.	2025-05-27 06:40:29 +09:00
Lynne	977d1a24bc	vulkan/ffv1: fix sync issue in cached bitstream reader/writer The issue is that there is an explicit lack of synchronization as only the very first invocation writes symbols and updates the state, which other invocations then store.	2025-05-23 05:23:44 +09:00
Lynne	7b45d9c5fd	vulkan_ffv1: pipe through slice decoding status	2025-05-20 19:53:02 +09:00
Lynne	cb8f4b675d	vulkan/ffv1: unify encode and decode get/put primitives This simply makes a get_rac/put_rac_internal variant that can be reused.	2025-05-20 19:53:02 +09:00
Lynne	7576410af7	ffv1enc_vulkan: implement RCT search for level >= 4	2025-05-20 19:53:01 +09:00
Lynne	0156680f09	ffv1enc_vulkan: implement the cached EC writer from the decoder This gives a 35% speedup on AMD and 50% on Nvidia.	2025-05-20 19:53:01 +09:00
Lynne	a24ea37228	vulkan_ffv1: fix PCM + cached symbol reader writeout_rgb requires that all subgroups are active.	2025-05-20 19:53:01 +09:00
Lynne	8a2d921627	ffv1_common: minor RGB optimization	2025-05-20 19:53:01 +09:00
Lynne	bd41838b60	ffv1enc_vulkan: switch to 2-line cache, unify prediction code	2025-05-20 19:53:01 +09:00
Lynne	52595025c5	ffv1enc_vulkan: minor EC optimizations	2025-05-20 19:53:01 +09:00
Lynne	7c0a8c07ce	ffv1enc_vulkan: unify EC code between setup and encode	2025-05-20 19:53:00 +09:00
Lynne	69f83bafd1	ffv1enc_vulkan: get rid of temporary data for the setup shader	2025-05-20 19:53:00 +09:00
Lynne	a4078abd73	vulkan/ffv1: synchronize get_pred implementations between encoder and decoder	2025-05-20 19:53:00 +09:00
Lynne	ebbc7ff650	ffv1enc_vulkan: merge all encoder variants into one file Makes it easier to work with, despite the heavy ifdeffery.	2025-05-20 19:52:55 +09:00
Lynne	707c04fe06	ffv1enc_vulkan: support 8 and 16-bit 2-plane YUV formats This adds support for all 8-bit and 16-bit 2-plane formats. P010 and others require more work as the data's LSB-padded.	2025-05-01 09:34:44 +02:00
Lynne	36c6c66deb	vulkan/rangecoder: minor cleanup	2025-04-16 23:38:16 +02:00
Lynne	29b85cd4b8	vulkan_ffv1: add cached symbol reader for AMD Speeds up everything on AMD by 3x. This uses 32 local invocations to load state into cache, as well as to do the RCT faster.	2025-04-14 06:10:43 +02:00
Lynne	e040c087c7	vulkan: add support for expect/assume This commit adds support for compiler hints. While on AMD these are not used/needed, Nvidia benefits from them, with a sizeable 10% speedup on 4k.	2025-04-14 06:10:43 +02:00
Lynne	985a26be28	vulkan_ffv1: shortcut +-1 coeffs in symbol reading Slightly faster, and allows for further optimizations.	2025-04-14 06:10:43 +02:00
Lynne	4d561e6a1e	vulkan_ffv1: remove need for scratch data during setup This saves on some VRAM, but mainly allows for a more unified path.	2025-04-14 06:10:43 +02:00
Lynne	8ceabb677c	vulkan_ffv1: externalize extended lookup check 8% speedup on nvidia on 4k.	2025-04-14 06:10:43 +02:00
Lynne	77f777d925	ffv1/vulkan: redo context count tracking and quant_table_idx management This commit also makes it possible for the encoder to choose a different quantization table on a per-slice basis, as well as adding this capability to the decoder. Also, this commit fully fixes decoding of context=1 encoded files.	2025-04-14 06:10:42 +02:00
Lynne	66b8c92df2	vulkan_ffv1: cache only 2 lines when decoding RGB This reduces the intermediate VRAM used for RGB decoding by a factor of 100 for 6k video. This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video. This is equivalent to what the software decoder does, but with fewer pointers.	2025-04-14 06:10:42 +02:00
Lynne	72953477a4	vulkan_ffv1: fix left-2 sample addressing Typo. Not enough to fix context=1, but it's a start.	2025-04-14 06:10:42 +02:00
Lynne	45d7abf6d9	vulkan_ffv1: init overread/corrupt fields Forgotten.	2025-04-14 06:10:42 +02:00
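
Commits ef7354d471 and fd2fd3828c describe a cached bitstream reader: a 16-byte buffer filled from a 4-byte-aligned position, with the number of remaining bits derived from existing state instead of being stored. The following is a minimal Python sketch of that idea only. It is not the GLSL code from the repository; the class name, the field names, the MSB-first bit order and the refill policy are all assumptions made for illustration.

```python
class CachedBitReader:
    """Illustrative model of a cached bit reader (hypothetical, not FFmpeg code)."""

    CACHE_BYTES = 16  # cache size quoted in the commit message

    def __init__(self, data: bytes, start: int = 0):
        self.data = data
        self.end_bit = len(data) * 8      # total size of the stream in bits
        self.bit_pos = start * 8          # absolute position of the next bit
        # Align the fetch address down to a 4-byte boundary so that the
        # whole cache can be loaded with one aligned request.
        self.cache_base = start & ~3
        self.fills = 0                    # number of refills, for inspection
        self._fill()

    def _fill(self):
        # Copy CACHE_BYTES bytes starting at the aligned base (zero-padded
        # at the end of the stream).
        chunk = self.data[self.cache_base:self.cache_base + self.CACHE_BYTES]
        self.cache = chunk.ljust(self.CACHE_BYTES, b"\0")
        self.fills += 1

    def bits_left(self) -> int:
        # No dedicated counter: derived from the end and current positions.
        return self.end_bit - self.bit_pos

    def get_bit(self) -> int:
        byte_idx = self.bit_pos >> 3
        if byte_idx >= self.cache_base + self.CACHE_BYTES:
            # Cache exhausted: realign and refill.
            self.cache_base = byte_idx & ~3
            self._fill()
        byte = self.cache[byte_idx - self.cache_base]
        bit = (byte >> (7 - (self.bit_pos & 7))) & 1  # MSB-first (assumed)
        self.bit_pos += 1
        return bit

    def get_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.get_bit()
        return value


if __name__ == "__main__":
    reader = CachedBitReader(bytes(range(32)), start=5)
    print(reader.get_bits(8), reader.bits_left(), reader.fills)
```

In this model most reads are served from the small cache, and the backing buffer is touched only once per refill, which is the effect the commit message attributes to the real implementation.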

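Commit d00f41f213 notes that a qScale of up to 512 could overflow a 16-bit store, and that the fix forwards qScale separately and applies inverse quantization in the IDCT shader. The sketch below illustrates the arithmetic under simplifying assumptions: the coefficient value is made up, the quantization matrix is ignored, and both function names are hypothetical rather than taken from the shaders.

```python
INT16_MIN, INT16_MAX = -32768, 32767
QSCALE_MAX = 512  # maximum value quoted in the commit message


def fits_int16(value: int) -> bool:
    """True if value is representable as a signed 16-bit integer."""
    return INT16_MIN <= value <= INT16_MAX


def prescaled_store(coeff: int, qscale: int) -> int:
    """Hypothetical old path: scale first, then keep only 16 bits.

    The wrap-around models what a narrow signed store does to a product
    that does not fit.
    """
    product = coeff * qscale
    return ((product + 0x8000) & 0xFFFF) - 0x8000


def deferred_scale(coeff: int, qscale: int) -> int:
    """Hypothetical new path: store the raw coefficient, scale later.

    The raw coefficient fits in 16 bits; the multiplication happens
    afterwards at wider precision, so nothing is lost.
    """
    assert fits_int16(coeff)
    return coeff * qscale


if __name__ == "__main__":
    coeff = 100  # illustrative value only
    print(prescaled_store(coeff, QSCALE_MAX), deferred_scale(coeff, QSCALE_MAX))
```

With small scales the two paths agree; they diverge only once the product leaves the signed 16-bit range, which matches the "extreme cases" wording of the commit message.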
67 commits