ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-06 18:00:17 +00:00

Author	SHA1	Message	Date
Lynne	e51c549f6e	vulkan/dpx: drop using the nontemporal extension. It's rarely respected by implementations, it's fairly new (1 year old), and it has a scuffed define (neither glslc nor glslang sets the "GL_EXT_nontemporal_keyword" define when it's enabled, unlike all other extensions).	2026-01-14 16:13:22 +01:00
Lynne	f2a55af9a4	vulkan_dpx: switch to compile-time SPIR-V generation	2026-01-12 17:28:43 +01:00
Lynne	0f4667fc11	vulkan_prores_raw: clean up and optimize	2026-01-12 17:28:42 +01:00
Lynne	23ab1b1a66	vulkan/dct: embed DCT scaling values during SPIR-V generation. Instead of relying on rounded-off values, use specialization constants to bake the DCT values into the shader when it's compiled.	2026-01-12 17:28:42 +01:00
Lynne	e27b510da8	vulkan_prores: generate SPIR-V at compile-time	2026-01-12 17:28:42 +01:00
Lynne	026e94e339	vulkan_prores_raw: use compile-time SPIR-V generation	2026-01-12 17:28:42 +01:00
Lynne	f2affdfafb	configure/make: support compile-time SPIR-V generation	2026-01-12 17:28:40 +01:00
Lynne	58bd5ad630	vulkan/prores_raw_idct: use the same prores_idct method for loading coeffs. This saves 2 barriers. Also implement workbank avoidance.	2025-12-31 15:00:47 +01:00
Lynne	8db6947700	vulkan_prores_raw: reduce zigzag table size. No need for full 32 bits.	2025-12-22 19:46:27 +01:00
Lynne	cfcf52a08c	vulkan: deduplicate shorthand casting defines to common.comp	2025-12-22 19:46:27 +01:00
Lynne	6eced88188	vulkan: merge ProRes and ProRes RAW iDCTs. This cleans up the code a bit, and reduces binary size.	2025-12-22 19:46:26 +01:00
averne	b9078c0939	vulkan/prores: copy constant tables to shared memory. The shader needs ~3 loads per DCT coeff. This data was not observed to be stored efficiently in the upper cache levels; loading it explicitly into shared memory fixes that. Also reduce code size by moving the bitstream initialization outside of the switch/case.	2025-12-15 12:29:00 +00:00
averne	a2475d16ed	lavc/vulkan/common: allow configurable bitstream caching in shared memory	2025-12-15 12:29:00 +00:00
Lynne	9e8e34d475	vulkan_ffv1: remove unused RCT shader files. The 2 files were made redundant when the RCT was merged into encode/decode.	2025-12-13 22:12:26 +01:00
Lynne	5bb9cd23b7	vulkan_dpx: fix GRAY16BE and big-endian marked 8-bit samples	2025-12-13 21:35:56 +01:00
Lynne	c3291993eb	vulkan_ffv1: use proper rounded divisions for plane width and height. Fixes #20314.	2025-12-13 19:12:24 +01:00
Ruikai Peng	c48b8ebbbb	avcodec/vulkan: fix DPX unpack offset. The DPX Vulkan unpack shader computes a word offset as `uint off = (line_off + pix_off >> 5);` Due to GLSL operator precedence this is evaluated as line_off + (pix_off >> 5) rather than (line_off + pix_off) >> 5. Since line_off is in bits while off is a 32-bit word index, scanlines beyond y=0 use an inflated offset and the shader reads past the end of the DPX slice buffer. Parenthesize the expression so that the sum is shifted as intended: `uint off = (line_off + pix_off) >> 5;` This corrects the unpacked data and removes the CRC mismatch observed between the software and Vulkan DPX decoders for mispacked 12-bit DPX samples. The GPU OOB read itself is only observable indirectly via this corruption since it occurs inside the shader. Repro on x86_64 with Vulkan/llvmpipe (`531ce713a0`): ./configure --cc=clang --disable-optimizations --disable-stripping \ --enable-debug=3 --disable-doc --disable-ffplay \ --enable-vulkan --enable-libshaderc \ --enable-hwaccel=dpx_vulkan \ --extra-cflags='-fsanitize=address -fno-omit-frame-pointer' \ --extra-ldflags='-fsanitize=address' && make VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json PoC: packed 12-bit DPX with the packing flag cleared so the unpack shader runs (4x64 gbrp12le), e.g. poc12_packed0.dpx. Software decode: ./ffmpeg -v error -i poc12_packed0.dpx -f framecrc - -> 0, ..., 1536, 0x26cf81c2 Vulkan hwaccel decode: VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json \ ./ffmpeg -v error -init_hw_device vulkan \ -hwaccel vulkan -hwaccel_output_format vulkan \ -i poc12_packed0.dpx \ -vf hwdownload,format=gbrp12le -f framecrc - -> 0, ..., 1536, 0x71e10a51 The only difference between the two runs is the Vulkan unpack shader, and the stable CRC mismatch indicates that it is reading past the intended DPX slice region. Regression since: `531ce713a0` Found-by: Pwno	2025-12-12 20:13:16 +00:00
averne	c384b1e803	vulkan/prores: use vkCmdClearColorImage. The VK spec forbids using clear commands on YUV images, so we need to allocate separate per-plane images. This removes the need for a separate reset shader.	2025-12-07 18:17:36 +00:00
Lynne	f80addbb07	ffv1enc_vulkan: fix encoding with large contexts. When RGB_LINECACHE == 2, top2 is not the current line.	2025-12-04 16:53:58 +01:00
Lynne	9b14ea0aa1	vulkan_dpx: fix alignment issue. 12-bit images apparently require mod-32 alignment for each line. Go figure.	2025-12-04 15:08:46 +01:00
averne	fd2fd3828c	libavcodec/vulkan: remove unnecessary member in GetBitContext. The number of remaining bits can be calculated using existing state. This simplifies calculations and frees up one register.	2025-11-30 19:21:08 +01:00
averne	ef7354d471	libavcodec/vulkan: introduce cached bitstream reader. This stores a small buffer (16 bytes) in shared memory per decode thread, which helps reduce the number of memory accesses. The bitstream buffer is first aligned to a 4-byte boundary, so that the buffer can be filled with a single memory request.	2025-11-30 19:21:04 +01:00
averne	1c5bb1b12d	vulkan/prores: normalize coefficients during IDCT. This allows increased internal precision. In addition, we can introduce an offset to the DC coefficient during the second IDCT step, to remove a per-element addition in the output codepath. Finally, by processing columns first we can remove the barrier after loading coefficients. Signed-off-by: averne <averne381@gmail.com>	2025-11-29 17:56:28 +01:00
averne	1982add485	vulkan/prores: fix dequantization for 4:2:2 subsampling. Bug introduced in `d00f41f` due to an oversight.	2025-11-29 17:27:21 +01:00
Lynne	531ce713a0	dpxdec: add a Vulkan hwaccel	2025-11-26 15:16:43 +01:00
Lynne	7af5b5cec3	vulkan_prores_raw: use the native image representation. This makes it easy to keep the software and hardware decoders in sync, by removing the abstraction the Vulkan layer had added, which changed the values written.	2025-11-26 15:16:42 +01:00
Lynne	a811a6885a	vulkan_prores_raw: read the header length rather than assuming it's 8. In all known samples, it is equal to 8.	2025-11-26 15:16:42 +01:00
Lynne	0db891366d	vulkan_prores_raw: fix dynamically non-uniform accesses to pushconsts. The Vulkan spec requires that all accesses to push data are uniform across all invocations (e.g. they can't be based on gl_WorkGroupID or gl_LocalInvocationID).	2025-11-26 15:16:41 +01:00
Lynne	bb30a0d0d8	vulkan_prores_raw: split up decoding and DCT. This commit optimizes the Vulkan decoder by splitting decoding from the iDCT, and merging the few tables needed directly into the shader. The speedup on Intel is 10x.	2025-11-26 15:16:41 +01:00
Lynne	615b26f1b1	vulkan_ffv1: fix swapped colors for x2bgr10	2025-11-26 15:16:40 +01:00
Lynne	d36d88dcbb	vulkan/common: add reverse2 endian reversal macro	2025-11-26 15:16:39 +01:00
averne	1d84ab331c	vulkan/prores: Adopt the same IDCT routine as the prores-raw hwaccel. The added rounding at the final output conforms to the SMPTE document and reduces the deviation from the software decoder.	2025-11-25 17:54:56 +00:00
Lynne	9860017495	vulkan/ffv1: use u32vec2 for slice offsets. Simplifies calculations slightly.	2025-11-12 00:37:24 +01:00
averne	d00f41f213	vulkan/prores: forward quantization parameter to the IDCT shader. The qScale syntax element has a maximum value of 512, which would overflow the 16-bit store from the VLD shader in extreme cases. This fixes that edge case by forwarding the element in a storage buffer, and applying the inverse quantization fully in the IDCT shader.	2025-11-08 22:31:21 +00:00
Lynne	6720f71247	Revert "vulkan/prores: output LSB-padded data". This reverts commit `909d71322a`. The issue was elsewhere, not in our code.	2025-11-06 21:46:43 +01:00
averne	909d71322a	vulkan/prores: output LSB-padded data. For consistency with existing Vulkan-based hwaccels.	2025-10-28 06:12:14 +00:00
Lynne	51843adfe5	vulkan/rangecoder: ifdef out encode and decode chunks. There's little code sharing between them.	2025-10-28 07:11:26 +01:00
averne	98412edfed	lavc: add a ProRes Vulkan hwaccel. Add a shader-based Apple ProRes decoder. It supports all codec features for profiles up to the 4444 XQ profile, i.e.: 4:2:2 and 4:4:4 chroma subsampling; 10- and 12-bit component depth; interlacing; alpha. The implementation consists of two shaders: the VLD kernel does entropy decoding for color/alpha, and the IDCT kernel performs the inverse transform on color components. Benchmarks for a 4k yuv422p10 sample: AMD Radeon 6700XT: 178 fps; Intel i7 Tiger Lake: 37 fps; NVidia Orin Nano: 70 fps.	2025-10-25 19:54:13 +00:00
Lynne	75aeffb1c6	lavc: add a ProRes RAW Vulkan hwaccel. This commit adds a ProRes RAW hardware implementation written in Vulkan. Both version 0 and version 1 streams are supported. The implementation is highly parallelized, with 512 invocations dispatched per tile, and generally 4k tiles on a 5.8k stream. Thanks to unlord for the 8-point iDCT. Benchmark for a generic 5.8k RAW HQ file: 6900XT: 63 fps; 7900XTX: 84 fps; 6000 Ada: 120 fps; Intel: 9 fps.	2025-08-08 18:29:41 +09:00
Lynne	2c3315b04c	lavc/vulkan/common: sign-ify lengths. This makes left_bits return useful data rather than overflowing, and also saves some 64-bit integer operations, which, sadly, is still always a plus.	2025-08-05 23:51:21 +09:00
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
Lynne	3cbe3418b2	vulkan_ffv1: fix golomb coding for non-RGB streams. The run_index is reset on each plane, unlike with RGB, where it's reset once per slice.	2025-05-27 06:40:33 +09:00
Lynne	c395ad7c2c	vulkan_ffv1: small cleanup for golomb. Split up computation of the offset in the same way that the range coder version does it.	2025-05-27 06:40:29 +09:00
Lynne	977d1a24bc	vulkan/ffv1: fix sync issue in cached bitstream reader/writer. The issue is that there is an explicit lack of synchronization, as only the very first invocation writes symbols and updates the state, which other invocations then store.	2025-05-23 05:23:44 +09:00
Lynne	7b45d9c5fd	vulkan_ffv1: pipe through slice decoding status	2025-05-20 19:53:02 +09:00
Lynne	cb8f4b675d	vulkan/ffv1: unify encode and decode get/put primitives. This simply makes a get_rac/put_rac_internal variant that can be reused.	2025-05-20 19:53:02 +09:00
Lynne	7576410af7	ffv1enc_vulkan: implement RCT search for level >= 4	2025-05-20 19:53:01 +09:00
Lynne	0156680f09	ffv1enc_vulkan: implement the cached EC writer from the decoder. This gives a 35% speedup on AMD and 50% on Nvidia.	2025-05-20 19:53:01 +09:00
Lynne	a24ea37228	vulkan_ffv1: fix PCM + cached symbol reader. writeout_rgb requires that all subgroups are active.	2025-05-20 19:53:01 +09:00
Lynne	8a2d921627	ffv1_common: minor RGB optimization	2025-05-20 19:53:01 +09:00


84 commits
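
Commit 8db6947700 shrinks the ProRes RAW zigzag table because its entries do not need 32 bits. A minimal C sketch of why: every entry of an 8x8 scan table is a coefficient index in 0..63, so one byte per entry suffices. For illustration it builds the conventional JPEG-style zigzag (ProRes RAW's actual scan order may differ), and `build_zigzag8x8` is a hypothetical helper, not FFmpeg code.

```c
#include <stdint.h>

/* Fill a conventional 8x8 zigzag scan table (JPEG-style, for illustration).
 * Entries are coefficient indices 0..63, so uint8_t is enough; a uint32_t
 * table would spend 4x the space on the same data. */
static void build_zigzag8x8(uint8_t table[64])
{
    int i = 0;
    for (int s = 0; s < 15; s++) {          /* s = row + col: one anti-diagonal */
        if (s % 2 == 0) {
            /* even diagonals: walk from bottom-left up to top-right */
            for (int row = (s < 8 ? s : 7); row >= 0 && s - row < 8; row--)
                table[i++] = (uint8_t)(row * 8 + (s - row));
        } else {
            /* odd diagonals: walk from top-right down to bottom-left */
            for (int col = (s < 8 ? s : 7); col >= 0 && s - col < 8; col--)
                table[i++] = (uint8_t)((s - col) * 8 + col);
        }
    }
}
```

The table starts 0, 1, 8, 16, 9, 2 and ends at 63, and the whole thing occupies 64 bytes instead of 256.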
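
Commit c3291993eb switches FFv1 Vulkan plane dimensions to rounded divisions. Assuming "rounded" means rounding up, as FFmpeg's `AV_CEIL_RSHIFT` macro does for subsampled plane sizes, the computation looks like the C sketch below; `plane_dim` is a hypothetical name.

```c
#include <assert.h>

/* Size of a subsampled plane, rounded up: ceil(size / 2^log2_sub).
 * A plain (size >> log2_sub) truncates and loses the last column/row of
 * odd-sized frames. */
static int plane_dim(int size, int log2_sub)
{
    /* keep the shift and the addition below clear of signed overflow */
    assert(size >= 0 && log2_sub >= 0 && log2_sub < 30);
    assert(size <= 0x7FFFFFFF - ((1 << log2_sub) - 1));
    return (size + (1 << log2_sub) - 1) >> log2_sub;
}
```

For a 1921x1081 frame with 2x2 chroma subsampling, truncation gives 960x540 chroma planes, while rounding up gives 961x541, which still covers the last luma column and row.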
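
The DPX unpack fix in c48b8ebbbb hinges on units: `line_off` and `pix_off` are both bit offsets, so they must be summed before being converted to a 32-bit word index. The C sketch below mirrors the corrected `(line_off + pix_off) >> 5` and also derives the bit position within that word. It assumes LSB-first packing and that the following word is readable whenever a 12-bit sample straddles a word boundary; the real shader's bit order and bounds handling may differ, and `read12` is hypothetical.

```c
#include <stdint.h>

/* Read one 12-bit sample from a line of 32-bit words.
 * line_off: bit offset of the line start; pix_off: bit offset of the sample
 * within the line. Both are in bits, so add first, then split into a word
 * index (>> 5) and a bit position inside the word (& 31).
 * Assumption: LSB-first packing; words[off + 1] exists if the sample
 * crosses into the next word. */
static uint32_t read12(const uint32_t *words, uint32_t line_off, uint32_t pix_off)
{
    uint32_t bit   = line_off + pix_off;
    uint32_t off   = bit >> 5;   /* 32-bit word index */
    uint32_t shift = bit & 31;   /* bit position inside that word */

    uint64_t v = words[off];
    if (shift + 12 > 32)         /* sample straddles two words */
        v |= (uint64_t)words[off + 1] << 32;
    return (uint32_t)(v >> shift) & 0xFFF;
}
```

Adding a bit-based `line_off` to an already shifted `pix_off >> 5` would mix bits and words, and for every line after the first the resulting index lands far beyond the intended data.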
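
Commit 9b14ea0aa1 notes that 12-bit DPX lines need mod-32 alignment. Reading that as "each line starts on a 32-bit boundary" (an assumption; the message does not say bits, bytes or pixels), the line stride in bits is the raw sample bit count rounded up to a multiple of 32. `line_stride_bits` is a hypothetical helper.

```c
#include <stdint.h>

/* Stride of one packed line, in bits, when every line must start on a
 * 32-bit boundary: round the raw bit count up to a multiple of 32. */
static uint64_t line_stride_bits(uint32_t width, uint32_t components, uint32_t depth)
{
    uint64_t bits = (uint64_t)width * components * depth;
    return (bits + 31) & ~(uint64_t)31;   /* align up to 32 bits */
}
```

For a 4-pixel-wide, 3-component 12-bit image, a line holds 144 bits of samples but occupies 160 bits, so indexing later lines with the unpadded 144-bit stride would drift by 16 bits per line.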
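
Commits ef7354d471 and fd2fd3828c describe a cached bitstream reader that aligns the buffer to a 4-byte boundary so each refill is a single memory request, and that derives the number of remaining bits from existing state instead of storing it. The C sketch below combines both ideas under stated assumptions: a 64-bit cache stands in for the 16-byte shared-memory buffer, bits are read MSB-first, and the buffer is padded so the last aligned word can be loaded whole. All names are hypothetical; this is not FFmpeg's GLSL GetBitContext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data; /* whole buffer, padded to a multiple of 4 bytes */
    size_t word_pos;     /* byte offset of the next word to load (multiple of 4) */
    size_t end;          /* byte offset one past the last valid byte */
    uint64_t cache;      /* unread bits, left-aligned */
    int cache_bits;      /* bits held in cache (may include padding at the end) */
} BitReader;

/* Bits left, derived from state: valid bytes not yet loaded plus cached bits.
 * If the last loaded word ran past `end`, (end - word_pos) is negative and
 * subtracts the padding bits again. */
static int64_t br_left(const BitReader *br)
{
    return ((int64_t)br->end - (int64_t)br->word_pos) * 8 + br->cache_bits;
}

/* One big-endian 32-bit word; on a GPU this is one aligned memory request. */
static uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Top up the cache with whole words while there is room and data. */
static void br_refill(BitReader *br)
{
    while (br->cache_bits <= 32 && br->word_pos < br->end) {
        br->cache |= (uint64_t)load_be32(br->data + br->word_pos) << (32 - br->cache_bits);
        br->cache_bits += 32;
        br->word_pos += 4;
    }
}

/* Start at byte `start`: align down to 4 bytes, fill the cache, then drop
 * the 0..24 leading bits that precede `start`. */
static void br_init(BitReader *br, const uint8_t *data, size_t start, size_t size)
{
    int skip = (int)(start & 3) * 8;
    br->data = data;
    br->word_pos = start & ~(size_t)3;
    br->end = start + size;
    br->cache = 0;
    br->cache_bits = 0;
    br_refill(br);
    br->cache <<= skip;
    br->cache_bits -= skip;
}

/* Read n bits, 1 <= n <= 32; the caller must not read past the end. */
static uint32_t br_read(BitReader *br, int n)
{
    assert(n >= 1 && n <= 32 && br_left(br) >= n);
    if (br->cache_bits < n)
        br_refill(br);
    uint32_t v = (uint32_t)(br->cache >> (64 - n));
    br->cache <<= n;
    br->cache_bits -= n;
    return v;
}
```

Starting at byte 1 of `{0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0}` with 5 valid bytes, `br_left` reports 40 bits, and reads of 8, 12 and 20 bits return 0x34, 0x567 and 0x89ABC, after which 0 bits are left.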
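
Commit 1c5bb1b12d folds an offset into the DC coefficient so the output path no longer adds it per element. This works because the inverse DCT is linear and its DC basis function is constant. The C sketch below uses a naive 1-D, 8-point inverse DCT (unnormalized DCT-III with DC weight 1/2; this convention and all names are assumptions, not the shader's code). Adding `2 * offset` to the DC input shifts every output by `offset`. Cosines are tabulated so no libm is required.

```c
/* cos(m * pi / 16) for m = 0..8 */
static const double cos16[9] = {
    1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508978,
    0.19509032201612825, 0.0
};

/* cos(m * pi / 16) for any m, via periodicity and symmetry */
static double cos_pi16(unsigned m)
{
    m %= 32;                          /* period 2*pi */
    if (m > 16) m = 32 - m;           /* cos(2pi - x) = cos(x) */
    if (m > 8) return -cos16[16 - m]; /* cos(pi - x) = -cos(x) */
    return cos16[m];
}

/* x[n] = X[0]/2 + sum_{k=1..7} X[k] * cos(pi * (2n+1) * k / 16) */
static void idct8_1d(const double in[8], double out[8])
{
    for (int n = 0; n < 8; n++) {
        double acc = in[0] * 0.5;     /* DC contributes equally to every n */
        for (int k = 1; k < 8; k++)
            acc += in[k] * cos_pi16((unsigned)((2 * n + 1) * k));
        out[n] = acc;
    }
}

/* Same transform with a constant output offset folded into DC, so no
 * per-element addition is needed afterwards. */
static void idct8_1d_with_offset(const double in[8], double offset, double out[8])
{
    double tmp[8];
    for (int k = 0; k < 8; k++)
        tmp[k] = in[k];
    tmp[0] += 2.0 * offset;           /* DC weight is 1/2 */
    idct8_1d(tmp, out);
}
```

In a separable 2-D transform using the same convention, the DC weight becomes 1/4, so the bias added to the DC coefficient would be `4 * offset`.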
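
Commit d36d88dcbb adds a `reverse2` endian-reversal macro to the Vulkan common shader code. Assuming it swaps the two bytes of a 16-bit value (for example, to bring big-endian samples such as GRAY16BE into native order on a little-endian device), the equivalent operation in C is sketched below; `reverse2_u16` is a hypothetical name.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value: 0xAABB -> 0xBBAA. */
static uint16_t reverse2_u16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```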
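
Commit d00f41f213 explains that qScale can reach 512, which can overflow a 16-bit store in extreme cases. Assuming the VLD stage stored the quantized level already multiplied by qScale (the message does not spell out the exact product), the C arithmetic below shows where that breaks, and sketches the alternative of storing the small raw level and applying qScale later in 32-bit arithmetic. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static bool fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical VLD-side value: level pre-multiplied by qScale. */
static int32_t prescale(int32_t level, int32_t qscale)
{
    return level * qscale;
}

/* Alternative: keep the raw level in 16 bits and dequantize in 32 bits. */
static int32_t dequant_late(int16_t stored_level, int32_t weight, int32_t qscale)
{
    return (int32_t)stored_level * weight * qscale;
}
```

With qScale = 512, a level of 63 still fits (32256), but a level of 64 gives 32768, one past `INT16_MAX`.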
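
Commit 2c3315b04c makes bitstream lengths signed so `left_bits` no longer overflows. The C sketch below, with hypothetical names, shows the difference on an overread: the unsigned version wraps modulo 2^32, while the signed one goes negative and can be tested with a simple `< 0`.

```c
#include <stdint.h>

/* Unsigned: wraps to a huge value once consumed_bits exceeds size_bits. */
static uint32_t left_bits_unsigned(uint32_t size_bits, uint32_t consumed_bits)
{
    return size_bits - consumed_bits;
}

/* Signed: goes negative on overread, which is easy to detect. */
static int32_t left_bits_signed(int32_t size_bits, int32_t consumed_bits)
{
    return size_bits - consumed_bits;
}
```

With 64 bits available and 72 consumed, the unsigned form returns 4294967288 and the signed form returns -8.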