ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-07 02:10:00 +00:00

Author	SHA1	Message	Date
Lynne	3cbe3418b2	vulkan_ffv1: fix golomb coding for non-RGB streams The run_index is reset on each plane, unlike with RGB, where it's reset once per slice.	2025-05-27 06:40:33 +09:00
Lynne	c395ad7c2c	vulkan_ffv1: small cleanup for golomb Split up computation of the offset in the same way that the range coder version does it.	2025-05-27 06:40:29 +09:00
Lynne	977d1a24bc	vulkan/ffv1: fix sync issue in cached bitstream reader/writer The issue is that there is an explicit lack of synchronization as only the very first invocation writes symbols and updates the state, which other invocations then store.	2025-05-23 05:23:44 +09:00
Lynne	7b45d9c5fd	vulkan_ffv1: pipe through slice decoding status	2025-05-20 19:53:02 +09:00
Lynne	cb8f4b675d	vulkan/ffv1: unify encode and decode get/put primitives This simply makes a get_rac/put_rac_internal variant that can be reused.	2025-05-20 19:53:02 +09:00
Lynne	7576410af7	ffv1enc_vulkan: implement RCT search for level >= 4	2025-05-20 19:53:01 +09:00
Lynne	0156680f09	ffv1enc_vulkan: implement the cached EC writer from the decoder This gives a 35% speedup on AMD and 50% on Nvidia.	2025-05-20 19:53:01 +09:00
Lynne	a24ea37228	vulkan_ffv1: fix PCM + cached symbol reader writeout_rgb requires that all subgroups are active.	2025-05-20 19:53:01 +09:00
Lynne	8a2d921627	ffv1_common: minor RGB optimization	2025-05-20 19:53:01 +09:00
Lynne	bd41838b60	ffv1enc_vulkan: switch to 2-line cache, unify prediction code	2025-05-20 19:53:01 +09:00
Lynne	52595025c5	ffv1enc_vulkan: minor EC optimizations	2025-05-20 19:53:01 +09:00
Lynne	7c0a8c07ce	ffv1enc_vulkan: unify EC code between setup and encode	2025-05-20 19:53:00 +09:00
Lynne	69f83bafd1	ffv1enc_vulkan: get rid of temporary data for the setup shader	2025-05-20 19:53:00 +09:00
Lynne	a4078abd73	vulkan/ffv1: synchronize get_pred implementations between encoder and decoder	2025-05-20 19:53:00 +09:00
Lynne	ebbc7ff650	ffv1enc_vulkan: merge all encoder variants into one file Makes it easier to work with, despite the heavy ifdeffery.	2025-05-20 19:52:55 +09:00
Lynne	707c04fe06	ffv1enc_vulkan: support 8 and 16-bit 2-plane YUV formats This adds support for all 8-bit and 16-bit 2-plane formats. P010 and others require more work as the data's LSB-padded.	2025-05-01 09:34:44 +02:00
Lynne	36c6c66deb	vulkan/rangecoder: minor cleanup	2025-04-16 23:38:16 +02:00
Lynne	29b85cd4b8	vulkan_ffv1: add cached symbol reader for AMD Speeds up everything on AMD by 3x. This uses 32 local invocations to load state into cache, as well as to do the RCT faster.	2025-04-14 06:10:43 +02:00
Lynne	e040c087c7	vulkan: add support for expect/assume This commit adds support for compiler hints. While on AMD these are not used/needed, Nvidia benefits from them, and they give a sizeable 10% speedup on 4k.	2025-04-14 06:10:43 +02:00
Lynne	985a26be28	vulkan_ffv1: shortcut +-1 coeffs in symbol reading Slightly faster, and allows for further optimizations.	2025-04-14 06:10:43 +02:00
Lynne	4d561e6a1e	vulkan_ffv1: remove need for scratch data during setup This saves on some VRAM, but mainly allows for a more unified path.	2025-04-14 06:10:43 +02:00
Lynne	8ceabb677c	vulkan_ffv1: externalize extended lookup check 8% speedup on nvidia on 4k.	2025-04-14 06:10:43 +02:00
Lynne	77f777d925	ffv1/vulkan: redo context count tracking and quant_table_idx management This commit also makes it possible for the encoder to choose a different quantization table on a per-slice basis, as well as adding this capability to the decoder. Also, this commit fully fixes decoding of context=1 encoded files.	2025-04-14 06:10:42 +02:00
Lynne	66b8c92df2	vulkan_ffv1: cache only 2 lines when decoding RGB This reduces the intermediate VRAM used for RGB decoding by a factor of 100x for 6k video. This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video. This is equivalent to what the software decoder does, but with fewer pointers.	2025-04-14 06:10:42 +02:00
Lynne	72953477a4	vulkan_ffv1: fix left-2 sample addressing Typo. Not enough to fix context=1, but it's a start.	2025-04-14 06:10:42 +02:00
Lynne	45d7abf6d9	vulkan_ffv1: init overread/corrupt fields Forgotten.	2025-04-14 06:10:42 +02:00
Lynne	fc960dafef	vulkan_ffv1: optimize symbol reader This was the fastest variant tested.	2025-04-14 06:10:41 +02:00
Lynne	defebd74c0	vulkan_ffv1: slightly optimize the range decoder GPUs have cmovs as standard.	2025-04-14 06:10:41 +02:00
Lynne	6bad55eb17	ffv1: add a Vulkan-based decoder This patch adds a fully-featured level 3 and 4 decoder for FFv1, supporting Golomb and all Range coding variants, all pixel formats, and all features, except for the newly added floating-point formats. On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop recording), it is able to do 400fps. An Alder Lake with 24 threads can barely do 100fps.	2025-03-17 08:51:23 +01:00
Lynne	f2a0bdd6b1	vulkan: unify handling of BGR and simplify ffv1_rct	2025-03-17 08:49:15 +01:00
Lynne	b2ebe9884e	ffv1enc_vulkan: refactor code to support sharing with decoder The shaders were written to support sharing, but needed slight tweaking.	2025-03-17 08:49:14 +01:00
Lynne	89704f07bb	lavc/vulkan: add a u8vec2buf buffer type Useful, since it doesn't have alignment limitations.	2025-02-21 03:19:20 +01:00
IndecisiveTurtle	351fd8460a	vulkan/common: Add put_bytes_count	2024-11-28 10:03:01 +09:00
IndecisiveTurtle	e3ac63b213	vulkan/common: Use u32vec2 buffer type instead of u64 According to the GL_EXT_buffer_reference spec alignment "must be a power of two and be greater than or equal to the largest scalar/component type in the block." This means by using u32vec2 we can drop the requirement alignment from 8 bytes to 4 bytes and save a pack64 call in reverse8 (though I assume in most ISAs that compiles to nothing) Allows the vc2 vulkan encoder to function without setting PB_UNALIGNED	2024-11-28 09:31:43 +09:00
IndecisiveTurtle	f794ed48c0	vulkan/common: Fix off-by-one error in flush_put_bits If the caller wrote a number of bits divisible by eight, it would write an extra byte. Also increment by to_write instead of BUF_BYTES, which overly pads the bitstream.	2024-11-28 09:31:43 +09:00
Lynne	f65e51293a	hwcontext_vulkan: add support for AV_PIX_FMT_GBRAP10/12/14	2024-11-26 14:14:13 +01:00
Lynne	7c52dda55f	hwcontext_vulkan: add support for AV_PIX_FMT_GBRP12/14/16	2024-11-26 14:14:12 +01:00
Lynne	4d3e96c90c	lavc/vulkan/common: fix reverse4's incorrect swizzle The function is responsible for converting little to big endian. It had an incorrect swizzle for the last 2 bytes.	2024-11-20 05:23:36 +01:00
Lynne	9691ac6af2	ffv1enc_vulkan: increase max outstanding byte count to 16bit The issue is that at higher resolutions, the outstanding byte counter overflowed in case the image had a lot of blank areas.	2024-11-20 05:23:35 +01:00
Lynne	ebf5264c93	ffv1enc_vulkan: fix PCM encoding This line was mysteriously deleted.	2024-11-20 05:23:35 +01:00
Lynne	eb536d97a0	ffv1enc_vulkan: support buffers larger than 4GiB Unlike the software FFv1 encoder, none of our buffers are allocated by FFmpeg, which supports at most 4GiB large allocations. For really large sizes, the maximum size of the buffer can exceed 4GiB, which the software encoder optimistically tries to allocate as 4GiB in the hopes that the encoder will compress to under that amount. We can just let Vulkan allocate us a larger buffer, and switch to 64-bit offsets.	2024-11-20 05:23:05 +01:00
Lynne	ed2391d341	ffv1enc: add a Vulkan encoder This commit implements a standard, compliant, version 3 and version 4 FFv1 encoder, entirely in Vulkan. The encoder is written in standard GLSL and requires a Vulkan 1.3 supporting GPU with the BDA extension. The encoder can use any number of slices, but nominally, should use 32x32 slices (1024 in total) to maximize parallelism. All features are supported, as well as all pixel formats. This includes: Rice; Range coding with a custom quantization table; PCM encoding. CRC calculation is also massively parallelized on the GPU. Encoding of unaligned dimensions on subsampled data requires version 4, or requires oversizing the image to 64-pixel alignment and cropping out the padding via container flags. Performance-wise, this makes 1080p real-time screen capture possible at 60fps on even modest GPUs.	2024-11-18 07:54:22 +01:00
Lynne	4e861ad8e0	libavcodec/Makefile: add a makefile for Vulkan shaders	2024-10-15 17:45:19 +02:00

43 commits
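
Note on f794ed48c0: the message describes a flush that wrote one byte too many whenever the bit count was a multiple of eight. The sketch below illustrates that class of off-by-one in plain C. It is not the FFmpeg shader code: both function names are hypothetical, and the "buggy" form is only one plausible way such an error can arise, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one plausible buggy form. It always rounds up
 * by adding a byte, so a bit count that is already a multiple of eight
 * produces one byte too many. */
static uint32_t bytes_to_flush_buggy(uint32_t bits)
{
    return bits / 8 + 1;
}

/* Hypothetical helper: ceiling division. A partial byte still costs a
 * whole byte, but an exact multiple of eight costs nothing extra. */
static uint32_t bytes_to_flush(uint32_t bits)
{
    return (bits + 7) / 8;
}

/* Small self-check comparing the two forms. */
static void check_examples(void)
{
    assert(bytes_to_flush(8) == 1);        /* exactly one byte */
    assert(bytes_to_flush_buggy(8) == 2);  /* the extra byte */
    assert(bytes_to_flush(9) == 2);        /* partial byte rounds up */
    assert(bytes_to_flush_buggy(9) == 2);  /* forms agree off the boundary */
}
```

The two forms only disagree when the bit count is an exact multiple of eight, which is why a bug of this shape can go unnoticed until a caller happens to write byte-aligned data.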