Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, it also correctly
accounts for the filter offsets when scaling, which the previous code
completely ignored.
Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.
While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.
I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.
Signed-off-by: Niklas Haas <git@haasn.dev>
The x86 kernel, for example, assumes that at least one block is processed,
so avoid calling this with an empty width. This is currently only possible
when operating on an unpadded, very small image whose total linesize is less
than a single block.
Signed-off-by: Niklas Haas <git@haasn.dev>
This code had two issues:
1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block
We can fix both of these issues by rewriting the way the tail size is derived.
In the non-offset case, and assuming only 1 tail block:
  aligned_w - safe_width
    = num_blocks * block_size - (num_blocks - 1) * block_size
    = block_size
Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
  tail_size = pass->width - safe_width <= aligned_w - safe_width
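The derivation above can be sanity-checked numerically. This is a standalone Python sketch (all names here are illustrative, not the actual swscale variables):

```python
# Illustrative check of the non-offset tail-size derivation.
# `width` is the pass width in pixels, `block_size` the block size in pixels.

def tail_sizes(width, block_size):
    num_blocks = -(-width // block_size)        # ceil(width / block_size)
    aligned_w = num_blocks * block_size         # width rounded up to blocks
    safe_width = (num_blocks - 1) * block_size  # full blocks only
    tail_size = width - safe_width              # pixels left for the tail
    return aligned_w, safe_width, tail_size

aligned_w, safe_width, tail_size = tail_sizes(1000, 16)
# aligned_w - safe_width collapses to exactly one block_size ...
assert aligned_w - safe_width == 16
# ... and tail_size can never exceed it, so no FFMAX is needed
assert tail_size <= aligned_w - safe_width
```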
In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.
Signed-off-by: Niklas Haas <git@haasn.dev>
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.
If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.
Signed-off-by: Niklas Haas <git@haasn.dev>
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.
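Presumably the expression in question is the usual "round bits up to whole bytes" pattern; a sketch of centralizing it in one helper (hypothetical name, shown in Python for brevity):

```python
# Hypothetical helper: keep the +7 bias in exactly one place instead of
# repeating the open-coded expression at every use site.

def bits_to_bytes(bits):
    return (bits + 7) >> 3    # equivalent to ceil(bits / 8)

assert bits_to_bytes(1) == 1  # a single monow pixel still occupies a byte
assert bits_to_bytes(8) == 1
assert bits_to_bytes(9) == 2
```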
Signed-off-by: Niklas Haas <git@haasn.dev>
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.
Signed-off-by: Niklas Haas <git@haasn.dev>
As well as weird edge cases like trying to filter `monow` with pixels landing
in the middle of a byte. Realistically, this will never happen; we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.
However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.
Signed-off-by: Niklas Haas <git@haasn.dev>
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().
Signed-off-by: Niklas Haas <git@haasn.dev>
Because the frame pool API doesn't let us directly access the linesize, we
have to "un-translate" the over_read/write back to the nearest multiple of
the pixel size.
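The "un-translate" step is just rounding a byte count up to a whole number of pixels; a minimal sketch (hypothetical helper, assuming rounding up is the safe direction for padding):

```python
# Round a byte count up to the nearest multiple of the pixel size, so a
# byte-granular over_read is expressed as whole pixels for the frame pool.

def round_up(x, mult):
    return (x + mult - 1) // mult * mult

# e.g. an over_read of 5 bytes with 4-byte pixels must grow to 8 bytes
assert round_up(5, 4) == 8
assert round_up(8, 4) == 8
```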
Signed-off-by: Niklas Haas <git@haasn.dev>
Just define these directly as integer arrays; there's really no point in
having them re-use SwsSwizzleOp; the only place this was ever even remotely
relevant was in the no-op check, which any decent compiler should already
be capable of optimizing into a single 32-bit comparison.
Signed-off-by: Niklas Haas <git@haasn.dev>
First, we try compiling the filter pass as-is, in case any backend decides to
handle the filter as a single pass (e.g. Vulkan, which will want to compile it
using internal temporary buffers and barriers).
If that fails, retry with a chained list of split passes.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is cheap to precompute and can be used as-is for gather-style horizontal
filter implementations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Rather than dispatching the compiled function for each line of the tail
individually, with a memcpy to a shared buffer in between, this instead copies
the entire tail region into a temporary intermediate buffer, processes it with
a single dispatch call, and then copies the entire result back to the
destination.
The main benefit of this is that it enables scaling, subsampling or other
quirky layouts to continue working, which may require accessing lines adjacent
to the main input.
It also arguably makes the code a bit simpler and easier to follow, but YMMV.
One minor consequence of the change in logic is that we also no longer handle
the last row of an unpadded input buffer separately - instead, if *any* row
needs to be padded, *all* rows in the current slice will be padded. This is
a bit less efficient but much more predictable, and, as discussed, effectively
required for scaling/filtering anyway.
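The gather/process/scatter idea can be sketched in miniature (toy Python, hypothetical names; the real code operates on raw plane bytes and a compiled kernel, not Python lists):

```python
# Toy sketch of the new tail regime: gather each row's tail into a scratch
# buffer padded to a whole block, run the normal kernel once over all rows,
# then scatter the results back into place.

def process_tail(rows, tail_start, block_size, kernel):
    scratch = [row[tail_start:] + [0] * (block_size - (len(row) - tail_start))
               for row in rows]
    scratch = kernel(scratch)                  # single dispatch for all rows
    for row, out in zip(rows, scratch):
        row[tail_start:] = out[:len(row) - tail_start]
    return rows

rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
out = process_tail(rows, tail_start=4, block_size=4,
                   kernel=lambda s: [[x * 2 for x in r] for r in s])
assert out == [[1, 2, 3, 4, 10], [6, 7, 8, 9, 20]]
```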
While we could implement some sort of hybrid regime where we only use the new
logic when scaling is needed, I really don't think this would gain us anything
concrete enough to be worth the effort, especially since the performance is
roughly the same across the board:
16 threads:
yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.000x slower (input memcpy)
rgb24 1920x1080 -> argb 1920x1080: speedup=1.012x faster (output memcpy)
1 thread:
yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.062x faster (input memcpy)
rgb24 1920x1080 -> argb 1920x1080: speedup=0.959x slower (output memcpy)
Overall speedup is +/- 1% across the board, well within margin of error.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is more useful for tight loops inside CPU backends, which can implement
this by having a shared path for incrementing to the next line (as normal),
and then a separate path for adding an extra position-dependent,
stride-multiplied line offset after each completed line.
As a free upside, this encoding does not require any separate/special handling
for the exec tail.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And use it to look up the correct source plane line for each destination
line. Needed for vertical scaling, in which case multiple output lines can
reference the same input line.
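A minimal sketch of such a per-output-line lookup, using nearest-neighbour mapping for simplicity (hypothetical helper; the real filter uses proper filter offsets, not this toy mapping):

```python
# Build a table mapping each destination line to its source line. When
# upscaling vertically, several output lines reference the same input line.

def build_line_map(dst_h, src_h):
    return [min(y * src_h // dst_h, src_h - 1) for y in range(dst_h)]

# Upscaling 2 -> 4 lines: output lines share input lines
assert build_line_map(4, 2) == [0, 0, 1, 1]
```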
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Will make more sense in light of the fact that this may not correspond
to the op list actually sent to the backends, due to subpass splitting.
Signed-off-by: Niklas Haas <git@haasn.dev>
If the block size is somehow less than 8, this may round down, leading to
one byte too few being copied (e.g. for monow/rgb4).
This was never an issue for current backends because they all have block sizes
of 8 or larger, but a future platform may have different requirements.
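The failure mode can be shown with an illustrative byte-count calculation (hypothetical names and values):

```python
# A floor division under-counts the bytes spanned by a block when the block
# is not byte-aligned; rounding up never comes out short.

def block_bytes_floor(block_size, bpp):
    return block_size * bpp // 8          # old behaviour: rounds down

def block_bytes_ceil(block_size, bpp):
    return (block_size * bpp + 7) // 8    # fixed: rounds up

# A 4-pixel monow block (1 bpp) spans 4 bits, i.e. one partial byte
assert block_bytes_floor(4, 1) == 0      # one byte too few copied
assert block_bytes_ceil(4, 1) == 1
```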
Signed-off-by: Niklas Haas <git@haasn.dev>
The `memcpy_in` condition is reversed for negative strides, which require a
memcpy() on the *first* line, not the last. Additionally, the check simply
did not work for negative linesizes, because it compared against a negative
stride.
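A sketch of the corrected direction of the check (hypothetical helper; with a negative stride the plane is stored with its top line last in memory, so the first line is the one that borders the edge of the allocation):

```python
# Which line needs the safe memcpy path depends on the stride's sign:
# positive stride -> last line borders the buffer edge,
# negative stride -> first line does.

def needs_memcpy(y, height, stride):
    if stride < 0:
        return y == 0
    return y == height - 1

assert needs_memcpy(0, 10, -1920) is True
assert needs_memcpy(9, 10, 1920) is True
```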
Signed-off-by: Niklas Haas <git@haasn.dev>
Added in commit 00907e1244 to hack around a problem that was caused by
the Vulkan backend's incorrect use of the ops dispatch code, which was fixed
properly in commit 143cb56501.
This logic never made sense to begin with; it was only meant to disable the
memcpy logic for Vulkan specifically.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Or else this might false-positive when we retry compilation after subpass
splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
More useful than just allowing it to "modify" the ops; in practice this means
the contents will be undefined anyway, so we might as well have this function
take care of freeing it afterwards as well.
Will make things simpler with regards to subpass splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
These were abstraction-violating in the first place. Good riddance.
This partially reverts commit c911295f09.
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows compiled functions to opt out of the ops_dispatch execution harness
altogether and just get dispatched directly as the pass run() function.
Useful in particular for Vulkan.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Now that this function returns a status code and takes care of cleanup on
failure, many call-sites can simply return its result directly.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is arguably more convenient for most downstream users, as will become
more apparent in the next commit.
Also allows this code to re-use a pass_free() helper with the graph uninit.
Signed-off-by: Niklas Haas <git@haasn.dev>
This pattern is just common enough that it IMO makes sense to do so.
This will also make more sense after the following commits.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of once at the start of add_convert_pass(). This makes much
more sense in light of the fact that we want to start e.g. splitting
passes apart.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is already called by compile_backend(), and nothing else in this file
depends on accurate values.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows distinguishing between different types of failure, e.g.
AVERROR(EINVAL) on invalid pass dimensions.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also lets us generalize to more input op types in the future
(in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.
Purely cosmetic; no functional change.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>