Commit graph

42 commits

Niklas Haas
0da2bbab68 swscale/ops_dispatch: re-indent (cosmetic) 2026-04-16 20:59:39 +00:00
4c19f82cc0 swscale/ops_dispatch: compute minimum needed tail size
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling, which the
previous code completely ignored.

Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
cd8ece4114 swscale/ops_dispatch: generalize the number of tail blocks
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.

While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.

I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
dba7b81b38 swscale/ops_dispatch: avoid calling comp->func with w=0
The x86 kernel, for example, assumes that at least one block is processed,
so avoid calling it with an empty width. This is currently only possible
when e.g. operating on an unpadded, very small image whose total linesize
is less than a single block.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
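In effect, the dispatch code gains a guard like the following stand-alone sketch (all names here are illustrative, not the actual swscale identifiers):

```c
#include <assert.h>

typedef struct { int calls; } Kernel;

static void kernel_run(Kernel *k, int num_blocks)
{
    /* A real SIMD kernel might unconditionally read one full block here. */
    (void) num_blocks;
    k->calls++;
}

/* Never invoke the kernel with zero blocks of work, since e.g. the x86
 * kernels assume w > 0. */
static void dispatch(Kernel *k, int num_blocks)
{
    if (num_blocks > 0)
        kernel_run(k, num_blocks);
}
```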
35174913ac swscale/ops_dispatch: fix and generalize tail buffer size calculation
This code had two issues:

1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block.

We can fix both of these issues by rewriting the way the tail size is derived.

In the non-offset case, and assuming only 1 tail block:
    aligned_w - safe_width
  = num_blocks * block_size - (num_blocks - 1) * block_size
  = block_size

Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
    tail_size = pass->width - safe_width <= aligned_w - safe_width

In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
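The derivation above can be sanity-checked with a small stand-alone sketch (illustrative names, not the actual swscale variables): with a single tail block, the tail never spans more than one block's worth of pixels, regardless of the width.

```c
#include <assert.h>

/* Check the identity: aligned_w - safe_width == block_size when there is
 * exactly one tail block. */
static int tail_pixels(int width, int block_size)
{
    int num_blocks = (width + block_size - 1) / block_size;
    int aligned_w  = num_blocks * block_size;       /* width rounded up   */
    int safe_width = (num_blocks - 1) * block_size; /* full blocks only   */
    return aligned_w - safe_width;                  /* always block_size  */
}
```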
f604add8c1 swscale/ops_dispatch: remove pointless AV_CEIL_RSHIFT()
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.

If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
dd8ff89adf swscale/ops_dispatch: add helper to explicitly control pixel->bytes rounding
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
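A helper of this kind might look like the following sketch (hypothetical names; the actual swscale helper may differ), making the rounding direction explicit at each call site instead of hand-repeating the `+7` bias:

```c
#include <assert.h>

/* Convert a pixel count to bytes with an explicit rounding direction. */
static int pixels_to_bytes_up(int pixels, int bpp)
{
    return (pixels * bpp + 7) >> 3; /* round up: covers partial bytes */
}

static int pixels_to_bytes_down(int pixels, int bpp)
{
    return (pixels * bpp) >> 3;     /* round down: whole bytes only */
}
```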
16a57b2985 swscale/ops_dispatch: ensure block size is multiple of pixel size
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8 pixels. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
86307dad4a swscale/ops_dispatch: make offset calculation code robust against overflow
As well as weird edge cases like trying to filter `monow`, where pixels can
land in the middle of a byte. Realistically, this will never happen; we'd
instead pre-process it into something byte-aligned, and then dispatch a
byte-aligned filter on it.

However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
95e4f7cac5 swscale/ops_dispatch: fix rounding direction of plane_size
This is an upper bound, so it should be rounded up.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
c6e47b293d swscale/ops_dispatch: pre-emptively guard against int overflow
By using size_t whenever we compute derived figures.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
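As a sketch of the idea (illustrative names): performing the arithmetic in `size_t` keeps the intermediate products from wrapping in 32-bit `int` arithmetic for very large dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Compute a derived buffer size entirely in size_t, so width * bpp never
 * wraps around in int arithmetic. */
static size_t plane_bytes(int width, int height, int bpp)
{
    size_t stride = ((size_t) width * bpp + 7) / 8; /* bytes per line */
    return stride * (size_t) height;
}
```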
0524e66aec swscale/ops_dispatch: drop pointless const (cosmetic)
These are clearly not mutated within their constrained scope, and the
qualifier just wastes valuable horizontal space.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
c98810ac78 swscale/ops_dispatch: zero-init tail buffer
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
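In plain C terms this amounts to something like the following (generic sketch; FFmpeg code would use `av_mallocz()` from libavutil rather than `calloc()`):

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate the tail buffer zero-initialized; the cost is paid once at
 * setup time rather than per frame. */
static unsigned char *alloc_tail_buffer(size_t size)
{
    return calloc(1, size); /* equivalent to malloc + memset(0) */
}
```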
b5573a8683 swscale/ops_dispatch: cosmetic
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
3a15990368 swscale/ops_dispatch: forward correct pass alignment
Because the frame pool API doesn't let us directly access the linesize,
we have to "un-translate" the over_read/write back to the nearest multiple
of the pixel size.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
c0cc7f341a swscale/ops: simplify SwsOpList.order_src/dst
Just define these directly as integer arrays; there's really no point in
having them re-use SwsSwizzleOp; the only place this was ever even remotely
relevant was in the no-op check, which any decent compiler should already
be capable of optimizing into a single 32-bit comparison.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-29 09:39:09 +00:00
e6e9c45892 swscale/ops_dispatch: try again with split subpasses if compile() fails
First, we try compiling the filter pass as-is, in case any backend decides to
handle the filter as a single pass (e.g. Vulkan, which will want to compile
it using internal temporary buffers and barriers).

If that fails, retry with a chained list of split passes.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
e3daeff965 swscale/ops_dispatch: compute input x offset map for SwsOpExec
This is cheap to precompute and can be used as-is for gather-style horizontal
filter implementations.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
dc88946d7b swscale/ops_dispatch: fix plane width calculation
This was wrong if sub_x > 1.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
78878b9daa swscale/ops_dispatch: refactor tail handling
Rather than dispatching the compiled function for each line of the tail
individually, with a memcpy to a shared buffer in between, this instead copies
the entire tail region into a temporary intermediate buffer, processes it with
a single dispatch call, and then copies the entire result back to the
destination.

The main benefit of this is that it enables scaling, subsampling or other
quirky layouts to continue working, which may require accessing lines adjacent
to the main input.

It also arguably makes the code a bit simpler and easier to follow, but YMMV.

One minor consequence of the change in logic is that we also no longer handle
the last row of an unpadded input buffer separately - instead, if *any* row
needs to be padded, *all* rows in the current slice will be padded. This is
a bit less efficient but much more predictable, and as discussed, basically
required for scaling/filtering anyways.

While we could implement some sort of hybrid regime where we only use the new
logic when scaling is needed, I really don't think it would gain us anything
concrete enough to be worth the effort, especially since the performance is
roughly the same across the board:

16 threads:
  yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.000x slower (input memcpy)
  rgb24   1920x1080 -> argb 1920x1080: speedup=1.012x faster (output memcpy)

1 thread:
  yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.062x faster (input memcpy)
  rgb24   1920x1080 -> argb 1920x1080: speedup=0.959x slower (output memcpy)

Overall speedup is +/- 1% across the board, well within margin of error.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
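The scheme described above can be modeled with a simplified stand-alone sketch (illustrative names, not the actual dispatch code): gather the tail of every line into a padded temporary, run the kernel once over the whole region, then scatter the results back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the tail region in, run the kernel with a single dispatch call,
 * then copy the results back out. */
static void run_tail(uint8_t *dst, int dst_stride,
                     const uint8_t *src, int src_stride,
                     int tail_bytes, int lines,
                     uint8_t *tmp, int tmp_stride,
                     void (*kernel)(uint8_t *buf, int stride, int lines))
{
    for (int y = 0; y < lines; y++) /* copy tail region in */
        memcpy(tmp + y * tmp_stride, src + y * src_stride, tail_bytes);
    kernel(tmp, tmp_stride, lines); /* single dispatch call */
    for (int y = 0; y < lines; y++) /* copy results back out */
        memcpy(dst + y * dst_stride, tmp + y * tmp_stride, tail_bytes);
}

/* Example kernel for testing: invert every byte in place. */
static void invert_kernel(uint8_t *buf, int stride, int lines)
{
    for (int i = 0; i < stride * lines; i++)
        buf[i] ^= 0xFF;
}
```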
015abfab38 swscale/ops_dispatch: precompute relative y bump map
This is more useful for tight loops inside CPU backends, which can implement
it by having a shared path for incrementing to the next line (as normal), and
then a separate path for adding an extra position-dependent, stride-multiplied
line offset after each completed line.

As a free upside, this encoding does not require any separate/special handling
for the exec tail.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
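A minimal model of this encoding (illustrative, not the actual SwsOpPass layout): given an absolute dst→src line map, the bump for line y is how many *extra* source lines to advance after finishing it, on top of the normal single-line increment, so the common case (bump == 0) shares the ordinary "next line" path.

```c
#include <assert.h>

/* Derive the relative "y bump" for each output line from an absolute
 * dst->src line map: after line y, advance by (1 + bump[y]) src lines. */
static void build_bump_map(int *bump, const int *map, int dst_lines)
{
    for (int y = 0; y < dst_lines - 1; y++)
        bump[y] = map[y + 1] - map[y] - 1;
}
```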
2583d7ad9b swscale/ops_dispatch: add line offsets map to SwsOpPass
And use it to look up the correct source plane line for each destination
line. Needed for vertical scaling, in which case multiple output lines can
reference the same input line.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
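As a simplified model (hypothetical; real swscale derives this from the filter geometry, not a nearest-line rule), such a map assigns each destination line a source line, and upscaling naturally maps several output lines to the same input line:

```c
#include <assert.h>

/* Build a dst->src line map; with dst_h > src_h, consecutive output
 * lines reference the same input line. */
static void build_line_map(int *map, int dst_h, int src_h)
{
    for (int y = 0; y < dst_h; y++)
        map[y] = y * src_h / dst_h; /* floor mapping */
}
```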
3310fe95ae swscale/ops_dispatch: also print ops list after optimizing
Will make more sense in light of the fact that this may not correspond
to the op list actually sent to the backends, due to subpass splitting.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-18 09:09:44 +00:00
039b492019 swscale/ops_dispatch: correctly round tail size
If the block size is somehow less than 8, this may round down, leading to
one byte too few being copied (e.g. for monow/rgb4).

This was never an issue for current backends because they all have block sizes
of 8 or larger, but a future platform may have different requirements.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-18 09:09:44 +00:00
800c3a71e9 swscale/ops_dispatch: properly handle negative strides
The `memcpy_in` condition is reversed for negative strides, which require a
memcpy() on the *first* line, not the last one. Additionally, the check did
not work at all for negative linesizes, because it compared against a
negative stride.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-18 09:09:44 +00:00
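The fix can be illustrated with a tiny stand-alone model (names hypothetical): with a negative stride the partially-out-of-bounds line is the first one dispatched, and any size comparison must use the magnitude of the stride, not its signed value.

```c
#include <assert.h>
#include <stdlib.h>

/* Which line of a slice may run past the allocated edge? For positive
 * strides it is the last line; for negative strides, the first. */
static int unsafe_line(long stride, int num_lines)
{
    return stride < 0 ? 0 : num_lines - 1;
}

/* A row overflows its line if it is wider than the distance between
 * lines, which must be compared as a magnitude. */
static int row_overflows(long stride, long row_bytes)
{
    return row_bytes > labs(stride);
}
```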
91e76dc766 swscale/ops_dispatch: remove slice_align hack
Added in commit 00907e1244 to hack around a problem that was caused by
the Vulkan backend's incorrect use of the ops dispatch code, which was fixed
properly in commit 143cb56501.

This logic never made sense to begin with; it was only meant to disable the
memcpy logic for Vulkan specifically.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-18 09:09:44 +00:00
7fb1e0832c swscale/ops_dispatch: move ENOTSUP error to ff_sws_compile_pass()
Or else this might false-positive when we retry compilation after subpass
splitting.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-12 21:02:48 +00:00
e7c84a8e6a swscale/ops_dispatch: infer destination format from SwsOpList
This is now redundant.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-12 21:02:48 +00:00
b5db7c7354 swscale/ops_dispatch: have ff_sws_compile_pass() take ownership of ops
More useful than just allowing it to "modify" the ops; in practice that means
the contents will be undefined anyway, so we might as well have this function
take care of freeing them afterwards as well.

Will make things simpler with regards to subpass splitting.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-12 21:02:48 +00:00
adf2d4e90f swscale/ops_dispatch: add helper function to clean up SwsCompiledOp
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-12 21:02:48 +00:00
563cc8216b swscale/graph: allow setup() to return an error code
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-12 21:02:48 +00:00
bd9590db70 swscale/ops_dispatch: remove unnecessary SwsOpExec fields
These were abstraction-violating in the first place. Good riddance.

This partially reverts commit c911295f09.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 12:01:51 +01:00
911176c880 swscale/ops_dispatch: add SwsCompiledFunc.opaque
Allows compiled functions to opt out of the ops_dispatch execution harness
altogether and just get dispatched directly as the pass run() function.

Useful in particular for Vulkan.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 12:01:51 +01:00
9571f5cf15 swscale/graph: simplify ff_sws_graph_add_pass() usages
Now that this function returns a status code and takes care of cleanup on
failure, many call-sites can just return the function directly.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 12:01:51 +01:00
2e29833832 swscale/graph: have ff_sws_graph_add_pass() free priv on failure
This is arguably more convenient for most downstream users, as the next
commit will show more prominently.

Also allows this code to re-use a pass_free() helper with the graph uninit.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 12:01:51 +01:00
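The convention can be modeled with a toy sketch (hypothetical names and signature; the real ff_sws_graph_add_pass() differs): on failure the callee frees priv for the caller, so call sites can propagate errors without any cleanup code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void *passes[8];
    int num;
} Graph;

/* On failure, priv is freed for the caller; on success, the graph takes
 * ownership. Either way the caller never touches priv afterwards. */
static int add_pass(Graph *g, void *priv, int fail)
{
    if (fail || g->num >= 8) {
        free(priv);                 /* freed on the error path */
        return -1;
    }
    g->passes[g->num++] = priv;     /* owned by the graph on success */
    return 0;
}
```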
42a47838ea swscale/graph: add setup()/free() to ff_sws_graph_add_pass() signature
This pattern is just common enough that it IMO makes sense to do so. This
will also make more sense after the following commits.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 12:01:51 +01:00
f77ab892f5 swscale/ops_dispatch: print op list on successful compile
Instead of once at the start of add_convert_pass(). This makes much
more sense in light of the fact that we want to start e.g. splitting
passes apart.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 11:25:58 +01:00
a31973e99c swscale/ops_dispatch: avoid redundant ff_sws_op_list_update_comps()
This is already called by compile_backend(), and nothing else in this file
depends on accurate values.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-09 11:25:58 +01:00
d9e594ca96 swscale/graph: have ff_sws_graph_add_pass() return an error code
This allows distinguishing between different types of failure, e.g.
AVERROR(EINVAL) on invalid pass dimensions.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
4b5122bfb2 swscale/ops_dispatch: move on-stack mutation to ops backends
And move the remainder printing there as well.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
b4bcb00cd3 swscale/ops: add and use ff_sws_op_list_input/output()
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to introduce more input op types in the future (in
particular, I am looking ahead towards filter ops).

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
68f3886460 swscale/ops_dispatch: split off compile/dispatch code from ops.c
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.

Purely cosmetic; no functional change.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00