Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible, but now we can simply use the special value -1 to mark
components that should be skipped.
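As a rough illustration of the idea (the names and the dither logic here are invented for the sketch, not the actual swscale internals), a component index of -1 simply makes the op skip that channel:

```c
#include <assert.h>

/* Illustrative only: -1 marks a channel that the dither op should leave
 * untouched, e.g. an alpha plane that is merely copied through. */
#define SWS_COMP_NONE (-1)

/* Apply a (fake) dither offset to each component unless it is marked -1. */
static void dither_row(float pix[4], const int comps[4], float offset)
{
    for (int i = 0; i < 4; i++)
        if (comps[i] != SWS_COMP_NONE)
            pix[i] += offset;
}
```

With `comps = {0, 1, 2, -1}`, the Y/U/V components receive the dither offset while alpha passes through unchanged.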
The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.
Additionally, it somewhat simplifies implementations.
Signed-off-by: Niklas Haas <git@haasn.dev>
This was just a minor/pointless optimization in the first place. We keep
the skip on the last component because we can never commute that past the
end of the list.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
When splitting planes, some planes can end up without a read operation
altogether, e.g. when just clearing the alpha plane.
Just return ENOTSUP for such lists instead of EINVAL.
Also fixes the !ops->num_ops check to avoid UB.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The way this code was written relied on the implicit assumption that no other
row was reading from the same column, which was true in practice so far but
not necessarily true in general. Fix it by precomputing the nonzero component
mask and then adding an explicit check.
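A minimal sketch of that check, assuming a 4x4 coefficient matrix (the helper names are hypothetical; the real code operates on swscale's internal op metadata):

```c
#include <assert.h>
#include <stdint.h>

/* For each input component (column), record which output components
 * (rows) read it, as a bitmask. */
static uint8_t nonzero_rows_for_col(const int m[4][4], int col)
{
    uint8_t mask = 0;
    for (int row = 0; row < 4; row++)
        if (m[row][col] != 0)
            mask |= 1 << row;
    return mask;
}

/* A row may only be rewritten independently if it is the sole reader of
 * every column it uses; otherwise another row depends on the same input. */
static int row_is_sole_reader(const int m[4][4], int row)
{
    for (int col = 0; col < 4; col++)
        if (m[row][col] != 0 && nonzero_rows_for_col(m, col) != (1 << row))
            return 0;
    return 1;
}
```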
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This requires a bit of a manual check in the 32-bit integer case to
make sure we don't exceed the value range of AVRational, but it still allows
quite a number of optimizations despite that restriction.
e.g.
rgb24 -> yuva444p9be:
- [u16 ...X -> ++++] SWS_OP_CLEAR : {_ _ _ 511}
- [u16 .... -> zzzz] SWS_OP_SWAP_BYTES
- [u16 .... -> zzzz] SWS_OP_WRITE : 4 elem(s) planar >> 0
+ [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
+ [u16 ...X -> zzz+] SWS_OP_CLEAR : {_ _ _ 65281}
+ [u16 .... -> zzz+] SWS_OP_WRITE : 4 elem(s) planar >> 0
gray -> yuv444p12be:
- [u16 .XXX -> +++X] SWS_OP_CLEAR : {_ 2048 2048 _}
- [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
- [u16 ...X -> zzzX] SWS_OP_WRITE : 3 elem(s) planar >> 0
+ [u16 .XXX -> zzXX] SWS_OP_SWAP_BYTES
+ [u16 .XXX -> z++X] SWS_OP_CLEAR : {_ 8 8 _}
+ [u16 ...X -> z++X] SWS_OP_WRITE : 3 elem(s) planar >> 0
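The 32-bit guard mentioned above might look roughly like the following sketch (the struct here is a stand-in for libavutil's AVRational, whose num/den fields are plain ints; the helper name is invented):

```c
#include <assert.h>
#include <stdint.h>
#include <limits.h>

/* Illustrative stand-in for AVRational (int num / int den). */
typedef struct Rational { int num, den; } Rational;

/* A 32-bit integer constant can only be folded into a Rational if it
 * fits the int value range of the num/den fields. */
static int fits_rational(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```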
Ultimately, the benefit of this will only become relevant once we start
splitting apart planes, since then we can have planes with only CLEAR
operations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This optimization is lossy, since it removes important information about the
number of planes to be copied; it is subsumed by the more correct check in
the new ff_sws_op_list_is_noop(), so move this code there instead.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly supported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
Instead of blindly interleaving re-ordering and minimizing optimizations,
separate this loop into several passes: the first pass minimizes the
operation list in-place as much as possible, and the second pass applies any
desired re-orderings. (We also want to try pushing SWS_OP_CLEAR back before
any other re-orderings, as this can trigger more phase 1 optimizations.)
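A toy sketch of this pass structure (the op enum and list representation are invented for illustration; the real code works on SwsOp lists):

```c
#include <assert.h>

enum { OP_NOP, OP_CLEAR, OP_OTHER };

typedef struct { int ops[16]; int num; } OpList;

/* Pass 1: minimize the list in place (here: just drop no-ops). */
static void pass_minimize(OpList *l)
{
    int n = 0;
    for (int i = 0; i < l->num; i++)
        if (l->ops[i] != OP_NOP)
            l->ops[n++] = l->ops[i];
    l->num = n;
}

/* Pass 2: re-order (here: bubble CLEAR ops toward the front). After this,
 * pass 1 can be run again, since re-ordering may expose new wins. */
static void pass_reorder(OpList *l)
{
    for (int i = 1; i < l->num; i++)
        if (l->ops[i] == OP_CLEAR && l->ops[i - 1] != OP_CLEAR) {
            int t = l->ops[i];
            l->ops[i] = l->ops[i - 1];
            l->ops[i - 1] = t;
            i = 0; /* restart the scan after each swap */
        }
}
```

The point of the separation is that each pass is a fixpoint of its own concern: minimization never has to reason about ordering, and re-ordering never has to reason about redundancy.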
This restructuring leads to significantly more predictable and stable behavior,
especially when introducing more operation types going forwards. Does not
actually affect the current results, but matters with some upcoming changes
I have planned.
This requires updating the order of the dither matrix offsets as well. In
particular, this can only be done cleanly if the dither matrix offsets are
compatible between duplicated channels; but this should be guaranteed after
the previous commits.
As a side note, this also fixes a bug where we pushed SWS_OP_SWIZZLE past
SWS_OP_DITHER even for very low bit depth output (e.g. rgb4), which led to
a big loss in accuracy due to loss of per-channel dither noise.
Improves loss of e.g. gray -> rgb444 from 0.00358659 to 0.00261414,
now beating legacy swscale in these cases as well.
Otherwise, this is invalid, since applying the next operation after channel
duplication may not give the same result as duplicating the result of
applying the next operation to only one channel.
In practice, this was using the last seen channel's coefficients.
Note that this did not actually affect anything in practice, because the only
relevant ops (MIN/MAX) were always generated with identical coefficients for
identical channel ranges.
However, it will matter moving forwards, as e.g. dither ops may not be
commuted freely if their matrix offsets differ per channel.
The current usage is ambiguous between "affects each component equally" and
"affects each component independently" - and arguably, the current behavior
was a bug (since SWS_OP_DITHER should not commute with a SWIZZLE, at least
from a bit-exactness PoV).
However, when trying to define cleaner replacements for these concepts, I
realized there are too many special cases; and given that we only have two
use sites, I decided to just split them directly into "commute" functions
for those particular usage cases.
As an added benefit, this moves the commutation logic out of the already-long
ff_sws_ops_list_optimize().
This can turn any compatible sequence of operations into a single packed
shuffle, including packed swizzling, grayscale->RGB conversion, endianness
swapping, RGB bit depth conversions, rgb24->rgb0 alpha clearing and more.
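A simplified scalar model of such a packed shuffle (the function and table layout are illustrative; the actual implementation targets SIMD byte shuffles): one table drives the whole fused chain, where each output byte names its source byte and a negative entry means "fill with a constant", which covers cases like rgb24 -> rgb0 alpha clearing.

```c
#include <assert.h>
#include <stdint.h>

/* tab[i] is the input byte feeding output byte i; a negative entry means
 * "write fill_val instead" (e.g. clearing alpha for rgb24 -> rgb0). */
static void shuffle_pixels(uint8_t *dst, const uint8_t *src,
                           const int8_t *tab, int in_bpp, int out_bpp,
                           uint8_t fill_val, int num_pixels)
{
    for (int p = 0; p < num_pixels; p++)
        for (int i = 0; i < out_bpp; i++)
            dst[p * out_bpp + i] = tab[i] < 0 ? fill_val
                                              : src[p * in_bpp + tab[i]];
}
```

For example, grayscale -> RGB is just the table {0, 0, 0}, and rgb24 -> rgb0 is {0, 1, 2, -1} with an opaque fill value.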
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.