Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible, but now we can simply use the special value -1 to mark
components that should be skipped.
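As a rough illustration of the idea (the names and the dither logic here are invented for the sketch, not the actual swscale internals), a component index of -1 simply makes the op skip that channel:

```c
#include <assert.h>

/* Illustrative only: -1 marks a channel that the dither op should leave
 * untouched, e.g. an alpha plane that is merely copied through. */
#define SWS_COMP_NONE (-1)

/* Apply a (fake) dither offset to each component unless it is marked -1. */
static void dither_row(float pix[4], const int comps[4], float offset)
{
    for (int i = 0; i < 4; i++)
        if (comps[i] != SWS_COMP_NONE)
            pix[i] += offset;
}
```

With `comps = {0, 1, 2, -1}`, the Y/U/V components receive the dither offset while alpha passes through unchanged.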
The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.
Additionally, it somewhat simplifies implementations.
Signed-off-by: Niklas Haas <git@haasn.dev>
This was just a minor/pointless optimization in the first place. We keep
the skip on the last component because we can never commute that past the
end of the list.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
When splitting planes, some planes can end up without a read operation
altogether, e.g. when just clearing the alpha plane.
Just return ENOTSUP for such lists instead of EINVAL.
Also fixes the !ops->num_ops check to avoid UB.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The way this code was written relied on the implicit assumption that no other
row was reading from the same column, which was true in practice so far but
not necessarily true in general. Fix it by precomputing the nonzero component
mask and then adding an explicit check.
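A minimal sketch of that check, assuming a 4x4 coefficient matrix (the helper names are hypothetical; the real code operates on swscale's internal op metadata):

```c
#include <assert.h>
#include <stdint.h>

/* For each input component (column), record which output components
 * (rows) read it, as a bitmask. */
static uint8_t nonzero_rows_for_col(const int m[4][4], int col)
{
    uint8_t mask = 0;
    for (int row = 0; row < 4; row++)
        if (m[row][col] != 0)
            mask |= 1 << row;
    return mask;
}

/* A row may only be rewritten independently if it is the sole reader of
 * every column it uses; otherwise another row depends on the same input. */
static int row_is_sole_reader(const int m[4][4], int row)
{
    for (int col = 0; col < 4; col++)
        if (m[row][col] != 0 && nonzero_rows_for_col(m, col) != (1 << row))
            return 0;
    return 1;
}
```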
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This requires a bit of a manual check in the 32-bit integer case to
make sure we don't exceed the value range of AVRational, but it still allows
quite a number of optimizations despite that restriction.
e.g.
rgb24 -> yuva444p9be:
- [u16 ...X -> ++++] SWS_OP_CLEAR : {_ _ _ 511}
- [u16 .... -> zzzz] SWS_OP_SWAP_BYTES
- [u16 .... -> zzzz] SWS_OP_WRITE : 4 elem(s) planar >> 0
+ [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
+ [u16 ...X -> zzz+] SWS_OP_CLEAR : {_ _ _ 65281}
+ [u16 .... -> zzz+] SWS_OP_WRITE : 4 elem(s) planar >> 0
gray -> yuv444p12be:
- [u16 .XXX -> +++X] SWS_OP_CLEAR : {_ 2048 2048 _}
- [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
- [u16 ...X -> zzzX] SWS_OP_WRITE : 3 elem(s) planar >> 0
+ [u16 .XXX -> zzXX] SWS_OP_SWAP_BYTES
+ [u16 .XXX -> z++X] SWS_OP_CLEAR : {_ 8 8 _}
+ [u16 ...X -> z++X] SWS_OP_WRITE : 3 elem(s) planar >> 0
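The 32-bit guard mentioned above might look roughly like the following sketch (the struct here is a stand-in for libavutil's AVRational, whose num/den fields are plain ints; the helper name is invented):

```c
#include <assert.h>
#include <stdint.h>
#include <limits.h>

/* Illustrative stand-in for AVRational (int num / int den). */
typedef struct Rational { int num, den; } Rational;

/* A 32-bit integer constant can only be folded into a Rational if it
 * fits the int value range of the num/den fields. */
static int fits_rational(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```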
Ultimately, the benefit of this will only become relevant once we start
splitting apart planes, since then we can have planes with only CLEAR
operations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This optimization is lossy, since it removes important information about the
number of planes to be copied; it is subsumed by the more correct check in
the new ff_sws_op_list_is_noop(), so move this code there instead.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly supported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
Instead of blindly interleaving re-ordering and minimizing optimizations,
separate this loop into several passes: the first pass minimizes the
operation list in-place as much as possible, and the second pass applies any
desired re-orderings. (We also want to try pushing SWS_OP_CLEAR back before
any other re-orderings, as this can trigger more phase 1 optimizations.)
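A toy sketch of this pass structure (the op enum and list representation are invented for illustration; the real code works on SwsOp lists):

```c
#include <assert.h>

enum { OP_NOP, OP_CLEAR, OP_OTHER };

typedef struct { int ops[16]; int num; } OpList;

/* Pass 1: minimize the list in place (here: just drop no-ops). */
static void pass_minimize(OpList *l)
{
    int n = 0;
    for (int i = 0; i < l->num; i++)
        if (l->ops[i] != OP_NOP)
            l->ops[n++] = l->ops[i];
    l->num = n;
}

/* Pass 2: re-order (here: bubble CLEAR ops toward the front). After this,
 * pass 1 can be run again, since re-ordering may expose new wins. */
static void pass_reorder(OpList *l)
{
    for (int i = 1; i < l->num; i++)
        if (l->ops[i] == OP_CLEAR && l->ops[i - 1] != OP_CLEAR) {
            int t = l->ops[i];
            l->ops[i] = l->ops[i - 1];
            l->ops[i - 1] = t;
            i = 0; /* restart the scan after each swap */
        }
}
```

The point of the separation is that each pass is a fixpoint of its own concern: minimization never has to reason about ordering, and re-ordering never has to reason about redundancy.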
This restructuring leads to significantly more predictable and stable behavior,
especially when introducing more operation types going forwards. Does not
actually affect the current results, but matters with some upcoming changes
I have planned.
This requires updating the order of the dither matrix offsets as well. In
particular, this can only be done cleanly if the dither matrix offsets are
compatible between duplicated channels; but this should be guaranteed after
the previous commits.
As a side note, this also fixes a bug where we pushed SWS_OP_SWIZZLE past
SWS_OP_DITHER even for very low bit depth output (e.g. rgb4), which led to
a big loss in accuracy due to loss of per-channel dither noise.
Improves loss of e.g. gray -> rgb444 from 0.00358659 to 0.00261414,
now beating legacy swscale in these cases as well.
Otherwise, this is invalid, since applying the next operation after channel
duplication may not give the same result as duplicating the result of
applying the next operation to only one channel.
In practice, this was using the last seen channel's coefficients.
Note that this did not actually affect anything in practice, because the only
relevant ops (MIN/MAX) were always generated with identical coefficients for
identical channel ranges.
However, it will matter moving forwards, as e.g. dither ops may not be
commuted freely if their matrix offsets differ per channel.
The current usage is ambiguous between "affects each component equally" and
"affects each component independently" - and arguably, the current behavior
was a bug (since SWS_OP_DITHER should not commute with a SWIZZLE, at least
from a bit-exactness PoV).
However, when trying to define cleaner replacements for these concepts, I
realized there are too many special cases; and given that we only have two
use sites, I decided to just split them directly into "commute" functions
for those particular usage cases.
As an added benefit, this moves the commutation logic out of the already-long
ff_sws_ops_list_optimize().
This can turn any compatible sequence of operations into a single packed
shuffle, including packed swizzling, grayscale->RGB conversion, endianness
swapping, RGB bit depth conversions, rgb24->rgb0 alpha clearing and more.
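A simplified scalar model of such a packed shuffle (the function and table layout are illustrative; the actual implementation targets SIMD byte shuffles): one table drives the whole fused chain, where each output byte names its source byte and a negative entry means "fill with a constant", which covers cases like rgb24 -> rgb0 alpha clearing.

```c
#include <assert.h>
#include <stdint.h>

/* tab[i] is the input byte feeding output byte i; a negative entry means
 * "write fill_val instead" (e.g. clearing alpha for rgb24 -> rgb0). */
static void shuffle_pixels(uint8_t *dst, const uint8_t *src,
                           const int8_t *tab, int in_bpp, int out_bpp,
                           uint8_t fill_val, int num_pixels)
{
    for (int p = 0; p < num_pixels; p++)
        for (int i = 0; i < out_bpp; i++)
            dst[p * out_bpp + i] = tab[i] < 0 ? fill_val
                                              : src[p * in_bpp + tab[i]];
}
```

For example, grayscale -> RGB is just the table {0, 0, 0}, and rgb24 -> rgb0 is {0, 1, 2, -1} with an opaque fill value.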
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.