3279 commits

Author SHA1 Message Date
Martin Storsjö
9653588441 libswscale/arm: Switch consistent indentation to common style
Some of these files aligned instructions to 4/24 columns, while
we commonly indent arm/aarch64 assembly to 8/24 columns.
Some of these files also used a different alignment for the
operands.
2026-04-29 13:49:27 +03:00
Martin Storsjö
946e80fde7 libswscale/arm: Lowercase the "LSL" keyword 2026-04-29 13:49:27 +03:00
Marvin Scholz
e24882912f swscale/yuv2rgb: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
3e48505dda swscale: add fall-through annotations 2026-04-28 12:29:37 +00:00
Marvin Scholz
752cf875d8 swscale: replace fall-through comments 2026-04-28 12:29:37 +00:00
Zhao Zhili
1b98286131 swscale: unref on allocation failure in frame_alloc_buffers()
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-04-28 11:58:33 +00:00
Marvin Scholz
0fc1183a60 swscale: ops_dispatch: fix leak on error
Assign to `exec_base.in_offset_x` before the error handling,
to ensure the error cleanup path properly frees the already
allocated memory.

Fixes Coverity issue #1691725
2026-04-27 12:29:48 +00:00
Andreas Rheinhardt
4867d251ad swscale/x86/yuv2yuvX: Simplify rotating
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-26 23:48:21 +02:00
Andreas Rheinhardt
f5ed254528 swscale/x86/yuv2yuvX: Port ff_yuv2yuvX_mmxext to SSE2
The mmx function processes two registers in parallel;
given the larger register size of SSE2, the same amount
of data can be processed in one register, with some speedup.
(Since this function is used for tail processing,
it is important not to process more data per iteration.)

Switching to SSE2 also fixes a bug introduced in 554c2bc708:
since that commit, only half the dither values were used. This
seems not to matter in practice, as the functions here
use dither only in the following form:
((filtersize-1)*8+dither)>>4. The dither values used
here come from ff_dither_8x8_128 which has the property
that ff_dither_8x8_128[i][j] and ff_dither_8x8_128[i][j+4]
always lead to the same result in the above formula.

Old benchmarks:
yuv2yuvX_8_2_0_512_approximate_c:                     2309.9 ( 1.00x)
yuv2yuvX_8_2_0_512_approximate_mmxext:                 250.2 ( 9.23x)
yuv2yuvX_8_2_0_512_approximate_sse3:                    98.8 (23.39x)
yuv2yuvX_8_2_0_512_approximate_avx2:                    52.9 (43.63x)
yuv2yuvX_8_2_16_512_approximate_c:                    2263.0 ( 1.00x)
yuv2yuvX_8_2_16_512_approximate_mmxext:                245.3 ( 9.22x)
yuv2yuvX_8_2_16_512_approximate_sse3:                  114.3 (19.80x)
yuv2yuvX_8_2_16_512_approximate_avx2:                   85.6 (26.45x)
yuv2yuvX_8_2_32_512_approximate_c:                    2155.8 ( 1.00x)
yuv2yuvX_8_2_32_512_approximate_mmxext:                235.6 ( 9.15x)
yuv2yuvX_8_2_32_512_approximate_sse3:                   93.6 (23.04x)
yuv2yuvX_8_2_32_512_approximate_avx2:                   78.1 (27.60x)
yuv2yuvX_8_2_48_512_approximate_c:                    2084.8 ( 1.00x)
yuv2yuvX_8_2_48_512_approximate_mmxext:                230.2 ( 9.05x)
yuv2yuvX_8_2_48_512_approximate_sse3:                  105.0 (19.85x)
yuv2yuvX_8_2_48_512_approximate_avx2:                   71.9 (29.00x)
yuv2yuvX_8_4_0_512_approximate_c:                     3496.3 ( 1.00x)
yuv2yuvX_8_4_0_512_approximate_mmxext:                 455.0 ( 7.68x)
yuv2yuvX_8_4_0_512_approximate_sse3:                   157.5 (22.20x)
yuv2yuvX_8_4_0_512_approximate_avx2:                    88.4 (39.53x)
yuv2yuvX_8_4_16_512_approximate_c:                    3380.9 ( 1.00x)
yuv2yuvX_8_4_16_512_approximate_mmxext:                440.0 ( 7.68x)
yuv2yuvX_8_4_16_512_approximate_sse3:                  175.0 (19.32x)
yuv2yuvX_8_4_16_512_approximate_avx2:                  134.1 (25.22x)
yuv2yuvX_8_4_32_512_approximate_c:                    3277.6 ( 1.00x)
yuv2yuvX_8_4_32_512_approximate_mmxext:                427.2 ( 7.67x)
yuv2yuvX_8_4_32_512_approximate_sse3:                  149.7 (21.89x)
yuv2yuvX_8_4_32_512_approximate_avx2:                  115.5 (28.37x)
yuv2yuvX_8_4_48_512_approximate_c:                    3167.8 ( 1.00x)
yuv2yuvX_8_4_48_512_approximate_mmxext:                414.9 ( 7.63x)
yuv2yuvX_8_4_48_512_approximate_sse3:                  164.1 (19.31x)
yuv2yuvX_8_4_48_512_approximate_avx2:                  101.2 (31.30x)
yuv2yuvX_8_8_0_512_approximate_c:                     5987.5 ( 1.00x)
yuv2yuvX_8_8_0_512_approximate_mmxext:                 854.1 ( 7.01x)
yuv2yuvX_8_8_0_512_approximate_sse3:                   294.6 (20.32x)
yuv2yuvX_8_8_0_512_approximate_avx2:                   144.1 (41.56x)
yuv2yuvX_8_8_16_512_approximate_c:                    5848.9 ( 1.00x)
yuv2yuvX_8_8_16_512_approximate_mmxext:                834.4 ( 7.01x)
yuv2yuvX_8_8_16_512_approximate_sse3:                  312.1 (18.74x)
yuv2yuvX_8_8_16_512_approximate_avx2:                  214.9 (27.22x)
yuv2yuvX_8_8_32_512_approximate_c:                    5610.1 ( 1.00x)
yuv2yuvX_8_8_32_512_approximate_mmxext:                811.6 ( 6.91x)
yuv2yuvX_8_8_32_512_approximate_sse3:                  277.5 (20.21x)
yuv2yuvX_8_8_32_512_approximate_avx2:                  189.8 (29.55x)
yuv2yuvX_8_8_48_512_approximate_c:                    5415.8 ( 1.00x)
yuv2yuvX_8_8_48_512_approximate_mmxext:                782.3 ( 6.92x)
yuv2yuvX_8_8_48_512_approximate_sse3:                  289.4 (18.72x)
yuv2yuvX_8_8_48_512_approximate_avx2:                  165.3 (32.76x)
yuv2yuvX_8_16_0_512_approximate_c:                   11100.7 ( 1.00x)
yuv2yuvX_8_16_0_512_approximate_mmxext:               1682.1 ( 6.60x)
yuv2yuvX_8_16_0_512_approximate_sse3:                  558.8 (19.86x)
yuv2yuvX_8_16_0_512_approximate_avx2:                  280.1 (39.63x)
yuv2yuvX_8_16_16_512_approximate_c:                  10772.1 ( 1.00x)
yuv2yuvX_8_16_16_512_approximate_mmxext:              1611.0 ( 6.69x)
yuv2yuvX_8_16_16_512_approximate_sse3:                 578.1 (18.63x)
yuv2yuvX_8_16_16_512_approximate_avx2:                 418.8 (25.72x)
yuv2yuvX_8_16_32_512_approximate_c:                  10381.5 ( 1.00x)
yuv2yuvX_8_16_32_512_approximate_mmxext:              1560.4 ( 6.65x)
yuv2yuvX_8_16_32_512_approximate_sse3:                 525.8 (19.74x)
yuv2yuvX_8_16_32_512_approximate_avx2:                 370.7 (28.01x)
yuv2yuvX_8_16_48_512_approximate_c:                  10046.1 ( 1.00x)
yuv2yuvX_8_16_48_512_approximate_mmxext:              1512.4 ( 6.64x)
yuv2yuvX_8_16_48_512_approximate_sse3:                 546.0 (18.40x)
yuv2yuvX_8_16_48_512_approximate_avx2:                 315.0 (31.89x)

New benchmarks:
yuv2yuvX_8_2_0_512_approximate_c:                     2302.5 ( 1.00x)
yuv2yuvX_8_2_0_512_approximate_sse2:                   184.4 (12.49x)
yuv2yuvX_8_2_0_512_approximate_sse3:                   100.1 (23.01x)
yuv2yuvX_8_2_0_512_approximate_avx2:                    54.9 (41.98x)
yuv2yuvX_8_2_16_512_approximate_c:                    2224.6 ( 1.00x)
yuv2yuvX_8_2_16_512_approximate_sse2:                  180.0 (12.36x)
yuv2yuvX_8_2_16_512_approximate_sse3:                  109.5 (20.31x)
yuv2yuvX_8_2_16_512_approximate_avx2:                   81.3 (27.35x)
yuv2yuvX_8_2_32_512_approximate_c:                    2165.3 ( 1.00x)
yuv2yuvX_8_2_32_512_approximate_sse2:                  176.6 (12.26x)
yuv2yuvX_8_2_32_512_approximate_sse3:                   93.7 (23.11x)
yuv2yuvX_8_2_32_512_approximate_avx2:                   73.1 (29.61x)
yuv2yuvX_8_2_48_512_approximate_c:                    2088.0 ( 1.00x)
yuv2yuvX_8_2_48_512_approximate_sse2:                  170.7 (12.23x)
yuv2yuvX_8_2_48_512_approximate_sse3:                  103.4 (20.20x)
yuv2yuvX_8_2_48_512_approximate_avx2:                   69.4 (30.10x)
yuv2yuvX_8_4_0_512_approximate_c:                     3496.8 ( 1.00x)
yuv2yuvX_8_4_0_512_approximate_sse2:                   320.3 (10.92x)
yuv2yuvX_8_4_0_512_approximate_sse3:                   158.8 (22.02x)
yuv2yuvX_8_4_0_512_approximate_avx2:                    86.4 (40.49x)
yuv2yuvX_8_4_16_512_approximate_c:                    3443.5 ( 1.00x)
yuv2yuvX_8_4_16_512_approximate_sse2:                  325.3 (10.59x)
yuv2yuvX_8_4_16_512_approximate_sse3:                  171.9 (20.03x)
yuv2yuvX_8_4_16_512_approximate_avx2:                  123.6 (27.85x)
yuv2yuvX_8_4_32_512_approximate_c:                    3272.2 ( 1.00x)
yuv2yuvX_8_4_32_512_approximate_sse2:                  302.7 (10.81x)
yuv2yuvX_8_4_32_512_approximate_sse3:                  148.9 (21.98x)
yuv2yuvX_8_4_32_512_approximate_avx2:                  110.6 (29.58x)
yuv2yuvX_8_4_48_512_approximate_c:                    3166.3 ( 1.00x)
yuv2yuvX_8_4_48_512_approximate_sse2:                  291.0 (10.88x)
yuv2yuvX_8_4_48_512_approximate_sse3:                  162.9 (19.44x)
yuv2yuvX_8_4_48_512_approximate_avx2:                  102.3 (30.95x)
yuv2yuvX_8_8_0_512_approximate_c:                     5967.6 ( 1.00x)
yuv2yuvX_8_8_0_512_approximate_sse2:                   691.2 ( 8.63x)
yuv2yuvX_8_8_0_512_approximate_sse3:                   294.2 (20.28x)
yuv2yuvX_8_8_0_512_approximate_avx2:                   154.9 (38.52x)
yuv2yuvX_8_8_16_512_approximate_c:                    5780.2 ( 1.00x)
yuv2yuvX_8_8_16_512_approximate_sse2:                  606.2 ( 9.53x)
yuv2yuvX_8_8_16_512_approximate_sse3:                  309.3 (18.69x)
yuv2yuvX_8_8_16_512_approximate_avx2:                  208.7 (27.69x)
yuv2yuvX_8_8_32_512_approximate_c:                    5604.3 ( 1.00x)
yuv2yuvX_8_8_32_512_approximate_sse2:                  592.3 ( 9.46x)
yuv2yuvX_8_8_32_512_approximate_sse3:                  281.1 (19.94x)
yuv2yuvX_8_8_32_512_approximate_avx2:                  185.4 (30.23x)
yuv2yuvX_8_8_48_512_approximate_c:                    5413.7 ( 1.00x)
yuv2yuvX_8_8_48_512_approximate_sse2:                  570.4 ( 9.49x)
yuv2yuvX_8_8_48_512_approximate_sse3:                  294.9 (18.36x)
yuv2yuvX_8_8_48_512_approximate_avx2:                  166.5 (32.51x)
yuv2yuvX_8_16_0_512_approximate_c:                   11099.4 ( 1.00x)
yuv2yuvX_8_16_0_512_approximate_sse2:                 1213.6 ( 9.15x)
yuv2yuvX_8_16_0_512_approximate_sse3:                  563.0 (19.72x)
yuv2yuvX_8_16_0_512_approximate_avx2:                  294.8 (37.65x)
yuv2yuvX_8_16_16_512_approximate_c:                  10718.1 ( 1.00x)
yuv2yuvX_8_16_16_512_approximate_sse2:                1121.2 ( 9.56x)
yuv2yuvX_8_16_16_512_approximate_sse3:                 563.7 (19.01x)
yuv2yuvX_8_16_16_512_approximate_avx2:                 389.5 (27.51x)
yuv2yuvX_8_16_32_512_approximate_c:                  10373.3 ( 1.00x)
yuv2yuvX_8_16_32_512_approximate_sse2:                1096.2 ( 9.46x)
yuv2yuvX_8_16_32_512_approximate_sse3:                 526.7 (19.70x)
yuv2yuvX_8_16_32_512_approximate_avx2:                 354.7 (29.24x)
yuv2yuvX_8_16_48_512_approximate_c:                  10066.9 ( 1.00x)
yuv2yuvX_8_16_48_512_approximate_sse2:                1055.8 ( 9.53x)
yuv2yuvX_8_16_48_512_approximate_sse3:                 527.9 (19.07x)
yuv2yuvX_8_16_48_512_approximate_avx2:                 313.7 (32.09x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-26 23:48:21 +02:00
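Note on the dither argument in f5ed254528: the claim that two dither values can give the same result in ((filtersize-1)*8+dither)>>4 can be checked numerically. The sketch below is illustrative only. `dither_term` and `same_dither_term` are hypothetical helpers, and the sample values used with them (36, 34, 44) are assumed stand-ins for a pair of table entries, not values quoted from ff_dither_8x8_128. The formula adds a multiple of 8 and then shifts by 4, so two dither values that agree after `>> 3` yield the same term for every filter size.

```c
#include <assert.h>

/* Hypothetical helper: the dither term quoted in the commit message. */
static int dither_term(int filtersize, int dither)
{
    assert(filtersize >= 1 && dither >= 0);
    return ((filtersize - 1) * 8 + dither) >> 4;
}

/* Returns 1 if both dither values give the same term for every
 * filter size in [1, max_filtersize], 0 otherwise. */
static int same_dither_term(int a, int b, int max_filtersize)
{
    for (int fs = 1; fs <= max_filtersize; fs++)
        if (dither_term(fs, a) != dither_term(fs, b))
            return 0;
    return 1;
}
```

For example, `same_dither_term(36, 34, 16)` returns 1 because 36 >> 3 == 34 >> 3, while `same_dither_term(36, 44, 16)` returns 0 because the two values already differ for filtersize 2.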
Andreas Rheinhardt
62285be009 swscale/x86/swscale_template: Don't set use_mmx_vfilter when disabled
Commit 554c2bc708 ported the yuv2planeX functions that are set
iff use_mmx_vfilter is set to external assembly, and did so in a
way that resulted in linking failures when inline assembly is
enabled but external assembly is disabled. This was later fixed
in commit c00567647e, but in such a manner that use_mmx_vfilter
can be set without any of the accompanying yuv2planeX functions
being set; and in case inline assembly was unavailable, these
external assembly functions would never be selected.

This makes the filter-fps and filter-fps-cfr tests fail when inline
assembly is enabled but the build is configured with --disable-x86asm,
as reported in issue #21113. Fix this by moving sws_init_swscale_mmxext
directly into ff_sws_init_swscale_x86() and setting use_mmx_vfilter
right next to the yuv2planeX function pointer.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-26 23:48:21 +02:00
Niklas Haas
96f82f4fbb swscale/x86/ops: simplify SWS_OP_CLEAR patterns
Mark the components to be cleared, not the components to be preserved.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
08707934cc swscale/ops_backend: simplify SWS_OP_CLEAR declarations
Mark the components to be cleared, not the components to be preserved.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
7a71a01a1b swscale/ops: nuke SwsComps.unused
Finally, remove the last relic of this accursed design mistake.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
a797e30f71 swscale/aarch64/ops: compute SWS_OP_PACK mask directly
Instead of implicitly relying on SwsComps.unused, which contains the exact
same information. (cf. ff_sws_op_list_update_comps)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
6d1e549195 swscale/aarch64/ops: use SWS_OP_NEEDED() instead of next->comps.unused
These are basically identical, but the latter is being phased out.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
18cc71fc8e swscale/aarch64/ops: fix SWS_OP_LINEAR mask check
The implementation of AARCH64_SWS_OP_LINEAR loops over elements of this mask
to determine which *output* rows to compute. However, it is being set by this
loop to `op->comps.unused`, which is a mask of unused *input* rows. As such,
it should be looking at `next->comps.unused` instead.

This did not result in problems in practice, because none of the linear
matrices happened to trigger this case (more input columns than output rows).

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:17 +02:00
Niklas Haas
df4fe85ae3 swscale/ops_chain: replace SwsOpEntry.unused by SwsCompMask
Needed to allow us to phase out SwsComps.unused altogether.

It's worth pointing out the change in semantics; while unused tracks the
unused *input* components, the mask is defined as representing the
computed *output* components.

This is 90% the same, except for read/write, pack/unpack, and clear; which
are the only operations that can be used to change the number of components.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:10 +02:00
Niklas Haas
215cd90201 swscale/x86/ops: simplify DECL_DITHER definition
This extra indirection boilerplate just for the 0-size fast path really isn't
doing us any favors.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
9f0dded48d swscale/ops_chain: check for exact linear mask match
Makes this logic a lot simpler and less brittle. We can trivially adjust the
list of linear masks that are required, whenever it changes as a result of any
future modifications.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
e20a32d730 swscale/x86/ops: align linear kernels with reference backend
See previous commit.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
9b1c1fe95f swscale/ops_backend: align linear kernels with actually needed masks
Using the power of libswscale/tests/sws_ops -summarize lets us see which
kernels are actually needed by real op lists.

Note: I'm working on a separate series which will obsolete this implementation
whack-a-mole game altogether, by generating a list of all possible op kernels
at compile time.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
af2674645f swscale/ops: drop offset from SWS_MASK_ALPHA
This is far more commonly used without an offset than with; so having it there
prevents these special cases from actually doing much good.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
526195e0a3 swscale/x86/ops_float: fix typo in linear_row
First vector is %2, not %3. This was never triggered before because all of
the existing masks never hit this exact case.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:55 +02:00
Niklas Haas
6a83e15392 swscale/ops_chain: simplify SwsClearOp checking
Since this now has an explicit mask, we can just check that directly, instead
of relying on the unused comps hack/trick.

Additionally, this also allows us to distinguish between fixed value and
arbitrary value clears by just having the SwsOpEntry contain NAN values iff
they support any clear value.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:24:22 +02:00
Niklas Haas
80bd6c0cd5 swscale/ops: don't strip range metadata for unused components
As alluded to by the previous commit, this is now no longer necessary to
prevent their print-out.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
3680642e1b swscale/ops: simplify min/max range print check
This does come with a slight change in behavior, as we now don't print the
range information in the case that the range is only known for *unused*
components. However, in practice, that's already guaranteed by update_comps()
stripping the range info explicitly in this case.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
9bb2b11d5b swscale/ops: add SwsCompMask parameter to print_q4()
Instead of implicitly excluding NAN values if ignore_den0 is set. This
gives callers more explicit control over which values to print, and in
doing so, makes sure "unintended" NaN values are properly printed as such.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
cf2d40f65d swscale/ops: add explicit clear mask to SwsClearOp
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
4020607f0a swscale/ops: add SwsCompMask and related helpers
This new type will be used over the following commits to simplify the
codebase.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
ce2ca1a186 swscale/ops_optimizer: fix commutation of U32 clear + swap_bytes
This accidentally unconditionally overwrote the entire clear mask, since
Q(n) always set the denominator to 1, resulting in all channels being
cleared instead of just the ones with nonzero denominators.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:23:36 +02:00
Niklas Haas
953d278a01 tests/swscale: fix input pattern generation for very small sizes
This currently completely fails for images smaller than 12x12; and even in that
case, the limited resolution makes these tests a bit useless.

At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.

Incidentally, 8x8 is the smallest size that passes the SSIM check.
2026-04-16 20:59:39 +00:00
Niklas Haas
0da2bbab68 swscale/ops_dispatch: re-indent (cosmetic) 2026-04-16 20:59:39 +00:00
Niklas Haas
4c19f82cc0 swscale/ops_dispatch: compute minimum needed tail size
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling; which the
previous code completely ignored.

Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
cd8ece4114 swscale/ops_dispatch: generalize the number of tail blocks
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.

While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.

I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
dba7b81b38 swscale/ops_dispatch: avoid calling comp->func with w=0
The x86 kernel e.g. assumes that at least one block is processed; so avoid
calling this with an empty width. This is currently only possible if e.g.
operating on an unpadded, very small image whose total linesize is less than
a single block.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
35174913ac swscale/ops_dispatch: fix and generalize tail buffer size calculation
This code had two issues:

1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block

We can fix both of these issues by rewriting the way the tail size is derived.

In the non-offset case, and assuming only 1 tail block:
    aligned_w - safe_width
  = num_blocks * block_size - (num_blocks - 1) * block_size
  = block_size

Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
    tail_size = pass->width - safe_width <= aligned_w - safe_width

In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
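Note on the derivation in 35174913ac: the identity aligned_w - safe_width = block_size for a single tail block can be reproduced with a few lines of arithmetic. This is a minimal sketch under stated assumptions: `tail_size` is a hypothetical helper, widths are counted in the same unit as `block_size`, and `tail_blocks` generalizes the "(num_blocks - 1)" term from the commit message. It is not the code from ops_dispatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of the region handled through the tail
 * buffer when the last `tail_blocks` blocks of a row are not processed
 * in place. */
static size_t tail_size(size_t width, size_t block_size, size_t tail_blocks)
{
    assert(width > 0 && block_size > 0);
    size_t num_blocks = (width + block_size - 1) / block_size; /* round up */
    assert(tail_blocks >= 1 && tail_blocks <= num_blocks);

    size_t aligned_w  = num_blocks * block_size;
    size_t safe_width = (num_blocks - tail_blocks) * block_size;
    return aligned_w - safe_width; /* == tail_blocks * block_size */
}
```

With a width of 100 and a block size of 16 there are 7 blocks, so `tail_size(100, 16, 1)` is 112 - 96 = 16, which matches the block_size result derived above.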
Niklas Haas
f604add8c1 swscale/ops_dispatch: remove pointless AV_CEIL_RSHIFT()
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.

If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
dd8ff89adf swscale/ops_dispatch: add helper to explicitly control pixel->bytes rounding
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
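Note on dd8ff89adf: the "+7 bias" refers to rounding a bit count up to whole bytes before shifting by 3. The sketch below shows the two rounding directions side by side. The function names are hypothetical and chosen for illustration; they are not the helper added by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold `pixels` pixels, rounded up. */
static size_t pixels_to_bytes_up(size_t pixels, size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (pixels * bits_per_pixel + 7) >> 3;
}

/* Hypothetical helper: whole bytes fully covered by `pixels` pixels,
 * rounded down. */
static size_t pixels_to_bytes_down(size_t pixels, size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (pixels * bits_per_pixel) >> 3;
}
```

For 12 pixels at 1 bit per pixel, rounding up gives 2 bytes and rounding down gives 1, which is the kind of difference an accidentally dropped bias would introduce.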
Niklas Haas
16a57b2985 swscale/ops_dispatch: ensure block size is multiple of pixel size
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8 pixels. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
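Note on 16a57b2985: a block only maps to a whole number of bytes if its size in bits is a multiple of 8. The following sketch shows one way such a check could look. `block_size_bytes` is a hypothetical function and the -1 return convention is an assumption made for this example.

```c
#include <assert.h>

/* Hypothetical check: returns the block size in bytes, or -1 if a block
 * of `block_pixels` pixels does not cover a whole number of bytes. */
static int block_size_bytes(int block_pixels, int bits_per_pixel)
{
    assert(block_pixels > 0 && bits_per_pixel > 0);
    int bits = block_pixels * bits_per_pixel;
    if (bits % 8)
        return -1; /* a plain bits >> 3 would silently truncate here */
    return bits / 8;
}
```

A 4-pixel block at 1 bit per pixel is rejected, whereas an unchecked shift would have produced a block size of 0 bytes.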
Niklas Haas
86307dad4a swscale/ops_dispatch: make offset calculation code robust against overflow
As well as weird edge cases like trying to filter `monow` and pixels landing
in the middle of a byte. Realistically, this will never happen - we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.

However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
95e4f7cac5 swscale/ops_dispatch: fix rounding direction of plane_size
This is an upper bound, so it should be rounded up.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
c6e47b293d swscale/ops_dispatch: pre-emptively guard against int overflow
By using size_t whenever we compute derived figures.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
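Note on c6e47b293d: doing derived size computations in size_t widens the range, and an explicit check can catch the cases that still overflow. The sketch below combines both ideas. `checked_plane_size` is a hypothetical helper and its return convention is an assumption for this example, not the libswscale implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes stride * height in size_t.
 * Returns 0 and stores the product on success, -1 if it would overflow. */
static int checked_plane_size(size_t stride, size_t height, size_t *out)
{
    assert(out);
    if (height && stride > SIZE_MAX / height)
        return -1;
    *out = stride * height;
    return 0;
}
```

The division-based test avoids performing the overflowing multiplication in the first place, so the check itself stays well defined.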
Niklas Haas
0524e66aec swscale/ops_dispatch: drop pointless const (cosmetic)
These are clearly not mutated within their constrained scope, and it just
wastes valuable horizontal space.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
c98810ac78 swscale/ops_dispatch: zero-init tail buffer
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
ba516a34cd swscale/x86/ops_int: use sized mov for packed_shuffle output
This code made the input read conditional on the byte count, but not the
output, leading to a lot of over-write for cases like 15, 5.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
4264045137 swscale/x86/ops: set missing over_read metadata on filter ops
These align the filter size to a multiple of the internal tap grouping
(either 1/2/4 for vpgatherdd, or the XMM size for the 4x4 transposed kernel).
This may over-read past the natural end of the input buffer, if the aligned
size exceeds the true size.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Kacper Michajłow
369dbbe488 swscale/ops_memcpy: guard exec->in_stride[-1] access
When use_loop == true and idx < 0, we would incorrectly check
in_stride[idx], which is an out-of-bounds read. Reorder the conditions to avoid that.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-16 18:56:22 +00:00
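Note on 369dbbe488: the fix relies on the short-circuit evaluation of `&&` in C, where the right-hand operand is not evaluated once the left-hand one is false. The sketch below illustrates the pattern only. `plane_needs_loop`, its `limit` parameter, and the comparison it performs are hypothetical; the actual condition in ops_memcpy is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the guarded condition. The index test comes
 * before the array access, so in_stride[idx] is never read when idx is
 * negative. */
static int plane_needs_loop(const ptrdiff_t *in_stride, int idx,
                            int use_loop, ptrdiff_t limit)
{
    assert(in_stride);
    return use_loop && idx >= 0 && in_stride[idx] > limit;
}
```

Swapping the last two operands would evaluate `in_stride[-1]` before the bounds test, which is the kind of out-of-bounds read the commit describes.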
Niklas Haas
1764683668 swscale/ops_backend: disable FP contraction where possible
In particular, Clang defaults to FP contraction enabled. GCC defaults to
off in standard C mode (-std=c11), but the C standard does not actually
require any particular default.

The #pragma STDC pragma, despite its name, warns on anything except Clang.

Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/22796
See-also: https://discourse.llvm.org/t/fp-contraction-fma-on-by-default/64975
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 17:19:51 +00:00
Niklas Haas
e199d6b375 swscale/x86/ops: add missing component annotation on expand_bits
This only does a single component; so it should be marked as such.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
b6755b0158 swscale/ops_memcpy: always use loop on buffers with large padding
The overhead of the loop and memcpy call is less than the overhead of
possibly spilling into one extra unnecessary cache line. 64 is still a
good rule of thumb for L1 cache line size in 2026.

I leave it to future code archeologists to find and tweak this constant if
it ever becomes unnecessary.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00