Commit graph

3324 commits

Author SHA1 Message Date
Niklas Haas
aa08cf8112 swscale/options: add missing option value for SWS_STRICT
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 11:31:54 +02:00
DROOdotFOO
2e142e52ae swscale/aarch64: add NEON yuv->rgb16 fast paths
Add NEON unscaled converters for {yuv420p, yuv422p, yuva420p, nv12, nv21}
to {rgb565le, bgr565le, rgb555le, bgr555le}.

The 16bpp packing uses v8/v9 as the output accumulator. Since AAPCS-64
requires d8-d15 to be callee-saved, declare_func now wraps a
stp d8, d9 / ldp d8, d9 around 16bpp paths only (gated by .ifc on the
output format). Pattern matches libswscale/aarch64/hscale.S.

yuva420p -> 16bpp drops alpha and routes through the yuv420p wrappers,
mirroring how yuva420p -> rgb24/bgr24 already work in tree.

Speedup vs C at width=1920 on Apple M1 (checkasm --bench):

  | input    | rgb565le | bgr565le | rgb555le | bgr555le |
  |----------|----------|----------|----------|----------|
  | yuv420p  | 3.69x    | 3.68x    | 3.28x    | 3.31x    |
  | yuv422p  | 4.70x    | 4.70x    | 4.32x    | 4.35x    |
  | yuva420p | 3.67x    | 3.66x    | 3.32x    | 3.27x    |

NEON cycles are ~48 for planar and ~50.5 for semi-planar across all
four outputs. yuv422p shows the biggest speedup because its C
reference is the most expensive. 555 ratios trail 565 because the C
reference is faster for 555 (one fewer mask bit); NEON cycles are the
same. nv12/nv21 are bench-only (see the preceding checkasm commit) and
run at the same ~50.5 cycles.

This only handles the little endian forms of the 16 bit RGB formats.

Verified with checkasm --test=sw_yuv2rgb (110/110) and the full
checkasm regression (7657/7657) on Apple M1.

Signed-off-by: DROOdotFOO <drew@axol.io>
2026-05-22 10:03:07 +00:00
Lynne
f17c8db820 swscale/vulkan: add a non-bitexact version of OP_LINEAR
Uses matrix*vector + vector multiplication.

Sponsored-by: Sovereign Tech Fund
2026-05-22 15:27:08 +09:00
Lynne
6d57426b6a swscale/vulkan: create a constant matrix from linear op constants
Sponsored-by: Sovereign Tech Fund
2026-05-22 15:27:07 +09:00
Lynne
2423a719e0 swscale/vulkan: put entire linear matrix+vector as constant data
Rather than only using what we need.
The driver will remove any unused constants.

Sponsored-by: Sovereign Tech Fund
2026-05-22 15:27:07 +09:00
Lynne
198991372c swscale/vulkan: move linear op handling to a separate function
Sponsored-by: Sovereign Tech Fund
2026-05-22 15:27:03 +09:00
Lynne
c40ac0f03a swscale/vulkan: add support for filtering on SWS_OP_READ
Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:27 +09:00
Lynne
448e08aa80 swscale/vulkan: allocate buffers for scaling filters
Simply allocates buffers to hold filter data.

Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:26 +09:00
Lynne
b7ccdaa018 swscale/vulkan: make buffer descriptor generation generic
Again, simple rename.

Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:26 +09:00
Lynne
d0af60afa8 swscale/vulkan: make dither buffer allocation path generic
Just a simple rename.

Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:26 +09:00
Lynne
c8ddaa97db swscale/vulkan: base dispatch size on output image size, rather than input
Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:26 +09:00
Lynne
51d4406e07 swscale/graph: support allocating hardware intermediate frames
Sponsored-by: Sovereign Tech Fund
2026-05-22 14:05:21 +09:00
Niklas Haas
d94c293e62 swscale/ops_dispatch: prevent float over-read when horizontal filtering
The code made the fundamental assumption that over-reading into the padding
bytes is okay, because the most that can happen is that those pixel
values end up corrupted, which doesn't affect any adjacent pixels.

However, this is not true for SWS_OP_FILTER_H, because this operation
fundamentally mixes together horizontal pixels. Normally, this was fine,
because the filter weights for those pixels are set to 0, and 0 * x = 0.

However, that is not true for floating point inputs, which can contain
Infinity; and 0 * Infinity = NaN, thus corrupting the entire pixel.

Solve it by specifically preventing over-read when it would be unsafe.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-20 21:45:28 +00:00
Niklas Haas
6bc0f9517c swscale/ops_dispatch: rename filter_size to filter_size_h
Since this is not set for vertical filters.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-20 21:45:28 +00:00
Lynne
489a3834d2 swscale/vulkan: implement SWS_OP_PACK/SWS_OP_UNPACK
The issue is that while Vulkan already does the decomposition for us,
swscale assumes that the pixels will be in bitstream order, rather than
in their decomposed form.
This is valid for all packed formats for which these instructions are
issued (XV30 and X2RGB10).
This allows us to support the formats in Vulkan.

Sponsored-by: Sovereign Tech Fund
2026-05-19 03:22:29 +09:00
Niklas Haas
0c1a1ee12e swscale/ops_optimizer: don't push scale past truncating conversions
In an op list like:

  [ u8 +XXX] SWS_OP_READ         : 1 elem(s) planar >> 3
  [ u8 .XXX] SWS_OP_FILTER_V     : 256 -> 320 bilinear (2 taps)
  [f32 .XXX] SWS_OP_SCALE        : * 65535
  [f32 +XXX] SWS_OP_CONVERT      : f32 -> u16
  [u16 zXXX] SWS_OP_SWAP_BYTES
  [u16 zzzX] SWS_OP_SWIZZLE      : 0003
  [u16 zzz+] SWS_OP_CLEAR        : {_ _ _ 65535}
  [u16 XXXX] SWS_OP_WRITE        : 4 elem(s) packed >> 0

The current version of the code would happily push the SWS_OP_SCALE past
the truncating conversion, leading to degenerate loss of information. (In
this case, the result was quite extreme)

Affects quality across a wide range of formats, e.g.:

 rgb24 16x16 -> rgb48be 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 ...X] SWS_OP_FILTER_V     : 16 -> 32 bilinear (2 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 ...X] SWS_OP_SCALE        : * 257
+    min: {0 0 0 _}, max: {65535 65535 65535 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u16
-    min: {0 0 0 _}, max: {255 255 255 _}
-  [u16 +++X] SWS_OP_SCALE        : * 257
     min: {0 0 0 _}, max: {65535 65535 65535 _}
   [u16 zzzX] SWS_OP_SWAP_BYTES
     min: {0 0 0 _}, max: {65535 65535 65535 _}
   [u16 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas Haas
812b5654ae swscale/tests/sws_ops: use SWS_SCALE_BILINEAR for printing ops lists
This actually changes the behavior vs SWS_SCALE_POINT, because point scaling
is bit-exact and thus implies a different set of optimizations.

Ideally, we would still try and somehow merge this with tests/swscale.c to
allow testing a different set of scalers; but I still don't have a good idea
for how to accomplish that here.

As it stands, this results in additional dithering steps in almost all
filters involving scaling, e.g.:

 rgb24 16x16 -> rgb24 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
-  [ u8 +++X] SWS_OP_FILTER_V     : 16 -> 32 point (1 taps)
+  [ u8 ...X] SWS_OP_FILTER_V     : 16 -> 32 bilinear (2 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
+    min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _}
+  [f32 ...X] SWS_OP_MIN          : x <= {255 255 255 _}
+    min: {1/512 1/512 1/512 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas Haas
2dfe055ddd swscale/tests/sws_ops: print split sub-passes for lists with filters
This allows us to inspect exactly the logic that is going on inside the CPU
backends (which don't support bare filter passes).

 rgb24 16x16 -> rgb24 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 +++X] SWS_OP_FILTER_V     : 16 -> 32 point (1 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Retrying with split passes:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Sub-pass #1:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 1 tap point filter (V)
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
 rgb24 16x16 -> rgb24 32x16:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 +++X] SWS_OP_FILTER_H     : 16 -> 32 point (1 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Retrying with split passes:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Sub-pass #1:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 1 tap point filter (H)
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas Haas
369a301669 swscale/tests/sws_ops: use a dummy ops backend for printing
This ensures that the ops printing path goes through the same code as the
actual ops dispatch backend, including all sub-passes etc.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas Haas
76dc83d9be swscale/ops_dispatch: make ff_sws_ops_compile() output optional
Allows the uops macro generation code to not actually compile any passes.
More generally, this could be used to e.g. test if an op list is supported by
a backend without actually creating the passes.

The `bool first` change is needed because the `input == prev` check no longer
works if we don't actually compile any passes.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
420b1bf368 swscale/ops_dispatch: allow forcing specific ops backend
This will be used eventually when I rewrite checkasm/sw_ops to re-use the
code in ops_dispatch.c instead of hand-rolling the execution layer.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
9021448857 swscale/ops_dispatch: merge ff_sws_ops_compile_backend() and compile()
Passing backend == NULL now loops over the backends as before.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
ad17144ce6 swscale/ops_dispatch: move op list print to ff_sws_ops_compile_backend()
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
90669ab52e swscale/ops: move ff_sws_compile_pass() and friends to ops_dispatch.h
This function actually lives in ops_dispatch.c, and doesn't really make
sense in ops.h anymore. We should also move some stuff out of ops_internal.h,
which doesn't depend on any external ops stuff, here.

This allows the backend/compilation-related stuff to co-exist more nicely.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
1d841635a4 swscale/ops: also include scaling ops in ff_sws_enum_op_lists()
Using the configured scaler from the SwsContext implicitly. This does affect
the output of libswscale/tests/sws_ops.c, which now prints about 4x as much
data (taking roughly 4x as long, but still within a second on my machine).

We can make this process a lot faster by forcing SWS_SCALE_POINT as the
scaler, which skips calculating any actual filter weights in favor of
generating a trivial 1-tap filter.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
eec9f712f5 swscale/ops: re-use ff_sws_op_list_generate() in ff_sws_enum_op_lists()
The only difference here is an extra ff_sws_add_filters() call, which is
a no-op because src w/h = dst w/h = 16.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
cac183f46f swscale/ops: don't silently suppress non-ENOTSUP errors
Matches the behavior to the comment.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
dacbf080f3 swscale/ops_chain: simplify ff_sws_op_compile_tables() signature
This no longer accesses prev/next as a result of the `unused` removal, so
the signature can be simplified to just take the op directly.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
064600585e swscale/ops_chain: remove flexible from SWS_OP_MIN/MAX entries
We have other op types that skip checking the data even in non-flexible mode,
so there is a precedent for just leaving out `flexible` for such kernels.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
98c1dbafbe swscale/ops_memcpy: don't depend on ops_backend.h
This is private to the C template based backend.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
62aad4513c swscale/graph: move format conversion logic to formats.c
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
0611abc1bb swscale/graph: move code for adding filters to format.h
Mirroring the precedent established by the other SwsOp-generating functions.
This allows us to re-use it for the uops macro generator.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
9fe0ff3d56 swscale/graph: make _reinit() only call _init(), not _create()
This allows us to preserve the same memory allocation when
reinitializing a graph, which is a nice bonus.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
56305c460c swscale/graph: add ff_sws_graph_alloc() and _init()
As an alternative to the current _create() API.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas
5e0dddef80 swscale/graph: move graph uninit logic to helper function
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Ramiro Polla
d0a84c660a swscale/unscaled: fix rgbToRgbWrapper for non-native-endian formats
The fix from 5fa2a65c11 introduced a regression for non-native-endian
formats (such as rgb565be on a little-endian system).

Reproducible with:
$ ./libswscale/tests/swscale -unscaled 1 -src rgb565be -dst rgb24

Also:
$ ./ffmpeg_g -i /opt/samples/jpegls/128.jls -vf "scale=size=512x512,format=rgb24,scale=flags=neighbor,format=rgb565be" -f rawvideo -vframes 1 -y rgb565be.raw
$ magick -size 512x512 -endian MSB RGB565:rgb565be.raw output.png
$ ./ffplay_g output.png

(note: don't use ffmpeg to convert from rgb565be.raw to output for the
test above since it will trigger the same bug and cancel out the error)
2026-05-15 14:21:50 +00:00
Ramiro Polla
d812c8b0eb swscale/tests/swscale: log test parameters on loss error
When running with "-v 0", the test parameters were not being printed,
which made it hard to track down which conversion the error referred
to.

Now the test parameters are logged with av_log() when a loss error
happens.
2026-05-15 14:12:48 +00:00
Ramiro Polla
1cc9b15bab swscale/tests/swscale: fix -p option when -flags and/or -unscaled are used
The -p, -flags, and -unscaled options all affected the decision to
select a subsample of the tests to run. When specifying -p 0.1, about
57% of the tests would run instead of the expected 10%.

This commit fixes this by separating -p from -flags and -unscaled.
2026-05-15 14:12:48 +00:00
Ramiro Polla
24d432e227 swscale/tests/swscale: improve help text for -p option
2026-05-15 14:12:48 +00:00
Niklas Haas
c1ff2c24b5 swscale/filters: hard-code radius for trivial kernels
box() and triangle() have well-defined, trivially verifiable numerical
inverses.

We could actually pre-compute and hard-code the numerical inverse of all
non-parametric kernels, but I'm a bit reluctant to do this as I have plans to
adjust the value of SWS_MAX_REDUCE_CUTOFF based on the desired bit depth of the
output, which makes a hard-coding approach unfeasible.

(It would also be a brittle solution that may break whenever we extend the
scaler configuration API, as well as making it harder to add new filters)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-11 19:59:39 +02:00
Andreas Rheinhardt
2d0d937ed2 swscale/ops_chain: Use av_fallthrough to mark fallthrough
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-05-03 18:22:05 +02:00
Andreas Rheinhardt
a867648555 swscale/x86/swscale: Fix shadowing
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-05-03 18:22:03 +02:00
Andreas Rheinhardt
e241a45548 swscale/x86/swscale: Add av_fallthrough
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-05-03 18:21:45 +02:00
Michael Niedermayer
43a0715e30 swscale/swscale_unscaled: adjust last line copy
Fixes: out of array access
Fixes: DFVULN-694

*Reporter: Zhenpeng (Leo) Lin at depthfirst*

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-03 14:52:32 +00:00
Michael Niedermayer
7d0837a742 swscale/swscale: Check srcSliceY and srcSliceH
Obviously no one should pass negative values, they make no sense, but better to
explicitly check

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-05-03 14:52:32 +00:00
Martin Storsjö
9653588441 libswscale/arm: Switch consistent indentation to common style
Some of these files aligned instructions to 4/24 columns, while
we commonly indent arm/aarch64 assembly to 8/24 columns.
Some of these files also used a different alignment for the
operands.
2026-04-29 13:49:27 +03:00
Martin Storsjö
946e80fde7 libswscale/arm: Lowercase the "LSL" keyword
2026-04-29 13:49:27 +03:00
Marvin Scholz
e24882912f swscale/yuv2rgb: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
3e48505dda swscale: add fall-through annotations
2026-04-28 12:29:37 +00:00
Marvin Scholz
752cf875d8 swscale: replace fall-through comments
2026-04-28 12:29:37 +00:00