ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-04 22:50:24 +00:00

Author	SHA1	Message	Date
Niklas Haas	aa08cf8112	swscale/options: add missing option value for SWS_STRICT Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 11:31:54 +02:00
DROOdotFOO	2e142e52ae	swscale/aarch64: add NEON yuv->rgb16 fast paths Add NEON unscaled converters for {yuv420p, yuv422p, yuva420p, nv12, nv21} to {rgb565le, bgr565le, rgb555le, bgr555le}. The 16bpp packing uses v8/v9 as the output accumulator. Since AAPCS-64 requires d8-d15 to be callee-saved, declare_func now wraps a stp d8, d9 / ldp d8, d9 around 16bpp paths only (gated by .ifc on the output format). Pattern matches libswscale/aarch64/hscale.S. yuva420p -> 16bpp drops alpha and routes through the yuv420p wrappers, mirroring how yuva420p -> rgb24/bgr24 already work in tree. Speedup vs C at width=1920 on Apple M1 (checkasm --bench): \| input \| rgb565le \| bgr565le \| rgb555le \| bgr555le \| \|----------\|----------\|----------\|----------\|----------\| \| yuv420p \| 3.69x \| 3.68x \| 3.28x \| 3.31x \| \| yuv422p \| 4.70x \| 4.70x \| 4.32x \| 4.35x \| \| yuva420p \| 3.67x \| 3.66x \| 3.32x \| 3.27x \| NEON cycles are ~48 for planar and ~50.5 for semi-planar across all four outputs. yuv422p shows the biggest speedup because its C reference is the most expensive. 555 ratios trail 565 because the C reference is faster for 555 (one fewer mask bit); NEON cycles are the same. nv12/nv21 are bench-only (see the preceding checkasm commit) and run at the same ~50.5 cycles. This only handles the little endian forms of the 16 bit RGB formats. Verified with checkasm --test=sw_yuv2rgb (110/110) and the full checkasm regression (7657/7657) on Apple M1. Signed-off-by: DROOdotFOO <drew@axol.io>	2026-05-22 10:03:07 +00:00
Lynne	f17c8db820	swscale/vulkan: add a non-bitexact version of OP_LINEAR Uses matrix*vector + vector multiplication. Sponsored-by: Sovereign Tech Fund	2026-05-22 15:27:08 +09:00
Lynne	6d57426b6a	swscale/vulkan: create a constant matrix from linear op constants Sponsored-by: Sovereign Tech Fund	2026-05-22 15:27:07 +09:00
Lynne	2423a719e0	swscale/vulkan: put entire linear matrix+vector as constant data Rather than only using what we need. The driver will remove any unused constants. Sponsored-by: Sovereign Tech Fund	2026-05-22 15:27:07 +09:00
Lynne	198991372c	swscale/vulkan: move linear op handling to a separate function Sponsored-by: Sovereign Tech Fund	2026-05-22 15:27:03 +09:00
Lynne	c40ac0f03a	swscale/vulkan: add support for filtering on SWS_OP_READ Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:27 +09:00
Lynne	448e08aa80	swscale/vulkan: allocate buffers for scaling filters Simply allocates buffers to hold filter data. Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:26 +09:00
Lynne	b7ccdaa018	swscale/vulkan: make buffer descriptor generation generic Again, simple rename. Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:26 +09:00
Lynne	d0af60afa8	swscale/vulkan: make dither buffer allocation path generic Just a simple rename. Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:26 +09:00
Lynne	c8ddaa97db	swscale/vulkan: base dispatch size on output image size, rather than input Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:26 +09:00
Lynne	51d4406e07	swscale/graph: support allocating hardware intermediate frames Sponsored-by: Sovereign Tech Fund	2026-05-22 14:05:21 +09:00
Niklas Haas	d94c293e62	swscale/ops_dispatch: prevent float over-read when horizontal filtering The code made the fundamental assumption that over-read into the padding bytes is okay to do; because the most that can happen is that those pixel values end up corrupted, which doesn't affect any adjacent pixels. However, this is not true for SWS_OP_FILTER_H, because this operation fundamentally mixes together horizontal pixels. Normally, this was fine, because the filter weights for those pixels are set to 0, and 0 * x = 0. However, that is not true for floating point inputs, which can contain Infinity; and 0 * Infinity = NaN, thus corrupting the entire pixel. Solve it by specifically preventing over-read when it would be unsafe. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-20 21:45:28 +00:00
Niklas Haas	6bc0f9517c	swscale/ops_dispatch: rename filter_size to filter_size_h Since this is not set for vertical filters. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-20 21:45:28 +00:00
Lynne	489a3834d2	swscale/vulkan: implement SWS_OP_PACK/SWS_OP_UNPACK The issue is that while Vulkan already does the decomposition for us, swscale assumes that the pixels will be in bitstream order, rather than in their decomposed form. This is valid for all packed formats for which these instructions are issued (XV30 and X2RGB10). This allows us to support the formats in Vulkan. Sponsored-by: Sovereign Tech Fund	2026-05-19 03:22:29 +09:00
Niklas Haas	0c1a1ee12e	swscale/ops_optimizer: don't push scale past truncating conversions In an op list like: [ u8 +XXX] SWS_OP_READ : 1 elem(s) planar >> 3 [ u8 .XXX] SWS_OP_FILTER_V : 256 -> 320 bilinear (2 taps) [f32 .XXX] SWS_OP_SCALE : * 65535 [f32 +XXX] SWS_OP_CONVERT : f32 -> u16 [u16 zXXX] SWS_OP_SWAP_BYTES [u16 zzzX] SWS_OP_SWIZZLE : 0003 [u16 zzz+] SWS_OP_CLEAR : {_ _ _ 65535} [u16 XXXX] SWS_OP_WRITE : 4 elem(s) packed >> 0 The current version of the code would happily push the SWS_OP_SCALE past the truncating conversion, leading to degenerate loss of information. (In this case, the result was quite extreme) Affects quality across a wide range of formats, e.g.: rgb24 16x16 -> rgb48be 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 ...X] SWS_OP_FILTER_V : 16 -> 32 bilinear (2 taps) min: {0 0 0 _}, max: {255 255 255 _} + [f32 ...X] SWS_OP_SCALE : * 257 + min: {0 0 0 _}, max: {65535 65535 65535 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u16 - min: {0 0 0 _}, max: {255 255 255 _} - [u16 +++X] SWS_OP_SCALE : * 257 min: {0 0 0 _}, max: {65535 65535 65535 _} [u16 zzzX] SWS_OP_SWAP_BYTES min: {0 0 0 _}, max: {65535 65535 65535 _} [u16 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas Haas	812b5654ae	swscale/tests/sws_ops: use SWS_SCALE_BILINEAR for printing ops lists This actually changes the behavior vs SWS_SCALE_POINT, because point scaling is bit-exact and thus implies a different set of optimizations. Ideally, we would still try and somehow merge this with tests/swscale.c to allow testing a different set of scalers; but I still don't have a good idea for how to accomplish that here. As it stands, results in additional extra dithering steps in almost all filters involving scaling, e.g.: rgb24 16x16 -> rgb24 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} - [ u8 +++X] SWS_OP_FILTER_V : 16 -> 32 point (1 taps) + [ u8 ...X] SWS_OP_FILTER_V : 16 -> 32 bilinear (2 taps) min: {0 0 0 _}, max: {255 255 255 _} + [f32 ...X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 -1} + min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _} + [f32 ...X] SWS_OP_MIN : x <= {255 255 255 _} + min: {1/512 1/512 1/512 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas Haas	2dfe055ddd	swscale/tests/sws_ops: print split sub-passes for lists with filters This allows us to inspect exactly the logic that is going on inside the CPU backends (which don't support bare filter passes). rgb24 16x16 -> rgb24 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 +++X] SWS_OP_FILTER_V : 16 -> 32 point (1 taps) min: {0 0 0 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + Retrying with split passes: + [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) planar >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) + Sub-pass #1: + [ u8 +++X] SWS_OP_READ : 3 elem(s) planar >> 0 + 1 tap point filter (V) + min: {0 0 0 _}, max: {255 255 255 _} + [f32 +++X] SWS_OP_CONVERT : f32 -> u8 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) rgb24 16x16 -> rgb24 32x16: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 +++X] SWS_OP_FILTER_H : 16 -> 32 point (1 taps) min: {0 0 0 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + Retrying with split passes: + [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) planar >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) + Sub-pass #1: + [ u8 +++X] SWS_OP_READ : 3 elem(s) planar >> 0 + 1 tap point filter (H) + min: {0 0 0 _}, max: {255 255 255 _} + [f32 +++X] SWS_OP_CONVERT : f32 -> u8 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas Haas	369a301669	swscale/tests/sws_ops: use a dummy ops backend for printing This ensures that the ops printing path goes through the same code as the actual ops dispatch backend, including all sub-passes etc. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas Haas	76dc83d9be	swscale/ops_dispatch: make ff_sws_ops_compile() output optional Allows the uops macro generation code to not actually compile any passes. More generally, this could be used to e.g. test if an op list is supported by a backend without actually creating the passes. The `bool first` change is needed because the `input == prev` check no longer works if we don't actually compile any passes. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	420b1bf368	swscale/ops_dispatch: allow forcing specific ops backend This will be used eventually when I rewrite checkasm/sw_ops to re-use the code in ops_dispatch.c instead of hand-rolling the execution layer. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	9021448857	swscale/ops_dispatch: merge ff_sws_ops_compile_backend() and compile() Passing backend == NULL now loops over the backends as before. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	ad17144ce6	swscale/ops_dispatch: move op list print to ff_sws_ops_compile_backend() Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	90669ab52e	swscale/ops: move ff_sws_compile_pass() and friends to ops_dispatch.h This function actually lives in ops_dispatch.c, and doesn't really make sense in ops.h anymore. We should also move some stuff out of ops_internal.h, which doesn't depend on any external ops stuff, here. This allows the backend/compilation-related stuff to co-exist more nicely. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	1d841635a4	swscale/ops: also include scaling ops in ff_sws_enum_op_lists() Using the configured scaler from the SwsContext implicitly. This does affect the output of libswscale/tests/sws_ops.c, which now prints about 4x as much data (taking roughly 4x as long, but still within a second on my machine). We can make this process a lot faster by forcing SWS_SCALE_POINT as the scaler, which skips calculating any actual filter weights in favor of generating a trivial 1-tap filter. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	eec9f712f5	swscale/ops: re-use ff_sws_op_list_generate() in ff_sws_enum_op_lists() The only difference here is an extra ff_sws_add_filters() call, which is a no-op because src w/h = dst w/h = 16. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	cac183f46f	swscale/ops: don't silently suppress non-ENOTSUP errors Matches the behavior to the comment. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	dacbf080f3	swscale/ops_chain: simplify ff_sws_op_compile_tables() signature This no longer accesses prev/next as a result of the `unused` removal, so the signature can be simplified to just take the op directly. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	064600585e	swscale/ops_chain: remove flexible from SWS_OP_MIN/MAX entries We have other op types that skip checking the data even in non-flexible mode, so there is a precedent for just leaving away `flexible` for such kernels. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	98c1dbafbe	swscale/ops_memcpy: don't depend on ops_backend.h This is private to the C template based backend. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	62aad4513c	swscale/graph: move format conversion logic to formats.c Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	0611abc1bb	swscale/graph: move code for adding filters to format.h Mirroring the precedent established by the other SwsOp-generating functions. This allows us to re-use it for the uops macro generator. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	9fe0ff3d56	swscale/graph: make _reinit() only call _init(), not _create() This allows us to preserve the same memory allocation when reinitializing a graph, which is a nice bonus. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	56305c460c	swscale/graph: add ff_sws_graph_alloc() and _init() As an alternative to the current _create() API. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	5e0dddef80	swscale/graph: move graph uninit logic to helper function Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Ramiro Polla	d0a84c660a	swscale/unscaled: fix rgbToRgbWrapper for non-native-endian formats The fix from `5fa2a65c11` introduced a regression for non-native-endian formats (such as rgb565be on a little-endian system). Reproducible with: $ ./libswscale/tests/swscale -unscaled 1 -src rgb565be -dst rgb24 Also: $ ./ffmpeg_g -i /opt/samples/jpegls/128.jls -vf "scale=size=512x512,format=rgb24,scale=flags=neighbor,format=rgb565be" -f rawvideo -vframes 1 -y rgb565be.raw $ magick -size 512x512 -endian MSB RGB565:rgb565be.raw output.png $ ./ffplay_g output.png (note: don't use ffmpeg to convert from rgb565be.raw to output for the test above since it will perform the same bug and cancel out the error)	2026-05-15 14:21:50 +00:00
Ramiro Polla	d812c8b0eb	swscale/tests/swscale: log test parameters on loss error When running with "-v 0", the test parameters were not being printed, which made it hard to track down which conversion the error referred to. Now the test parameters are logged with av_log() when a loss error happens.	2026-05-15 14:12:48 +00:00
Ramiro Polla	1cc9b15bab	swscale/tests/swscale: fix -p option when -flags and/or -unscaled are used The -p, -flags, and -unscaled options all affected the decision to select a subsample of the tests to run. When specifying -p 0.1, about 57% of the tests would run instead of the expected 10%. This commit fixes this by separating -p from -flags and -unscaled.	2026-05-15 14:12:48 +00:00
Ramiro Polla	24d432e227	swscale/tests/swscale: improve help text for -p option	2026-05-15 14:12:48 +00:00
Niklas Haas	c1ff2c24b5	swscale/filters: hard-code radius for trivial kernels box() and triangle() have well-defined, trivially verifiable numerical inverses. We could actually pre-compute and hard-code the numerical inverse of all non-parametric kernels, but I'm a bit reluctant to do this as I have plans to adjust the value of SWS_MAX_REDUCE_CUTOFF based on the desired bit depth of the output, which makes a hard-coding approach unfeasible. (It would also be a brittle solution that may break whenever we extend the scaler configuration API, as well as making it harder to add new filters) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-11 19:59:39 +02:00
Andreas Rheinhardt	2d0d937ed2	swscale/ops_chain: Use av_fallthrough to mark fallthrough Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-05-03 18:22:05 +02:00
Andreas Rheinhardt	a867648555	swscale/x86/swscale: Fix shadowing Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-05-03 18:22:03 +02:00
Andreas Rheinhardt	e241a45548	swscale/x86/swscale: Add av_fallthrough Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-05-03 18:21:45 +02:00
Michael Niedermayer	43a0715e30	swscale/swscale_unscaled: adjust last line copy Fixes: out of array access Fixes: DFVULN-694 Reporter: Zhenpeng (Leo) Lin at depthfirst Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-05-03 14:52:32 +00:00
Michael Niedermayer	7d0837a742	swscale/swscale: Check srcSliceY and srcSliceH Obviously no one should pass negative values, they make no sense, but it is better to check explicitly Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-05-03 14:52:32 +00:00
Martin Storsjö	9653588441	libswscale/arm: Switch consistent indentation to common style Some of these files aligned instructions to 4/24 columns, while we commonly indent arm/aarch64 assembly to 8/24 columns. Some of these files also used a different alignment for the operands.	2026-04-29 13:49:27 +03:00
Martin Storsjö	946e80fde7	libswscale/arm: Lowercase the "LSL" keyword	2026-04-29 13:49:27 +03:00
Marvin Scholz	e24882912f	swscale/yuv2rgb: add fall-through annotations	2026-04-28 12:29:37 +00:00
Marvin Scholz	3e48505dda	swscale: add fall-through annotations	2026-04-28 12:29:37 +00:00
Marvin Scholz	752cf875d8	swscale: replace fall-through comments	2026-04-28 12:29:37 +00:00
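
The float over-read problem described in commit d94c293e62 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration in Python, not the swscale code: the function names, the row layout, and the `valid_width` parameter are all hypothetical. It shows that a zero filter weight does not neutralise an Infinity read from the padding, because IEEE 754 defines 0 * Infinity as NaN, and that skipping the out-of-range taps avoids the problem.

```python
import math

def filter_h_naive(row, weights, start):
    # Hypothetical horizontal filter: weighted sum over all taps,
    # including taps that land in the padding past the valid pixels.
    return sum(w * row[start + i] for i, w in enumerate(weights))

def filter_h_clamped(row, weights, start, valid_width):
    # Same weighted sum, but taps beyond the valid pixels are never read.
    return sum(
        w * row[start + i]
        for i, w in enumerate(weights)
        if start + i < valid_width
    )

# Two valid float pixels followed by one padding value that happens to be inf.
row = [1.0, 2.0, float("inf")]
valid_width = 2
# The tap that falls on the padding has weight 0, as in the commit message.
weights = [0.5, 0.5, 0.0]

naive = filter_h_naive(row, weights, 0)
clamped = filter_h_clamped(row, weights, 0, valid_width)

print("naive:  ", naive)    # NaN, because 0.0 * inf is NaN
print("clamped:", clamped)  # 1.5
```

With integer inputs the naive version would be harmless, since 0 * x is 0 for every finite x; the failure is specific to floating-point data that can hold Infinity.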


3324 commits
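
Commit 0c1a1ee12e describes why a scale must not be moved past a truncating conversion. The following is a minimal numeric sketch of that effect in Python, not the optimizer itself: the helper names are hypothetical, and `int()` stands in for the truncating f32 -> u16 conversion as an assumption. The filtered values 10.5 and 0.5 are chosen by hand to represent fractional results of a bilinear filter.

```python
def scale_then_convert(x, factor):
    # Intended order: scale in floating point, then truncate to an integer.
    return int(x * factor)

def convert_then_scale(x, factor):
    # Order produced by pushing the scale past the conversion:
    # the fractional part is discarded before it can contribute.
    return int(x) * factor

# A bilinear blend of u8 values 10 and 11, widened to 16 bits (* 257).
blended = 0.5 * (10 + 11)                        # 10.5
good_257 = scale_then_convert(blended, 257)      # int(2698.5)
bad_257 = convert_then_scale(blended, 257)       # 10 * 257

# A normalised value in [0, 1] scaled to the full u16 range (* 65535).
good_full = scale_then_convert(0.5, 65535)       # int(32767.5)
bad_full = convert_then_scale(0.5, 65535)        # 0 * 65535

print(good_257, bad_257)    # 2698 2570
print(good_full, bad_full)  # 32767 0
```

The second pair mirrors the extreme case mentioned in the commit message: once the input range is [0, 1], truncating first leaves only 0 or 1, so the scaled output collapses to the two extremes of the range.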