WRITEBGR24MMX is unused after a05f22eaf3.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: -1548257 * 2048 cannot be represented in type 'int'
Fixes: #21592
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -2147483648 - 65536 cannot be represented in type 'int'
Fixes: #21588
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: integer overflow (does not replicate, but looks like it should overflow with some craftet parameters)
Fixes: #21584
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When luma init switched to cascade the chroma init was skiped
Fixes: NULL pointer dereference
Fixes: #21583
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The legacy API is defined by sws_init_context(), sws_scale() etc., whereas
the "modern" API is defined by just using sws_scale_frame() without prior
init call.
This int allows us to cleanly distinguish the type of context, paving the
way for some minor refactoring.
As an immediate benefit, we now gain a bunch of explict error checks to
ensure the API is used correctly (i.e. sws_scale() not called before
sws_init_context()).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows distinguishing between different types of failure, e.g.
AVERROR(EINVAL) on invalid pass dimensions.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.
Purely cosmetic; no functional change.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a bit more forward-facing than a bare allocation, and importantly,
allows the `swscale/utils.c` code to remain agnostic about how to correctly
uninit this struct.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.
Fixes: #22295
See-Also: #22366
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and
ld3/st4 structure operations. Both use a 2-register sliding-window
tbl with post-indexed addressing. Instruction scheduling targets
narrow in-order cores (A55) while remaining neutral on wide OoO.
Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha
drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel
instruction count. Independent instructions placed between loads and
dependent operations to fill load-use latency on in-order cores.
checkasm --bench on Apple M3 Max (decicycles, 1920px):
rgb32tobgr24_c: 114.4 ( 1.00x)
rgb32tobgr24_neon: 64.3 ( 1.78x)
rgb24tobgr32_c: 128.9 ( 1.00x)
rgb24tobgr32_neon: 80.9 ( 1.59x)
C baseline is clang auto-vectorized; speedup is over compiler NEON.
Signed-off-by: David Christle <dev@christle.is>
Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in
packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel
NEON fast path, 8-pixel NEON cleanup, and scalar tail.
checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
rgb24tobgr24_c: 722.0 ( 1.00x)
rgb24tobgr24_neon: 94.9 ( 7.61x)
Signed-off-by: David Christle <dev@christle.is>
The code was evidently designed at one point in time to support "direct"
execution (not via a thread pool) for num_threads == 1, but this was never
implemented.
As a side benefit, reduces context creation overhead in single threaded
mode (relevant e.g. inside the libswscale self test), due to not needing to
spawn and destroy several thousand worker threads.
Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
Add ARM64 NEON-accelerated unscaled YUV-to-RGB conversion for planar
YUV input formats. This extends the existing NV12/NV21 NEON paths with
YUV420P, YUV422P, and YUVA420P support for all packed RGB output
formats (ARGB, RGBA, ABGR, BGRA, RGB24, BGR24) and planar GBRP.
Register with ff_yuv2rgb_init_aarch64() to also cover the scaled path.
checkasm: all 42 sw_yuv2rgb tests pass.
Speedup vs C at 1920px width (Apple M3 Max, avg of 20 runs):
yuv420p->rgb24: 4.3x yuv420p->argb: 3.1x
yuv422p->rgb24: 5.5x yuv422p->argb: 4.1x
yuva420p->argb: 3.5x yuva420p->rgba: 3.5x
Signed-off-by: David Christle <dev@christle.is>
The res variable (pixel residual count for widths not divisible by 16)
is computed once before the row loop, but DEALYUV2RGBLINERES and
DEALYUV2RGBLINERES32 destructively subtract 8 from it inside the loop
body. When srcSliceH > 2, subsequent row pairs get an incorrect
residual count, producing wrong output for the tail pixels.
Fix by recomputing res from the constant c->opts.dst_w at the top of
each row-pair iteration.
Signed-off-by: David Christle <dev@christle.is>
yuy2toyv12, vu9_to_vu12 and yvu9_to_yuy2 are unused. Removing
them saved 7808B of .text and 102B of .text.unlikely
as well as 24B of .bss here.
Thanks to James Almer for pointing out that yuy2toyv12 is unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVFrame just really doesn't have the semantics we want. However, there a
tangible benefit to having SwsFrame act as a carbon copy of a (subset of)
AVFrame.
This partially reverts commit 67f3627267.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Up until now, several altivec-specific fields are directly
put into SwsInternal #if HAVE_ALTIVEC is true. These fields
are of altivec-specific vector types and therefore
require altivec specific headers to be included.
Unfortunately, said altivec specific headers redefine
bool in a manner that is incompatible with stdbool.
swscale/ops.h uses bool and this led graph.c and ops.c
to disagree about the layout of structures from ops.h,
leading to heap corruption [1], [2] in the sws-unscaled
FATE test.
Fix this by moving the altivec-specific parts out of SwsInternal
and into a structure that extends SwsInternal and is allocated
jointly with it. Said structure is local to yuv2rgb_altivec.c,
because this is the only file accessing said fields. Should
more files need them, an altivec-specific swscale header would
need to be added.
Thanks to jfiusdq <jfiusdq@proton.me> for analyzing the issue.
[1]: https://fate.ffmpeg.org/report.cgi?slot=ppc64-linux-gcc-14.3-asan&time=20260224065643
[2]: https://fate.ffmpeg.org/report.cgi?slot=ppc64-linux-gcc-14.3&time=20260224051615
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation for removing the util_altivec.h inclusion
in swscale_internal.h, which causes problems (on PPC) because
it redefines bool to something different from stdbool.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The pixel format for the process loops have already been checked at
this point to be valid.
The switch added in e4abfb8e51 returns AVERROR(EINVAL) in the default
case without calling ff_sws_op_chain_free(chain), but there's no need
to free it since we mark this branch as unreachable.
This has now become fully redundant with AVFrame, especially since the
existence of SwsPassBuffer. Delete it, simplifying a lot of things and
avoiding reinventing the wheel everywhere.
Also generally reduces overhead, since there is less redundant copying
going on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And have ff_sws_graph_run() just take a bare AVFrame. This will help with
an upcoming change, aside from being a bit friendlier towards API users
in general.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit replaces the AVBufferRef inside SwsPassBuffer by an AVFrame, in
anticipation of the SwsImg removal.
Incidentally, we could also now just use av_frame_get_buffer() here, but
at the cost of breaking the 1:1 relationship between planes and buffers,
which is required for per-plane refcopies.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This function was originally written to support the use case of e.g.
partially allocated planes that implicitly reference the original input
image, but I've decided that this is stupid and doesn't currently work
anyways.
Plus, I have plans to kill SwsImg, so we need to simplify this mess.
Signed-off-by: Niklas Haas <git@haasn.dev>
The current logic didn't take into account the possible plane shift. Just
re-compute the correctly shifted pointers using the row position.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead, precompute the correctly swizzled data and stride in setup()
and just reference the SwsOpExec fields directly.
To avoid the stack copies in handle_tail() we can introduce a temporary
array to hold just the pointers.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
src/libswscale/x86/ops.c(534): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(535): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(536): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(537): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(539): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(540): error C2099: initializer is not a constant
Fixes: ec959e20c5
Signed-off-by: Niklas Haas <git@haasn.dev>
This would otherwise generate illegal ops and undefined behavior, for
pixel formats without any supported corresponding pixel type.
This didn't cause any issues previously because the following
`ff_sws_encode_pixfmt` would normally fail for such formats, and the UB
was ignored in practice.
Signed-off-by: Niklas Haas <git@haasn.dev>