ffmpeg/libswscale
Arpad Panyik 1f30ff30fb swscale: Add AArch64 Neon path for xyz12Torgb48 LE
Add an optimized Neon code path for the little-endian case of the
xyz12Torgb48 function. The innermost loop processes the data in 4x2
pixel blocks using software gathers, with the matrix multiplication
and clipping done in Neon.
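
For orientation, the per-pixel arithmetic that such a path vectorizes can be sketched in scalar C. This is a minimal illustration, not the libswscale source: the struct, its field names, and the helper functions are hypothetical, the lookup tables are assumed to hold 4096 entries with values no larger than 4095, the matrix is assumed to use 12 fractional bits, and the samples are assumed to have been loaded already as host-order 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for this sketch; field names are not libswscale's. */
typedef struct XyzToRgbSketch {
    const uint16_t *xyz_gamma; /* 4096 entries, values <= 4095: XYZ -> linear */
    const uint16_t *rgb_gamma; /* 4096 entries, values <= 4095: linear -> RGB */
    int16_t matrix[3][3];      /* XYZ -> RGB, 12 fractional bits (4096 == 1.0) */
} XyzToRgbSketch;

/* Clamp to the unsigned 12-bit range so the result is a valid LUT index. */
static int clip_u12(int v)
{
    return v < 0 ? 0 : (v > 4095 ? 4095 : v);
}

/* One output channel: dot product of a matrix row with (x, y, z). */
static int mix_row(const int16_t m[3], int x, int y, int z)
{
    /* Dividing by 4096 drops the 12 fractional bits; once clipped, the
     * result equals that of an arithmetic right shift by 12. */
    return clip_u12((m[0] * x + m[1] * y + m[2] * z) / 4096);
}

/* Convert one row of packed XYZ12 pixels (3 words each) to packed RGB48. */
static void xyz12_to_rgb48_row(const XyzToRgbSketch *s, uint16_t *dst,
                               const uint16_t *src, size_t pixels)
{
    assert(s && dst && src);
    for (size_t i = 0; i < pixels; i++) {
        /* The 12-bit samples sit in the top bits of each 16-bit word;
         * these table lookups are the "gather" part of the loop. */
        int x = s->xyz_gamma[src[3 * i + 0] >> 4];
        int y = s->xyz_gamma[src[3 * i + 1] >> 4];
        int z = s->xyz_gamma[src[3 * i + 2] >> 4];

        /* Matrix multiplication and clipping. */
        int r = mix_row(s->matrix[0], x, y, z);
        int g = mix_row(s->matrix[1], x, y, z);
        int b = mix_row(s->matrix[2], x, y, z);

        /* Second lookup, then scale the 12-bit result back to 16 bits. */
        dst[3 * i + 0] = (uint16_t)(s->rgb_gamma[r] << 4);
        dst[3 * i + 1] = (uint16_t)(s->rgb_gamma[g] << 4);
        dst[3 * i + 2] = (uint16_t)(s->rgb_gamma[b] << 4);
    }
}
```

The table lookups depend on the pixel data, so they cannot be expressed as plain vector loads. That is consistent with the gathers being done in software while the multiply and clip steps run in Neon.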

Relative runtime of micro benchmarks after this patch on some
Cortex and Neoverse CPU cores:

 xyz12le_rgb48le    X1      X3      X4    X925      V2
 16x4_neon:       2.55x   4.34x   3.84x   3.31x   3.22x
 32x4_neon:       2.39x   3.63x   3.22x   3.35x   3.29x
 64x4_neon:       2.37x   3.31x   2.91x   3.33x   3.27x
 128x4_neon:      2.34x   3.28x   2.91x   3.35x   3.24x
 256x4_neon:      2.30x   3.17x   2.91x   3.32x   3.10x
 512x4_neon:      2.26x   3.10x   2.91x   3.30x   3.07x
 1024x4_neon:     2.26x   3.07x   2.96x   3.30x   3.05x
 1920x4_neon:     2.26x   3.06x   2.93x   3.28x   3.04x

 xyz12le_rgb48le   A76     A78    A715    A720    A725
 16x4_neon:       2.33x   2.28x   2.53x   3.33x   3.19x
 32x4_neon:       2.35x   2.18x   2.45x   3.23x   3.24x
 64x4_neon:       2.35x   2.16x   2.42x   3.15x   3.21x
 128x4_neon:      2.35x   2.13x   2.39x   3.00x   3.09x
 256x4_neon:      2.36x   2.12x   2.35x   2.85x   2.99x
 512x4_neon:      2.35x   2.14x   2.35x   2.78x   2.95x
 1024x4_neon:     2.31x   2.09x   2.33x   2.80x   2.91x
 1920x4_neon:     2.30x   2.07x   2.32x   2.81x   2.94x

 xyz12le_rgb48le   A55    A510    A520
 16x4_neon:       2.09x   1.92x   2.36x
 32x4_neon:       2.05x   1.89x   2.38x
 64x4_neon:       2.02x   1.77x   2.35x
 128x4_neon:      1.96x   1.74x   2.25x
 256x4_neon:      1.90x   1.72x   2.19x
 512x4_neon:      1.83x   1.75x   2.16x
 1024x4_neon:     1.83x   1.62x   2.15x
 1920x4_neon:     1.82x   1.60x   2.15x

Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
2025-12-05 10:28:18 +00:00
aarch64 swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
arm all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
loongarch swscale: Fix out-of-bounds write errors in yuv2rgb_lasx.c file. 2025-11-28 03:40:47 +00:00
ppc swscale/ppc/swscale_ppc_template: Fix av_unused placement 2025-09-26 22:38:13 +02:00
riscv swscale/range_convert: saturate output instead of limiting input 2024-12-05 21:10:29 +01:00
tests swscale/tests/swscale: Fix typo 2025-12-05 10:42:01 +01:00
x86 {lib{avcodec,swscale}/x86/,}Makefile: Kill MMX-OBJS 2025-11-30 22:20:13 +01:00
alphablend.c swscale/alphablend: don't overread alpha plane on subsampled odd size 2025-07-31 11:32:20 +00:00
bayer_template.c swscale/internal: constify SwsFunc 2024-10-07 19:51:34 +02:00
cms.c all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
cms.h swscale/utils: split off format code into new file 2025-03-14 19:50:44 +01:00
csputils.c swscale/csputils: Remove unused ff_sws_matrix3x3_rmul() 2025-04-03 06:04:57 +02:00
csputils.h swscale/csputils: Remove unused ff_sws_matrix3x3_rmul() 2025-04-03 06:04:57 +02:00
format.c swscale: allow extended primaries 2025-11-10 21:50:58 +00:00
format.h swscale/format: add new format decode/encode logic 2025-09-01 19:28:36 +02:00
gamma.c swscale: rename SwsContext to SwsInternal 2024-10-24 22:50:00 +02:00
graph.c swscale: Refactor XYZ+RGB state and add function hooks 2025-12-05 10:28:18 +00:00
graph.h swscale/graph: pass per-pass image pointers to setup() 2025-09-01 19:27:53 +02:00
half2float.c swscale/input: add rgbaf16 input support 2022-08-19 22:09:36 +02:00
hscale.c swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats 2024-12-05 21:10:29 +01:00
hscale_fast_bilinear.c swscale: rename SwsContext to SwsInternal 2024-10-24 22:50:00 +02:00
input.c Revert "swscale: add support for 10/12-bit grayscale MSB pixfmts" 2025-11-06 21:46:41 +01:00
libswscale.v build: Change structure of the linker version script templates 2016-05-29 16:43:11 +02:00
log2_tab.c lsws: duplicate ff_log2_tab 2014-08-12 20:52:21 +02:00
lut3d.c swscale/lut3d: remove unused function 2025-07-22 19:56:34 +02:00
lut3d.h swscale/utils: split off format code into new file 2025-03-14 19:50:44 +01:00
Makefile configure: allow disabling experimental swscale code 2025-09-01 19:28:36 +02:00
ops.c swscale: Remove the unused ff_sws_pixel_type_to_uint 2025-11-21 21:07:34 +00:00
ops.h swscale: Remove the unused ff_sws_pixel_type_to_uint 2025-11-21 21:07:34 +00:00
ops_backend.c swscale/ops_chain: add type removed ff_sws_op_chain_free_cb 2025-09-13 18:14:02 +02:00
ops_backend.h swscale/ops_tmpl_int: remove unused arguments from wrap read decl 2025-09-13 19:12:44 +02:00
ops_chain.c swscale/ops_chain: add type removed ff_sws_op_chain_free_cb 2025-09-13 18:14:02 +02:00
ops_chain.h swscale/ops_chain: add type removed ff_sws_op_chain_free_cb 2025-09-13 18:14:02 +02:00
ops_internal.h swscale/optimizer: add packed shuffle solver 2025-09-01 19:28:36 +02:00
ops_memcpy.c swscale/ops_memcpy: add 'memcpy' backend for plane->plane copies 2025-09-01 19:28:36 +02:00
ops_optimizer.c all: Use "" instead of <> to include internal headers 2025-09-04 22:20:58 +02:00
ops_tmpl_common.c swscale/ops_backend: add reference backend basend on C templates 2025-09-01 19:28:36 +02:00
ops_tmpl_float.c swscale/ops_backend: add reference backend basend on C templates 2025-09-01 19:28:36 +02:00
ops_tmpl_int.c swscale/ops_tmpl_int: fix signed integer related UB when shifting values 2025-11-21 18:40:58 +00:00
options.c swscale: add SWS_UNSTABLE flag 2025-09-01 19:28:35 +02:00
output.c swscale/output: Fix unsigned cast position in yuv2* 2025-10-14 20:55:54 +02:00
rgb2rgb.c swscale/swscale_unscaled: add unscaled x2rgb10le to packed RGB 2024-11-06 17:34:32 -03:00
rgb2rgb.h swscale/swscale_unscaled: add unscaled x2rgb10le to packed RGB 2024-11-06 17:34:32 -03:00
rgb2rgb_template.c swscale/swscale_unscaled: add unscaled conversion for AYUV/VUYA/UYVA 2024-11-02 15:01:31 -03:00
slice.c swscale/slice: fix init of 32 bpc planes 2024-12-16 12:21:55 +01:00
swscale.c swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
swscale.h swscale: add SWS_UNSTABLE flag 2025-09-01 19:28:35 +02:00
swscale_internal.h swscale: Add AArch64 Neon path for xyz12Torgb48 LE 2025-12-05 10:28:18 +00:00
swscale_unscaled.c swscale: Refactor XYZ+RGB state and add function hooks 2025-12-05 10:28:18 +00:00
swscaleres.rc Add Windows resource file support for shared libraries 2013-12-05 23:42:07 +01:00
utils.c swscale: Refactor XYZ+RGB state and add function hooks 2025-12-05 10:28:18 +00:00
version.c lib*/version: Use static_assert for static asserts 2024-03-31 00:08:42 +01:00
version.h swscale: add SWS_UNSTABLE flag 2025-09-01 19:28:35 +02:00
version_major.h libs: bump major version for all libraries 2025-03-28 14:44:34 -03:00
vscale.c swscale/internal: group user-facing options together 2024-11-21 12:49:56 +01:00
yuv2rgb.c ALL: move av_unused to conform with standard requirement 2025-09-26 16:15:46 +00:00