ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-04-21 09:50:25 +00:00


Krzysztof Pyrkosz c85a748979 swscale/aarch64/rgb2rgb: Implemented NEON shuf routines

The key idea is to pass the pre-generated tables to the TBL instruction and churn through the data 16 bytes at a time. The remaining 4 elements are handled with a specialized block located at the end of the routine.

The 3210 variant can be implemented using rev32, but surprisingly it is slower than the generic TBL on A78, but much faster on A72.

There may be some room for improvement. Possibly instead of handling last 8 and then 4 bytes separately, we can load these 4 into {v0.s}[2] and process along with the last 8 bytes.

Speeds measured with checkasm --test=sw_rgb --bench --runs=10 | grep shuf

- A78
shuffle_bytes_0321_c:      75.5 ( 1.00x)
shuffle_bytes_0321_neon:   26.5 ( 2.85x)
shuffle_bytes_1203_c:     136.2 ( 1.00x)
shuffle_bytes_1203_neon:   27.2 ( 5.00x)
shuffle_bytes_1230_c:     135.5 ( 1.00x)
shuffle_bytes_1230_neon:   28.0 ( 4.84x)
shuffle_bytes_2013_c:     138.8 ( 1.00x)
shuffle_bytes_2013_neon:   22.0 ( 6.31x)
shuffle_bytes_2103_c:      76.5 ( 1.00x)
shuffle_bytes_2103_neon:   20.5 ( 3.73x)
shuffle_bytes_2130_c:     137.5 ( 1.00x)
shuffle_bytes_2130_neon:   28.0 ( 4.91x)
shuffle_bytes_3012_c:     138.2 ( 1.00x)
shuffle_bytes_3012_neon:   21.5 ( 6.43x)
shuffle_bytes_3102_c:     138.2 ( 1.00x)
shuffle_bytes_3102_neon:   27.2 ( 5.07x)
shuffle_bytes_3210_c:     138.0 ( 1.00x)
shuffle_bytes_3210_neon:   22.0 ( 6.27x)

shuf3210 using rev32
shuffle_bytes_3210_c:     139.0 ( 1.00x)
shuffle_bytes_3210_neon:   28.5 ( 4.88x)

- A72
shuffle_bytes_0321_c:     120.0 ( 1.00x)
shuffle_bytes_0321_neon:   36.0 ( 3.33x)
shuffle_bytes_1203_c:     188.2 ( 1.00x)
shuffle_bytes_1203_neon:   37.8 ( 4.99x)
shuffle_bytes_1230_c:     195.0 ( 1.00x)
shuffle_bytes_1230_neon:   36.0 ( 5.42x)
shuffle_bytes_2013_c:     195.8 ( 1.00x)
shuffle_bytes_2013_neon:   43.5 ( 4.50x)
shuffle_bytes_2103_c:     117.2 ( 1.00x)
shuffle_bytes_2103_neon:   53.5 ( 2.19x)
shuffle_bytes_2130_c:     203.2 ( 1.00x)
shuffle_bytes_2130_neon:   37.8 ( 5.38x)
shuffle_bytes_3012_c:     183.8 ( 1.00x)
shuffle_bytes_3012_neon:   46.8 ( 3.93x)
shuffle_bytes_3102_c:     180.8 ( 1.00x)
shuffle_bytes_3102_neon:   37.8 ( 4.79x)
shuffle_bytes_3210_c:     195.8 ( 1.00x)
shuffle_bytes_3210_neon:   37.8 ( 5.19x)

shuf3210 using rev32
shuffle_bytes_3210_c:     194.8 ( 1.00x)
shuffle_bytes_3210_neon:   30.8 ( 6.33x)

- x13s:
shuffle_bytes_0321_c:      49.4 ( 1.00x)
shuffle_bytes_0321_neon:   18.1 ( 2.72x)
shuffle_bytes_1203_c:      98.4 ( 1.00x)
shuffle_bytes_1203_neon:   18.4 ( 5.35x)
shuffle_bytes_1230_c:      97.4 ( 1.00x)
shuffle_bytes_1230_neon:   19.1 ( 5.09x)
shuffle_bytes_2013_c:     101.4 ( 1.00x)
shuffle_bytes_2013_neon:   16.9 ( 6.01x)
shuffle_bytes_2103_c:      53.9 ( 1.00x)
shuffle_bytes_2103_neon:   13.9 ( 3.88x)
shuffle_bytes_2130_c:     100.9 ( 1.00x)
shuffle_bytes_2130_neon:   19.1 ( 5.27x)
shuffle_bytes_3012_c:      97.4 ( 1.00x)
shuffle_bytes_3012_neon:   17.1 ( 5.69x)
shuffle_bytes_3102_c:     100.9 ( 1.00x)
shuffle_bytes_3102_neon:   19.1 ( 5.27x)
shuffle_bytes_3210_c:     100.6 ( 1.00x)
shuffle_bytes_3210_neon:   16.9 ( 5.96x)

shuf3210 using rev32
shuffle_bytes_3210_c:     100.6 ( 1.00x)
shuffle_bytes_3210_neon:   18.6 ( 5.40x)

Signed-off-by: Martin Storsjö <martin@martin.st>

2025-02-07 12:54:55 +02:00
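
The table-lookup idea described in the commit message above can be modelled in portable C. This is only a scalar sketch, not the NEON assembly from the commit: `build_shuffle_table`, `lookup16` and `shuffle_bytes_model` are hypothetical names, `lookup16` stands in for what a TBL-style instruction does in one step, and the sketch assumes that the digits in a routine name such as `shuffle_bytes_2103` give the source byte offset for each destination byte within a 4-byte pixel. It also collapses the commit's separate 8-byte and 4-byte tail steps into a single per-pixel loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand a 4-entry per-pixel permutation into a
 * 16-entry index table covering four consecutive 4-byte pixels. */
static void build_shuffle_table(const uint8_t perm[4], uint8_t table[16])
{
    for (int i = 0; i < 16; i++)
        table[i] = (uint8_t)((i & ~3) + perm[i & 3]);
}

/* Scalar stand-in for a 16-byte table lookup: dst[i] = src[table[i]]. */
static void lookup16(const uint8_t *src, uint8_t *dst, const uint8_t table[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = src[table[i]];
}

/* Shuffle the bytes of 4-byte pixels. src and dst must not overlap and
 * src_size must be a multiple of 4 (assumptions of this sketch). */
static void shuffle_bytes_model(const uint8_t *src, uint8_t *dst,
                                size_t src_size, const uint8_t perm[4])
{
    uint8_t table[16];
    size_t i = 0;

    assert(src_size % 4 == 0);
    build_shuffle_table(perm, table);

    /* Main loop: 16 bytes (four pixels) per iteration. */
    for (; i + 16 <= src_size; i += 16)
        lookup16(src + i, dst + i, table);

    /* Tail: whatever pixels remain, one 4-byte pixel at a time. */
    for (; i + 4 <= src_size; i += 4)
        for (int j = 0; j < 4; j++)
            dst[i + j] = src[i + perm[j]];
}
```

Calling `shuffle_bytes_model(src, dst, 24, perm)` with `perm = {2, 1, 0, 3}` would process the first 16 bytes through the table and the last 8 bytes through the tail loop, swapping the first and third byte of every pixel.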
..
aarch64	swscale/aarch64/rgb2rgb: Implemented NEON shuf routines	2025-02-07 12:54:55 +02:00
arm	swscale/internal: group user-facing options together	2024-11-21 12:49:56 +01:00
loongarch	loongarch: fixes fate-checkasm-sw_rgb failure	2025-01-15 01:27:36 +01:00
ppc	swscale/ppc: disable YUV2RGB AltiVec acceleration	2024-12-02 02:51:39 +01:00
riscv	swscale/range_convert: saturate output instead of limiting input	2024-12-05 21:10:29 +01:00
tests	tests/swscale: allow nonzero positive return codes from sws_scale_frame()	2024-12-18 17:30:48 +01:00
x86	swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes	2025-02-03 10:16:44 -03:00
alphablend.c	swscale/internal: group user-facing options together	2024-11-21 12:49:56 +01:00
bayer_template.c	swscale/internal: constify SwsFunc	2024-10-07 19:51:34 +02:00
cms.c	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
cms.h	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
csputils.c	swscale/csputils: add internal colorspace math helpers	2024-12-23 12:33:43 +01:00
csputils.h	swscale/csputils: add internal colorspace math helpers	2024-12-23 12:33:43 +01:00
gamma.c	swscale: rename SwsContext to SwsInternal	2024-10-24 22:50:00 +02:00
graph.c	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
graph.h	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
half2float.c	swscale/input: add rgbaf16 input support	2022-08-19 22:09:36 +02:00
hscale.c	swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats	2024-12-05 21:10:29 +01:00
hscale_fast_bilinear.c	swscale: rename SwsContext to SwsInternal	2024-10-24 22:50:00 +02:00
input.c	swscale: 16bit planar float input support	2025-01-21 21:06:14 +01:00
libswscale.v	build: Change structure of the linker version script templates	2016-05-29 16:43:11 +02:00
log2_tab.c
lut3d.c	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
lut3d.h	swscale/cms,graph,lut3d: Use ff_-prefix, don't export internal functions	2025-01-12 15:41:39 +01:00
Makefile	swscale/lut3d: add 3DLUT dispatch system	2024-12-23 12:33:43 +01:00
options.c	swscale/options: add -sws_dither none alias	2024-12-23 12:47:10 +01:00
output.c	swscale/output: Fix undefined overflow in yuv2rgba64_full_X_c_template()	2025-01-08 23:23:24 +01:00
rgb2rgb.c	swscale/swscale_unscaled: add unscaled x2rgb10le to packed RGB	2024-11-06 17:34:32 -03:00
rgb2rgb.h	swscale/swscale_unscaled: add unscaled x2rgb10le to packed RGB	2024-11-06 17:34:32 -03:00
rgb2rgb_template.c	swscale/swscale_unscaled: add unscaled conversion for AYUV/VUYA/UYVA	2024-11-02 15:01:31 -03:00
slice.c	swscale/slice: fix init of 32 bpc planes	2024-12-16 12:21:55 +01:00
swscale.c	swscale/swscale: don't reject scaling when color parameters are not supported but conversion is not required	2025-01-22 12:15:18 -03:00
swscale.h	swscale: add ICC intent enum and option	2024-12-23 12:33:43 +01:00
swscale_internal.h	swscale: use 16-bit intermediate precision for RGB/XYZ conversion	2024-12-26 20:31:36 +01:00
swscale_unscaled.c	swscale: 16bit planar float input support	2025-01-21 21:06:14 +01:00
swscaleres.rc
utils.c	swscale: 16bit planar float input support	2025-01-21 21:06:14 +01:00
utils.h	swscale/swscale: don't reject scaling when color parameters are not supported but conversion is not required	2025-01-22 12:15:18 -03:00
version.c	lib*/version: Use static_assert for static asserts	2024-03-31 00:08:42 +01:00
version.h	swscale: add ICC intent enum and option	2024-12-23 12:33:43 +01:00
version_major.h	libs: bump major version for all libraries	2024-03-07 11:29:43 -03:00
vscale.c	swscale/internal: group user-facing options together	2024-11-21 12:49:56 +01:00
yuv2rgb.c	swscale/internal: group user-facing options together	2024-11-21 12:49:56 +01:00