16 commits

Author SHA1 Message Date
Jun Zhao
91ae6d10ab lavfi/nlmeans: add aarch64 neon for compute_weights_line
Implement NEON optimization for compute_weights_line.

Also update the function signature to use ptrdiff_t for stack arguments
(max_meaningful_diff, startx, endx). This is done to unify the stack
layout between Apple platforms (which pack 32-bit stack arguments tightly)
and the generic AAPCS64 ABI (which requires 8-byte stack slots for 32-bit
arguments). Using ptrdiff_t ensures 8-byte slots are used on all AArch64
platforms, avoiding ABI mismatches with the assembly implementation.

The x86 AVX2 prototype is updated to match the new signature.

Performance benchmark (AArch64) on macOS, Apple M4:
./tests/checkasm/checkasm --test=vf_nlmeans --bench
compute_weights_line_c:     151.1 ( 1.00x)
compute_weights_line_neon:  62.6 ( 2.42x)

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-09 16:10:10 +00:00
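The stack-layout issue described in 91ae6d10ab can be illustrated with a small model. The sketch below is a simplification under stated assumptions, not FFmpeg code: the helper names (align_up, stack_offset_generic, stack_offset_apple) are hypothetical, only scalar arguments of 1, 2, 4 or 8 bytes are considered, and the two rules are reduced to "each stack argument gets an 8-byte slot" (generic AAPCS64) versus "each stack argument takes its own size at its natural alignment" (Apple platforms).

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Simplified model of the generic AAPCS64 rule: every stack argument of
 * up to 8 bytes occupies a full 8-byte slot.
 * sizes[] holds the byte size of each stack argument, in order;
 * the return value is the offset of argument idx from the first one. */
static size_t stack_offset_generic(const size_t *sizes, size_t idx)
{
    size_t off = 0;
    for (size_t i = 0; i < idx; i++)
        off += align_up(sizes[i], 8);
    return off;
}

/* Simplified model of the Apple rule: a stack argument consumes only its
 * own size, padded to its natural alignment (sizes must be 1, 2, 4 or 8). */
static size_t stack_offset_apple(const size_t *sizes, size_t idx)
{
    size_t off = 0;
    for (size_t i = 0; i < idx; i++)
        off = align_up(off, sizes[i]) + sizes[i];
    return align_up(off, sizes[idx]);
}
```

With three 4-byte arguments (for example int), this model places them at offsets 0, 8, 16 under the generic rule and at 0, 4, 8 under the Apple rule, so a single assembly routine cannot load them from the same addresses on both. With three 8-byte arguments (for example ptrdiff_t on AArch64), both rules give 0, 8, 16, which is the unification the commit message describes.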
Andreas Rheinhardt
a35c91dc14 avfilter/vf_colordetect: Rename header to vf_colordetectdsp.h
It is more in line with our naming conventions.

Reviewed-by: Martin Storsjö <martin@martin.st>
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-09-16 18:22:24 +02:00
Zhao Zhili
eb14d45824 avfilter/vf_colordetect: add aarch64 asm
                       | rpi5 gcc 12      | m1 clang -fno-vectorize | m1 clang
------------------------------------------------------------------------------------
alpha_8_full_c:        | 32159.2 ( 1.00x) | 135.8 ( 1.00x)          |  26.4 ( 1.00x)
alpha_8_full_neon:     |  1266.0 (25.40x) |   8.0 (17.03x)          |   8.4 ( 3.15x)
alpha_8_limited_c:     | 37561.9 ( 1.00x) | 169.1 ( 1.00x)          |  47.7 ( 1.00x)
alpha_8_limited_neon:  |  3967.0 ( 9.47x) |  12.5 (13.53x)          |  13.3 ( 3.59x)
alpha_16_full_c:       | 15867.9 ( 1.00x) |  64.5 ( 1.00x)          |  13.7 ( 1.00x)
alpha_16_full_neon:    |  1256.9 (12.62x) |   7.9 ( 8.15x)          |   8.3 ( 1.64x)
alpha_16_limited_c:    | 16723.7 ( 1.00x) |  88.7 ( 1.00x)          | 103.3 ( 1.00x)
alpha_16_limited_neon: |  4031.3 ( 4.15x) |  12.5 ( 7.08x)          |  13.2 ( 7.86x)
range_8_c:             | 21819.7 ( 1.00x) | 120.0 ( 1.00x)          |   9.4 ( 1.00x)
range_8_neon:          |  1148.3 (19.00x) |   4.3 (27.60x)          |   4.8 ( 1.97x)
range_16_c:            | 10757.1 ( 1.00x) |  45.7 ( 1.00x)          |   7.9 ( 1.00x)
range_16_neon:         |  1141.5 ( 9.42x) |   4.4 (10.38x)          |   4.6 ( 1.72x)
2025-09-01 15:35:16 +00:00
Timo Rothenpieler
262d41c804 all: fix typos found by codespell
2025-08-03 13:48:47 +02:00
Timo Rothenpieler
8d439b2483 all: fix whitespace/new-line issues
2025-08-03 13:48:47 +02:00
Martin Storsjö
93cda5a9c2 aarch64: Lowercase UXTW/SXTW and similar flags
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:23 +03:00
Martin Storsjö
184103b310 aarch64: Consistently use lowercase for vector element specifiers
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:18 +03:00
Andreas Rheinhardt
fa06f48371 avfilter/bwdifdsp: Constify
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-28 00:17:47 +02:00
Andreas Rheinhardt
80afcc8539 avfilter/bwdif: Add proper BWDIFDSPContext
This already avoids unnecessary indirectly included headers
in the arch-specific vf_bwdif_init.c files; it is also in
preparation for splitting the actual functions out of vf_bwdif.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-28 00:17:47 +02:00
John Cox
f00222e81f avfilter/vf_bwdif: Add neon for filter_line3
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
John Cox
94cb94a2c0 avfilter/vf_bwdif: Add neon for filter_line
Exports C filter_line needed for tail fixup of neon code
Adds neon for filter_line

Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
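The "tail fixup" mentioned in the vf_bwdif NEON commits refers to a common pattern: the vector routine handles the part of a row that is a whole number of vector blocks, and the exported C function finishes the remaining pixels. The sketch below shows that pattern in portable C under stated assumptions. The function names (avg_rows_c, avg_rows_blocks16, avg_rows), the row-averaging operation and the 16-pixel block size are hypothetical stand-ins chosen for illustration, not the actual bwdif filters or their signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: rounded average of two rows.
 * Hypothetical stand-in for an exported C filter function. */
static void avg_rows_c(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t w)
{
    for (size_t x = 0; x < w; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Stand-in for a vector routine that only accepts widths that are a
 * multiple of 16 (written in plain C here so the sketch stays portable). */
static void avg_rows_blocks16(uint8_t *dst, const uint8_t *a,
                              const uint8_t *b, size_t w)
{
    for (size_t x = 0; x < w; x += 16)
        for (size_t i = 0; i < 16; i++)
            dst[x + i] = (uint8_t)((a[x + i] + b[x + i] + 1) >> 1);
}

/* Wrapper: run the block routine on the largest multiple of 16 that fits,
 * then let the scalar function handle the remaining tail pixels. */
static void avg_rows(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t w)
{
    size_t w0 = w & ~(size_t)15;   /* width rounded down to a multiple of 16 */

    if (w0)
        avg_rows_blocks16(dst, a, b, w0);
    if (w0 < w)
        avg_rows_c(dst + w0, a + w0, b + w0, w - w0);
}
```

Exporting the C function is what makes this wrapper possible: the architecture-specific init code can call it for the tail instead of duplicating the scalar logic in assembly.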
John Cox
8130df83e0 avfilter/vf_bwdif: Add neon for filter_edge
Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge

Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
John Cox
5075cfb4e6 avfilter/vf_bwdif: Add neon for filter_intra
Adds an outline for aarch64 neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra

Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Jan Ekström
eb94ec3257 lavfi/nlmeans: fix aarch64 assembly with clang
Clang is more strict about some things.
2018-07-28 17:41:19 +03:00
Clément Bœsch
5a71bce371 lavfi/nlmeans: add AArch64 SIMD for compute_safe_ssd_integral_image
ssd_integral_image_c: 49204.6
ssd_integral_image_neon: 28346.8
2018-05-08 10:28:06 +02:00