Mirror of https://git.ffmpeg.org/ffmpeg.git, synced 2026-02-15 06:40:26 +00:00
Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:

- 4 vinsertq to have interleaving of the vector lanes during load from memory.
- 4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.

This patch replaces the above 8 instructions with 2 vpermq and 2 vpermd with a vector register similar to AVX512ICL version.

Observed the following numbers on various microarchitectures:

On AMD Zen3 laptop:

| Function | Before | After |
|---|---|---|
| uyvytoyuv422_c | 51979.7 (1.00x) | 51659.8 (1.00x) |
| uyvytoyuv422_sse2 | 5410.5 (9.61x) | 5420.8 (9.53x) |
| uyvytoyuv422_avx | 4642.7 (11.20x) | 4651.2 (11.11x) |
| uyvytoyuv422_avx2 | 4249.0 (12.23x) | 3953.8 (13.07x) |

On Intel Macbook Pro 2019:

| Function | Before | After |
|---|---|---|
| uyvytoyuv422_c | 185014.4 (1.00x) | 185093.4 (1.00x) |
| uyvytoyuv422_sse2 | 22800.4 (8.11x) | 22795.4 (8.12x) |
| uyvytoyuv422_avx | 19796.9 (9.35x) | 19791.9 (9.35x) |
| uyvytoyuv422_avx2 | 13141.9 (14.08x) | 12043.1 (15.37x) |

On AMD Zen4 desktop:

| Function | Before | After |
|---|---|---|
| uyvytoyuv422_c | 29105.0 (1.00x) | 29093.4 (1.00x) |
| uyvytoyuv422_sse2 | 3888.0 (7.49x) | 3874.4 (7.51x) |
| uyvytoyuv422_avx | 3374.2 (8.63x) | 3371.6 (8.63x) |
| uyvytoyuv422_avx2 | 2649.8 (10.98x) | 2174.6 (13.38x) |
| uyvytoyuv422_avx512icl | 1615.0 (18.02x) | 1625.1 (17.90x) |

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>

| | | |
|---|---|---|
| .. | ||
| aarch64 | ||
| arm | ||
| loongarch | ||
| ppc | ||
| riscv | ||
| tests | ||
| x86 | ||
| alphablend.c | ||
| bayer_template.c | ||
| cms.c | ||
| cms.h | ||
| csputils.c | ||
| csputils.h | ||
| format.c | ||
| format.h | ||
| gamma.c | ||
| graph.c | ||
| graph.h | ||
| half2float.c | ||
| hscale.c | ||
| hscale_fast_bilinear.c | ||
| input.c | ||
| libswscale.v | ||
| log2_tab.c | ||
| lut3d.c | ||
| lut3d.h | ||
| Makefile | ||
| options.c | ||
| output.c | ||
| rgb2rgb.c | ||
| rgb2rgb.h | ||
| rgb2rgb_template.c | ||
| slice.c | ||
| swscale.c | ||
| swscale.h | ||
| swscale_internal.h | ||
| swscale_unscaled.c | ||
| swscaleres.rc | ||
| utils.c | ||
| version.c | ||
| version.h | ||
| version_major.h | ||
| vscale.c | ||
| yuv2rgb.c | ||
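
For readers unfamiliar with the routine named in the commit message, the following is a minimal scalar sketch of the data movement that uyvytoyuv422 performs on one row: packed UYVY bytes (U0 Y0 V0 Y1) are split into separate Y, U and V planes. The helper name `uyvy_row_to_planar` and its signature are hypothetical and chosen for illustration only; this is not FFmpeg's API and not the code touched by the patch. The SIMD versions do the same de-interleave many bytes at a time, and because most AVX2 byte operations work within each 128-bit lane, cross-lane permutes such as vpermq and vpermd are typically needed to put the results back in linear order.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libswscale): split one row of packed
 * UYVY into planar Y, U and V.
 * src holds 2 * width bytes in the order U0 Y0 V0 Y1 U1 Y2 V1 Y3 ...
 * width is the number of luma samples and is assumed to be even. */
static void uyvy_row_to_planar(uint8_t *y, uint8_t *u, uint8_t *v,
                               const uint8_t *src, int width)
{
    for (int i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0]; /* one U per pair of pixels */
        y[2 * i]     = src[4 * i + 1]; /* luma of the even pixel   */
        v[i]         = src[4 * i + 2]; /* one V per pair of pixels */
        y[2 * i + 1] = src[4 * i + 3]; /* luma of the odd pixel    */
    }
}
```

A full-frame conversion would call such a helper once per row, advancing each pointer by its own stride.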