Mirror of https://git.ffmpeg.org/ffmpeg.git, synced 2025-12-08 06:09:50 +00:00.

The scalar loop is replaced with masked AVX512 instructions. For extracting the Y from UYVY, vperm2b is used instead of various AND and packuswb operations. Instead of loading the vectors with interleaved lanes, as done in the AVX2 version, a normal load is used. After the packuswb, an extra permute operation is done for U and V to get the required layout.

AMD 7950x Zen 4 benchmark data:

- uyvytoyuv422_c: 29105.0 (1.00x)
- uyvytoyuv422_sse2: 3888.0 (7.49x)
- uyvytoyuv422_avx: 3374.2 (8.63x)
- uyvytoyuv422_avx2: 2649.8 (10.98x)
- uyvytoyuv422_avx512icl: 1615.0 (18.02x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>

| | | |
|---|---|---|
| hscale_fast_bilinear_simd.c | ||
| input.asm | ||
| Makefile | ||
| output.asm | ||
| range_convert.asm | ||
| rgb2rgb.c | ||
| rgb_2_rgb.asm | ||
| scale.asm | ||
| scale_avx2.asm | ||
| swscale.c | ||
| swscale_template.c | ||
| w64xmmtest.c | ||
| yuv2rgb.c | ||
| yuv2yuvX.asm | ||
| yuv_2_rgb.asm | ||
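
For readers unfamiliar with the conversion named in the commit message, the following is a minimal scalar sketch of what a UYVY to planar YUV 4:2:2 split does for one row. It is not FFmpeg's code: the function name `uyvy_row_to_planar` and its parameters are hypothetical, and it assumes the standard UYVY byte order (U0 Y0 V0 Y1) and an even `width`. The SIMD versions benchmarked above perform the same de-interleave on many bytes at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not FFmpeg's implementation).
 * Splits one row of packed UYVY (byte order: U0 Y0 V0 Y1) into separate
 * Y, U and V planes. `width` is the number of luma samples in the row and
 * is assumed to be even; each chroma plane receives width / 2 samples. */
void uyvy_row_to_planar(const uint8_t *src, uint8_t *y,
                        uint8_t *u, uint8_t *v, size_t width)
{
    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0]; /* shared chroma U for the pixel pair */
        y[2 * i + 0] = src[4 * i + 1]; /* luma of the first pixel */
        v[i]         = src[4 * i + 2]; /* shared chroma V for the pixel pair */
        y[2 * i + 1] = src[4 * i + 3]; /* luma of the second pixel */
    }
}
```

In this scalar form every output byte is picked from a fixed position within each 4-byte group, which is the kind of byte selection the commit message describes doing with a permute for Y and with packuswb plus an extra permute for U and V.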