Commit graph

54 commits

Author SHA1 Message Date
Andreas Rheinhardt
35fcdb2132 swscale/x86/rgb2rgb: Deduplicate ASM constants
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-04-13 22:49:21 +02:00
Shreesh Adiga
e18f87ed9f swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422
The scalar loop is replaced with masked AVX512 instructions.
For extracting the Y from UYVY, vperm2b is used instead of
various AND and packuswb.

Instead of loading the vectors with interleaved lanes as done
in AVX2 version, normal load is used. At the end of packuswb,
for U and V, an extra permute operation is done to get the
required layout.

AMD 7950x Zen 4 benchmark data:
uyvytoyuv422_c:                                      29105.0 ( 1.00x)
uyvytoyuv422_sse2:                                    3888.0 ( 7.49x)
uyvytoyuv422_avx:                                     3374.2 ( 8.63x)
uyvytoyuv422_avx2:                                    2649.8 (10.98x)
uyvytoyuv422_avx512icl:                               1615.0 (18.02x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2025-02-18 12:43:57 -03:00
Shreesh Adiga
59f9dbaa31 swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes
On a AMD 7950x Zen 4

shuffle_bytes_0321_c:                                   56.5 ( 1.00x)
shuffle_bytes_0321_ssse3:                               15.2 ( 3.70x)
shuffle_bytes_0321_avx2:                                10.2 ( 5.51x)
shuffle_bytes_0321_avx512icl:                            9.2 ( 6.11x)
shuffle_bytes_1230_c:                                   84.5 ( 1.00x)
shuffle_bytes_1230_ssse3:                               14.2 ( 5.93x)
shuffle_bytes_1230_avx2:                                15.2 ( 5.54x)
shuffle_bytes_1230_avx512icl:                           11.2 ( 7.51x)
shuffle_bytes_2103_c:                                   48.5 ( 1.00x)
shuffle_bytes_2103_ssse3:                               21.2 ( 2.28x)
shuffle_bytes_2103_avx2:                                13.8 ( 3.53x)
shuffle_bytes_2103_avx512icl:                            9.2 ( 5.24x)
shuffle_bytes_3012_c:                                   84.5 ( 1.00x)
shuffle_bytes_3012_ssse3:                               14.2 ( 5.93x)
shuffle_bytes_3012_avx2:                                16.2 ( 5.20x)
shuffle_bytes_3012_avx512icl:                           10.2 ( 8.24x)
shuffle_bytes_3210_c:                                   89.2 ( 1.00x)
shuffle_bytes_3210_ssse3:                               24.2 ( 3.68x)
shuffle_bytes_3210_avx2:                                16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl:                            9.2 ( 9.65x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
2025-02-03 10:16:44 -03:00
Andreas Rheinhardt
3797e9239e swscale/x86/swscale: Move some constants to rgb2rgb.c
ff_w1111 and ff_bgr2(Y|UV)Offset are only used there
(and only on x86-32 since caaec2ea95).
Also make them static.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-02-02 17:00:07 +01:00
James Almer
78ba06928a swscale/x86/rgb2rgb: add optimized versions of the remaining shuffle_bytes functions
Signed-off-by: James Almer <jamrial@gmail.com>
2024-11-02 15:01:31 -03:00
Martin Storsjö
b9145fcab2 swscale: Fix aarch64 and i386 compilation failures
This unbreaks builds after c1a0e65763,
which broke with errors like

src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
   66 |         ff_rgb24toyv12  = rgb24toyv12;
      |                         ^ ~~~~~~~~~~~

and

src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
  213 |         c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
      |                             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-10-08 09:29:07 +03:00
Ramiro Polla
caaec2ea95 swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64
The mmxext implementation is slower than the C version in x86_64.

                                m32               m64
rgb24toyv12_16_200_c:       24942.7           14812.6
rgb24toyv12_16_200_mmxext:  17857.2 ( 1.40x)  17400.4 ( 0.85x)
rgb24toyv12_128_60_c:       56892.9           35616.9
rgb24toyv12_128_60_mmxext:  40730.9 ( 1.40x)  39610.4 ( 0.90x)
rgb24toyv12_512_16_c:       58402.7           37209.4
rgb24toyv12_512_16_mmxext:  44842.4 ( 1.30x)  41136.2 ( 0.90x)
rgb24toyv12_1920_4_c:       54827.4           34737.4
rgb24toyv12_1920_4_mmxext:  51169.9 ( 1.07x)  34818.9 ( 1.00x)
2024-09-06 23:06:38 +02:00
Ramiro Polla
4c824ad391 swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers 2024-09-06 23:05:04 +02:00
James Almer
03546f49a3 swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 15:56:52 -03:00
James Almer
e8cef5e152 swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
c578bb9864 swscale/x86/input: add AVX2 optimized uyvytoyuv422
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
Andreas Rheinhardt
8b62fb231a swscale/x86/rgb2rgb: Detemplatize
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
c1c35380a7 swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
608319a311 swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:35:38 +02:00
Wu Jianhua
2c734a8496 libswscale/x86/rgb2rgb: add shuffle_bytes avx2
Performance data(Less is better):
    shuffle_bytes_ssse3   3.64654
    shuffle_bytes_avx2    0.94288

Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-10-15 10:59:20 +02:00
Andreas Rheinhardt
aad597a93c swscale/x86/rgb2rgb: Remove unused ASM constants
mask24hh etc. are unused since f099fbf5f3,
mask32b and mask32r since 296609f859,
mask32g since b38d487466 and mask32 since
f8a138be52.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:45:17 +01:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Martin Vignali
296609f859 swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version 2018-10-13 14:12:41 +02:00
Martin Vignali
07a566e7d6 swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422
and checkasm test
2018-04-22 19:15:32 +02:00
Martin Vignali
1ba5ca2d72 swscale/rgb : add X86 SIMD (SSSE3), for shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 2018-03-24 20:22:08 +01:00
Martin Vignali
923a324174 swscale/rgb : add X86 SIMD (SSSE3) for shuffle_bytes_2103 and shuffle_bytes_0321 2018-03-24 20:21:58 +01:00
Hendrik Leppkes
c142dc203e Merge commit 'dc40a70c57'
* commit 'dc40a70c57':
  Drop unnecessary libavutil/x86/asm.h #includes

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-06-26 15:53:00 +02:00
Diego Biurrun
dc40a70c57 Drop unnecessary libavutil/x86/asm.h #includes 2016-05-28 19:18:26 +02:00
Matt Oliver
9eb3f11c55 Add missing external declarations.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-17 00:48:09 +01:00
Michael Niedermayer
7597e6efe4 swscale/x86/rgb2rgb: add support for AVX
This does not yet include any actual AVX code

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-01-21 18:01:29 +01:00
Michael Niedermayer
445c58a8c6 swscale/x86/rgb2rgb: Make sure COMPILE_TEMPLATE_AVX is defined
Found-by: iive
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-12-14 02:54:28 +01:00
Diego Biurrun
c16bfb147d swscale: x86: Consistently use lowercase function name suffixes 2013-11-22 23:01:51 +01:00
Michael Niedermayer
1de064e21e swscale/x86/rgb2rgb: change cpu optim identifiers to lower case
This makes the code more similar to the other optims and allows us
to use the same macros to build function names

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-19 15:13:48 +01:00
Michael Niedermayer
4729b529e6 swscale/x86/rgb2rgb: extend framework to also include AVX
This does not yet include any actual AVX code

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-19 15:13:48 +01:00
Michael Niedermayer
920dd84bf1 sws/x86: remove 8bit rgb2yuv coefficient case for rgb24toyv12 special converter
This simplifies the code and improves quality at the expense of a slight
slowdown of a rarely used function (no fate test uses it).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-15 03:19:52 +02:00
Michael Niedermayer
add7513e64 Merge commit 'fa8fcab1e0'
* commit 'fa8fcab1e0':
  x86: h264_chromamc_10bit: drop pointless PAVG %define
  x86: mmx2 ---> mmxext in function names
  swscale: do not forget to swap data in formats with different endianness

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavfilter/x86/gradfun.c
	libswscale/input.c
	libswscale/utils.c
	libswscale/x86/swscale.c
	tests/ref/lavfi/pixfmts_scale

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-01 13:11:51 +01:00
Diego Biurrun
d8eda37080 x86: mmx2 ---> mmxext in function names 2012-10-31 17:53:57 +01:00
Michael Niedermayer
78ec407d5a Merge commit '652f518594'
* commit '652f518594':
  x86: mmx2 ---> mmxext in comments and messages

Conflicts:
	libswscale/x86/swscale_template.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 14:02:35 +01:00
Diego Biurrun
652f518594 x86: mmx2 ---> mmxext in comments and messages 2012-10-31 00:37:42 +01:00
Michael Niedermayer
77aedc77ab Merge remote-tracking branch 'qatar/master'
* qatar/master:
  swscale: Provide the right alignment for external mmx asm
  x86: Replace checks for CPU extensions and flags by convenience macros
  configure: msvc: fix/simplify setting of flags for hostcc
  x86: mlpdsp: mlp_filter_channel_x86 requires inline asm

Conflicts:
	libavcodec/x86/fft_init.c
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/mpegaudiodec.c
	libavcodec/x86/proresdsp_init.c
	libavutil/x86/float_dsp_init.c
	libswscale/utils.c
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 13:27:42 +02:00
Diego Biurrun
e0c6cce447 x86: Replace checks for CPU extensions and flags by convenience macros
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00
Michael Niedermayer
9f088a1ed4 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mpegvideo: reduce excessive inlining of mpeg_motion()
  mpegvideo: convert mpegvideo_common.h to a .c file
  build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
  Move MASK_ABS macro to libavcodec/mathops.h
  x86: move MANGLE() and related macros to libavutil/x86/asm.h
  x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
  aacdec: Don't fall back to the old output configuration when no old configuration is present.
  rtmp: Add message tracking
  rtsp: Support mpegts in raw udp packets
  rtsp: Support receiving plain data over UDP without any RTP encapsulation
  rtpdec: Remove an unused include
  rtpenc: Remove an av_abort() that depends on user-supplied data
  vsrc_movie: discourage its use with avconv.
  avconv: allow no input files.
  avconv: prevent invalid reads in transcode_init()
  avconv: rename OutputStream.is_past_recording_time to finished.

Conflicts:
	configure
	doc/filters.texi
	ffmpeg.c
	ffmpeg.h
	libavcodec/Makefile
	libavcodec/aacdec.c
	libavcodec/mpegvideo.c
	libavformat/version.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-09 19:31:56 +02:00
Mans Rullgard
c318626ce2 x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Michael Niedermayer
e776ee8f29 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  lavr: fix handling of custom mix matrices
  fate: force pix_fmt in lagarith-rgb32 test
  fate: add tests for lagarith lossless video codec.
  ARMv6: vp8: fix stack allocation with Apple's assembler
  ARM: vp56: allow inline asm to build with clang
  fft: 3dnow: fix register name typo in DECL_IMDCT macro
  x86: dct32: port to cpuflags
  x86: build: replace mmx2 by mmxext
  Revert "wmapro: prevent division by zero when sample rate is unspecified"
  wmapro: prevent division by zero when sample rate is unspecified
  lagarith: fix color plane inversion for YUY2 output.
  lagarith: pad RGB buffer by 1 byte.
  dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.

Conflicts:
	doc/APIchanges
	libavcodec/lagarith.c
	libavfilter/x86/gradfun.c
	libavutil/cpu.h
	libavutil/version.h
	libswscale/utils.c
	libswscale/version.h
	libswscale/x86/yuv2rgb.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-04 23:51:43 +02:00
Diego Biurrun
239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Michael Niedermayer
2cb4d51654 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  v410dec: Implement explode mode support
  zerocodec: fix direct rendering.
  wav: init st to NULL to avoid a false-positive warning.
  wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
  h264: refactor NAL decode loop
  RTMPTE protocol support
  RTMPE protocol support
  rtmp: Add ff_rtmp_calc_digest_pos()
  rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
  swscale: add missing HAVE_INLINE_ASM check.
  lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
  vc1: Add a test for interlaced field pictures
  swscale: Mark all init functions as av_cold
  swscale: x86: Drop pointless _mmx suffix from filenames
  lavf: use conditional notation for default codec in muxer declarations.
  swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
  dsputil: ppc: cosmetics: pretty-print
  dsputil: x86: add SHUFFLE_MASK_W macro
  configure: respect CC_O setting in check_cc

Conflicts:
	Changelog
	configure
	libavcodec/v410dec.c
	libavcodec/zerocodec.c
	libavformat/asfenc.c
	libavformat/version.h
	libswscale/utils.c
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-23 21:25:09 +02:00
Diego Biurrun
5a6e3c039c swscale: Mark all init functions as av_cold 2012-07-23 01:30:05 +02:00
Michael Niedermayer
32c3038734 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: swscale: Place inline assembly code under appropriate #ifdefs
  rtsp: remove terminal comma in FF_RTP_FLAG_OPTS macro.
  configure: Remove redundant RTMPT/RTMPTS dependencies
  configure: add filtering of host cflags/ldflags
  configure: initialise all flag filters at the same place
  configure: add filtering of linker flags
  configure: name some variables more consistently
  configure: remove filter_cppflags
  configure: set icc_version where it is needed
  mpegenc: remove disabled code

Conflicts:
	configure
	libavformat/movenc.c
	libswscale/x86/swscale_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-22 05:05:02 +02:00
Ronald S. Bultje
b2668c85e9 x86: swscale: Place inline assembly code under appropriate #ifdefs
Fixes compilation for compilers that do not support gcc inline assembly.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-21 22:22:58 +02:00
Themaister
0827222b9c Use more accurate conversion for rgb15/16 to rgb24/32 (C/MMX).
Fate update by michael.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2011-11-09 01:58:22 +01:00
Michael Niedermayer
7a02527b05 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  ac3enc: use correct alignment and length in channel coupling dsp functions.
  ffmpeg: don't abuse a global for passing framerate from input to output
  ffmpeg: don't abuse a global for passing channels from input to output
  ffmpeg: don't abuse a global for passing samplerate from input to output
  ARM: update ff_h264_idct8_add4_neon for 4:4:4 changes
  swscale: use SwsContext for av_log when available
  swscale: Remove HAVE_MMX from files that are only compiled with MMX enabled.
  swscale: Fix compilation with --disable-mmx2.

Conflicts:
	ffmpeg.c
	libswscale/utils.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-06-16 03:53:58 +02:00
Diego Biurrun
a60466dbc3 swscale: Remove HAVE_MMX from files that are only compiled with MMX enabled. 2011-06-15 01:18:10 +02:00
Ronald S. Bultje
78046dadc3 rgb2rgb: remove duplicate mmx/mmx2/3dnow/sse2 functions.
Many functions have such a prefix, but do not actually use any
instructions or features from that set, thus giving the false
impression that swscale is highly optimized for a particular
system, whereas in reality it is not.
2011-05-28 11:41:32 +02:00
Ronald S. Bultje
522d65ba25 rgb2rgb: remove duplicate mmx/mmx2/3dnow/sse2 functions.
Many functions have such a prefix, but do not actually use any
instructions or features from that set, thus giving the false
impression that swscale is highly optimized for a particular
system, whereas in reality it is not.
2011-05-26 09:31:02 -04:00
Michael Niedermayer
034fc7bf12 Merge remote-tracking branch 'qatar/master'
* qatar/master: (22 commits)
  configure: enable memalign_hack automatically when needed
  swscale: unbreak the build on non-x86 systems.
  swscale: remove if(bitexact) branch from functions.
  swscale: remove if(canMMX2BeUsed) conditional.
  swscale: remove swScale_{c,MMX,MMX2} duplication.
  swscale: use emms_c().
  Move emms_c() from libavcodec to libavutil.
  tiff: set palette in the context when specified in TIFF_PAL tag
  rtsp: use strtoul to parse rtptime and seq values.
  pgssubdec: fix incorrect colors.
  dvdsubdec: fix incorrect colors.
  ape: Allow demuxing of files with metadata tags.
  swscale: remove dead macro WRITEBGR24OLD.
  swscale: remove AMD3DNOW "optimizations".
  swscale: remove duplicate code in ppc/ subdirectory.
  swscale: remove duplicated x86/ functions.
  swscale: force --enable-runtime-cpudetect and remove SWS_CPU_CAPS_*.
  vsrc_buffer.h: add file doxy
  vsrc_buffer: tweak error message in init()
  msmpeg4: reindent.
  ...

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-05-25 06:32:45 +02:00