Commit graph

22 commits

Author SHA1 Message Date
Frank Plowman
1e3dc705df lavc/x86/videodsp: Drop MMX usage
Remove the MMX versions of these functions and modify the SSE
implementations to avoid using MMX registers.

Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-12-01 13:26:34 +08:00
Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt
19abc4c0a9 avcodec/x86/videodsp: Remove obsolete MMX, 3dnow, SSE functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from these functions are truely ancient 32bit x86s they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:37:35 +02:00
James Almer
bac44a5020 Merge commit 'b89804da9b'
* commit 'b89804da9b':
  x86: videodsp: Add parentheses to expression to work around warning

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:35:49 -03:00
Diego Biurrun
b89804da9b x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Ronald S. Bultje
0f88b3f82f videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations.
This can overread (either before start or beyond end) of the buffer in
Nx1 (i.e. height=1) images.

Fixes mozilla bug 1240080.
2016-01-18 11:12:47 -05:00
Ronald S. Bultje
52f84d82bd videodsp: don't overread edges in vfix3 emu_edge.
Fixes trac ticket 3226. Also see Andreas' analysis in
https://bugs.debian.org/801745, which was very helpful.
2015-10-24 14:34:50 -04:00
James Almer
70277d1d23 x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2
~15% faster than sse2.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 16:12:55 -03:00
Michael Niedermayer
5bca5f87d1 Revert "x86/videodsp: add emulated_edge_mc_mmxext"
The commit causes minor out of array reads and was mainly intended for
future optimizations which turned out not to be meassurably faster.
Itself it was just 1 cpu cycle faster

Approved-by: jamrial

This reverts commit 057d2704e7.
2014-06-28 05:39:07 +02:00
James Almer
057d2704e7 x86/videodsp: add emulated_edge_mc_mmxext
This also changes hfix8_mmx and above to use mmx regs instead of
gprs, and makes emulated_edge_mc_sse and emulated_edge_mc_sse2 use
mmxext hfix and hvar functions instead of mmx where possible.

This is mostly in preparation for an ssse3 version.

Signed-off-by: James Almer <jamrial@gmail.com>

code is about 1 cpu cycle faster approximately

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-26 17:58:57 +02:00
Ronald S. Bultje
9ee9c679a7 x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:33:23 +01:00
Ronald S. Bultje
51daafb02e x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:30:01 +01:00
Ronald S. Bultje
458446acfa lavc: Edge emulation with dst/src linesize
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
2013-11-15 10:16:27 +01:00
Ronald S. Bultje
960490c0b2 avcodec/x86/videodsp: Small speedups in ff_emulated_edge_mc x86 SIMD.
Don't use word-size multiplications if size == 2, and if we're using
SIMD instructions (size >= 8), complete leftover 4byte sets using movd,
not mov. Both of these changes lead to minor speedups.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-27 15:02:48 +01:00
Ronald S. Bultje
cd86eb265f avcodec/x86/videodsp: fix a bug in a %if statement where we used '%%' instead of '&&'.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-27 15:02:48 +01:00
Ronald S. Bultje
1b3a7e1f42 avcodec/x86/videodsp: Properly mark sse2 instructions in emulated_edge_mc x86 simd as such.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.

Tested-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-24 13:36:55 +02:00
Ronald S. Bultje
20d78a8606 libavcodec/x86: Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-10 13:36:06 +02:00
Ronald S. Bultje
face578d56 Rewrite emu_edge functions to have separate src/dst_stride arguments.
This allows supporting files for which the image stride is smaller than
the max. block size + number of subpel mc taps, e.g. a 64x64 VP9 file
or a 16x16 VP8 file with -fflags +emu_edge.
2013-09-28 20:28:08 -04:00
Michael Niedermayer
63a97d5674 Merge commit 'b6649ab503'
* commit 'b6649ab503':
  cosmetics: Remove unnecessary extern keywords from function declarations

Conflicts:
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-28 11:20:41 +01:00
Diego Biurrun
b6649ab503 cosmetics: Remove unnecessary extern keywords from function declarations 2013-03-27 14:21:45 +01:00
Michael Niedermayer
e16bac7b33 videodsp: Fix project name
These are all part of splited out dsp utils from FFmpeg

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-22 00:58:08 +01:00
Ronald S. Bultje
8c53d39e7f lavc: introduce VideoDSPContext
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-20 13:40:45 +01:00