Once upon a time, 413abbe164
added versions of some put_no_rnd_pixels functions for use
in VP3 and Theora (with an explicit check so that they are
only used for VP3 and Theora). When this was moved to hpeldsp
(from dsputil) in 3ced55d51c,
the check was replaced by a check for the bitexact flag
(and a CONFIG_VP3_DECODER compile-time check), so that
these functions were now used for other codecs as well.
Later commit 1dfc3cf89d
split off the "VP3-specific bits into a separate file",
yet these bits were not really VP3-specific bits at all
any more. (The error was repeated in commit
0a39c9ac0b.) This commit
has not been reverted, because this would make future
changes from Libav (from where it originated) harder,
yet Libav is no more, so this commit effectively reverts
1dfc3cf89d.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Loading pb_1 rather than pw_8192 was benchmarked to be more efficient.
Loading of the 2 yields no advantage. Loading of one saves ~11 cycles.
decicycles count:
put8: 3223(mmx) -> 2387
avg8: 2863(mmxext) -> 2125
put16: 4356(sse2) -> 3553
avg16: 4481(sse2) -> 3513
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is a port of the inline assembly of the mmx version to use the
pavg(us|)b instruction.
8 16
mmx 1498 4355
mmx2 1242 3509
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, only the mmx version is bitexact, the others (mmxext and
3dnow) are not, in spite of their naming.
Therefore, make their name more obvious. Also restore a comment that
was removed in 71155d7b.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '55519926ef':
x86: Make function prototype comments in assembly code consistent
Conflicts:
libavcodec/x86/sbrdsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '831a118078':
Update dsputil- and SIMD-related comments to match reality more closely
Conflicts:
libavcodec/x86/hpeldsp.asm
libavutil/arm/float_dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
There are instructions pavgb and pavgusb. Both instructions do the same
operation but they have different enconding. Pavgb exists in SSE (or
MMXEXT) instruction set and pavgusb exists in 3D-NOW instruction set.
livavcodec uses the macro PAVGB to select the proper instruction. However,
the function avg_pixels8_xy2 doesn't use this macro, it uses pavgb
directly.
As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and
K6-3 processors, because they have pavgusb, but not pavgb.
This bug seems to be introduced by commit
71155d7b41, "dsputil: x86: Convert mpeg4
qpel and dsputil avg to yasm"
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Martin Storsjö <martin@martin.st>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '25841dfe80':
Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
Conflicts:
libavcodec/alpha/dsputil_alpha.c
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp
configure: Add a comment indicating why uclibc is checked before glibc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '05b0998f51':
dsputil: Fix error by not using redzone and register name
swscale: GBRP output support
Conflicts:
libswscale/output.c
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/utils.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This also fixes the project name
Original authors fabrice and nick go back to the initial ffmpeg commit
Others for example contributed in: (for a complete list please use git blame / show / log)
commit e9c0a38ff0
Author: Zdenek Kabelac <kabi@informatics.muni.cz>
Date: Tue May 28 16:35:58 2002 +0000
* optimized avg_* functions (except xy2)
* minor speedup for put_pixels_x2 & cleanup
Originally committed as revision 619 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 607dce96c0
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Fri May 17 01:04:14 2002 +0000
hopefully faster mmx2&3dnow MC
Originally committed as revision 506 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>