ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-11 20:49:59 +00:00

Author	SHA1	Message	Date
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
James Almer	48615f0a78	x86/aacpsdsp: add ps_hybrid_analysis_fma3 This replaces the sse3 version, which was not really faster than the sse one. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 13:27:43 -03:00
James Almer	2bcf86d53d	x86/aacpsdsp: precompute constant factors Inspired by the optimization done to the C version by Rémi Denis-Courmont. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 13:27:43 -03:00
Clément Bœsch	b12a36170b	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	2017-06-28 12:22:39 +02:00
James Almer	8bb59e6742	x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse About 2x faster than the c version.	2017-06-18 22:34:22 -03:00
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	2017-06-18 22:33:27 -03:00
James Almer	623d217ed1	avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-15 23:49:40 -03:00
James Almer	497a4b554c	x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.	2017-06-07 13:53:51 -03:00
James Almer	933dd62288	x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse ~2% faster.	2017-06-04 23:29:56 -03:00
James Almer	be3809a521	x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 Move the unpacking outside of the loop. 5% to 10% faster. Suggested-by: ubitux Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-03 12:39:43 -03:00
James Almer	b5a0971ff0	x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() About 2x faster than the c version. Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-02 11:06:24 -03:00
James Almer	ede4ec1f8f	x86/aacpsdsp: optimize add_squares loop Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-14 12:41:23 -03:00
James Almer	82dbfccaf0	x86/aacdec: use HADDPS macro Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-08 14:18:18 -03:00
Henrik Gramner	f0b7882ceb	x86inc: Drop SECTION_TEXT macro The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.	2015-08-04 20:13:09 +02:00
James Almer	9dcaae70f2	x86/aacpsdsp: add SSE and SSE3 optimized functions Between 1.5 and 2.5 times faster Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2015-07-30 19:01:15 -03:00

15 commits
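Several entries above name Parametric Stereo DSP primitives (add_squares, mul_pair_single) without saying what they compute. The following is a minimal scalar sketch of those two operations, assuming float samples stored as interleaved complex pairs (`float[2]` = real, imaginary). The function names are hypothetical and not FFmpeg's API. The SSE versions in the log presumably vectorize loops of this shape, handling four single-precision floats per 128-bit register.

```c
#include <assert.h>

/* Hypothetical scalar sketch (not FFmpeg's API): accumulate the squared
 * magnitude of each complex sample, dst[i] += re^2 + im^2. */
static void sketch_add_squares(float *dst, const float (*src)[2], int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i][0] * src[i][0] + src[i][1] * src[i][1];
}

/* Hypothetical scalar sketch: scale each complex sample src0[i] by the
 * real gain src1[i] and store the result in dst[i]. */
static void sketch_mul_pair_single(float (*dst)[2], const float (*src0)[2],
                                   const float *src1, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i][0] = src0[i][0] * src1[i];
        dst[i][1] = src0[i][1] * src1[i];
    }
}

/* Worked example using values that are exact in binary floating point. */
static void sketch_worked_example(void)
{
    const float src[2][2] = { { 1.0f, 2.0f }, { 3.0f, 4.0f } };

    /* 0.5 + (1 + 4) = 5.5 and 1 + (9 + 16) = 26 */
    float power[2] = { 0.5f, 1.0f };
    sketch_add_squares(power, src, 2);
    assert(power[0] == 5.5f && power[1] == 26.0f);

    /* (1, 2) * 2 = (2, 4) and (3, 4) * 0.5 = (1.5, 2) */
    const float gain[2] = { 2.0f, 0.5f };
    float out[2][2];
    sketch_mul_pair_single(out, src, gain, 2);
    assert(out[0][0] == 2.0f && out[0][1] == 4.0f);
    assert(out[1][0] == 1.5f && out[1][1] == 2.0f);
}
```

Each output element depends only on the inputs at the same index. That independence is what makes loops like these straightforward to map onto packed SIMD instructions.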