Commit graph

5 commits

James Almer
4fee63b241 x86/takdsp: add missing wrappers to AVX2 functions
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-25 22:31:15 -03:00
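
For context, a minimal sketch of the guard pattern such a fix adds (the function below is a stub; HAVE_AVX2_EXTERNAL is FFmpeg's configure symbol for "the external assembler can emit AVX2"): old yasm cannot assemble ymm instructions at all, so the AVX2 variants have to be excluded at assembly time rather than at runtime.

    %include "libavutil/x86/x86util.asm"

    SECTION .text

    %if HAVE_AVX2_EXTERNAL
    INIT_YMM avx2
    cglobal example_fn, 3, 3, 1, p1, p2, length
        ; AVX2 body elided; see the functions added in the commit below
        RET
    %endif
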
James Almer
591dc3b4b8 x86/takdsp: add avx2 versions of all functions
On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Tested-by: Lynne <dev@lynne.ee>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-23 08:39:22 -03:00
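
To show what those timings measure: below is a hedged sketch of an AVX2 decorrelate_ls in x86inc syntax, assuming the C reference semantics p2[i] = p1[i] + p2[i], a length that is a positive multiple of 8 samples, and no 32-byte alignment guarantee on the buffers; the loop structure in the actual commit may differ.

    %include "libavutil/x86/x86util.asm"

    SECTION .text

    INIT_YMM avx2
    cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length
        shl     lengthd, 2          ; sample count -> byte count
        add     p1q, lengthq
        add     p2q, lengthq
        neg     lengthq             ; negative index counts up toward zero
    .loop:
        movu    m0, [p1q+lengthq]
        movu    m1, [p2q+lengthq]
        paddd   m1, m0              ; p2[i] = p1[i] + p2[i], 8 lanes at once
        movu    [p2q+lengthq], m1
        add     lengthq, mmsize     ; mmsize == 32 bytes == 8 int32 samples
        jl      .loop
        RET

The SSE2 baseline is the same loop over 16-byte xmm registers (4 samples per iteration); doubling the lane width is roughly what buys the further 1.6-2x over SSE2 seen in the numbers above.
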
Lynne
bbe95f7353 x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so the impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011 (16 and
12 years ago, respectively).

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
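
A small illustration of the mechanism (the function is invented for the example): x86inc's branch macros record when a RET immediately follows a branch and widen it to the 2-byte form automatically, so an explicit REP_RET only ever mattered when the return was a branch target.

    %include "libavutil/x86/x86util.asm"

    SECTION .text

    INIT_XMM sse2
    cglobal zero_buf, 2, 2, 1, buf, blocks
        pxor    m0, m0
    .loop:
        mova    [bufq], m0          ; assumes buf is 16-byte aligned
        add     bufq, mmsize
        dec     blocksd             ; blocks = number of 16-byte blocks
        jg      .loop
        RET     ; immediately follows jg, so x86inc itself emits the
                ; 2-byte `rep ret` here for pre-SSSE3 targets
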
James Almer
dab5f65b25 x86/takdsp: use arithmetic shift instructions
p1 and p2 are int32_t.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-09 23:52:39 -03:00
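
The distinction, in one hedged example (the function is invented for illustration): on signed 32-bit samples, the logical shift psrld zero-fills the sign bit while the arithmetic psrad replicates it, so only psrad matches C's >> on int32_t.

    %include "libavutil/x86/x86util.asm"

    SECTION .text

    INIT_XMM sse2
    cglobal halve_s32, 2, 2, 1, buf, blocks
    .loop:
        mova    m0, [bufq]          ; assumes buf is 16-byte aligned
        psrad   m0, 1               ; arithmetic: -2 >> 1 == -1; psrld
                                    ; would yield 0x7FFFFFFF and corrupt
                                    ; negative samples
        mova    [bufq], m0
        add     bufq, mmsize
        dec     blocksd             ; blocks = number of 16-byte blocks
        jg      .loop
        RET
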
Paul B Mahol
35af7add6f avcodec/takdec: add x86 SIMD for rest of decorrelation modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-09 21:38:15 +02:00