ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-08 10:50:02 +00:00

Author	SHA1	Message	Date
James Darnley	33de0fee2c	avcodec/h264: enable sse2 chroma deblock/loop filter functions Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad. Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.	2017-02-27 13:22:06 +01:00
James Darnley	cd893b9307	avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter ~1.37x faster (147 vs. 108 cycles) compared to mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	0e16b3e2be	avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter ~1.10x faster (69 vs. 63 cycles) compared to mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	987ffe4b8d	avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter ~1.14x faster (90 vs 78 cycles) compared with mmxext	2017-02-27 13:22:06 +01:00
James Darnley	88307b3eec	avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter ~1.21x faster (68 vs. 56 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	ac096fc82d	avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter ~1.14x faster (93 vs. 81 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	5c56758843	avcodec/h264: add avx 8-bit chroma v deblock/loop filter ~1.24x faster (101 vs. 81 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	5336887867	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter x86-64 only Yorkfield: - sse2: ~2.17x (434 vs. 200 cycles) Nehalem: - sse2: ~2.94x (409 vs. 139 cycles) Skylake: - sse2: ~3.10x (370 vs. 119 cycles) - avx: ~3.29x (370 vs. 112 cycles)	2017-02-18 20:26:52 +01:00
James Darnley	e18bc2114f	avcodec/h264: add named parameters to x86 function	2017-02-18 20:26:50 +01:00
James Darnley	9d815b7424	avcodec/x86: deduplicate PASS8ROWS macro	2017-02-18 20:26:49 +01:00
James Almer	c8467abbad	x86/rv34dsp: add ff_rv34_idct_dc_add_sse2 Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	2017-02-02 17:51:21 -03:00
James Almer	ab5c4d006d	x86/vp8dsp: add ff_vp8_idct_dc_add_sse2 Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	2017-02-02 17:18:58 -03:00
Michael Niedermayer	536ac72f46	Revert "Merge commit '`0a39c9ac0b`'" The assumption this is based on is wrong, the code is not always run with bitexact flags This reverts commit `a956164e1e`, reversing changes made to `f6005907fd`. Approved-by: James Almer <jamrial@gmail.com>	2017-02-01 02:01:07 +01:00
James Almer	ba5d089381	Merge commit '`d06dfaa5cb`' * commit '`d06dfaa5cb`': x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 15:36:49 -03:00
James Almer	ac774cfa57	Merge commit '`4efab89332`' * commit '`4efab89332`': x86: Use _FAST/_SLOW CPU feature detection macros where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 15:08:19 -03:00
James Almer	a956164e1e	Merge commit '`0a39c9ac0b`' * commit '`0a39c9ac0b`': x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:59:29 -03:00
James Almer	f6005907fd	Merge commit '`95c1df929b`' * commit '`95c1df929b`': x86: hpeldsp: Drop unused function parameters Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:56:11 -03:00
James Almer	4d0e89ce27	Merge commit '`c3e83ad3b7`' * commit '`c3e83ad3b7`': x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:53:27 -03:00
James Almer	ca8a3978e5	Merge commit '`1dfc3cf89d`' * commit '`1dfc3cf89d`': x86: hpeldsp: Split off VP3-specific bits into a separate file Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:49:29 -03:00
Clément Bœsch	7c300a8ed4	lavc/hevc: remove a few random spaces to reduce diff with libav	2017-01-31 17:02:24 +01:00
Clément Bœsch	78d16eb452	Merge commit '`fca3c3b619`' * commit '`fca3c3b619`': hevc: Add AVX2 DC IDCT Mostly noop as we already have that code. In the ASM, code is merged with the exception of SECTION which is kept uppercase for consistency with the rest of the codebase. Still in the ASM, the prototype comment is fixed to honor the '_' added from the original commit. idct_dc_proto() is dropped as it's not used anymore here. Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-01-31 16:53:37 +01:00
Clément Bœsch	d0e132bab6	Merge commit '`1bd890ad17`' * commit '`1bd890ad17`': hevc: Separate adding residual to prediction from IDCT This commit should be a noop but isn't because of the following renames: - transform_add → add_residual - transform_skip → dequant - idct_4x4_luma → transform_4x4_luma Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-01-31 15:31:34 +01:00
James Almer	6d4c9f2ade	lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16 Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	47f212329e	huffyuvdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	cf9ef83960	huffyuvencdsp: move shared functions to a new lossless_videoencdsp context Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	30c1f27299	huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	5ac1dd8e23	lossless_videodsp: move shared functions from huffyuvdsp Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
Michael Niedermayer	aa95292043	avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10 make fate passes Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 22:37:55 +01:00
John Comeau	d06518752b	avcodec/x86/imdct36: fix building with nasm 2.11.05 fixes `operation size not specified` errors as described here: http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2 I rebuilt again with yasm and made sure it didn't break that. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 20:44:16 +01:00
Paul B Mahol	6d09d6edbc	avcodec/magicyuv: add 10 bit support Signed-off-by: Paul B Mahol <onemda@gmail.com>	2016-12-20 13:32:15 +01:00
James Darnley	acdd2d805d	avcodec/h264: resolve assert being triggered when stack is not aligned 32-bit msvc.	2016-12-07 22:32:19 +01:00
James Darnley	728651df06	avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter Yorkfield: - mmx2: 2.53x (504 vs. 199 cycles) - sse2: 3.83x (504 vs. 131 cycles) Nehalem: - mmx2: 2.42x (365 vs. 151 cycles) - sse2: 3.56x (365 vs. 103 cycles) Skylake: - mmx2: 1.81x (308 vs. 170 cycles) - sse2: 2.84x (308 vs. 108 cycles) - avx: 2.93x (308 vs. 105 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	add21d0bb3	avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter Yorkfield: - mmx2: 2.45x (279 vs. 114 cycles) - sse2: 3.36x (279 vs. 83 cycles) Nehalem: - mmx2: 2.10x (192 vs. 92 cycles) - sse2: 2.84x (192 vs. 68 cycles) Skylake: - mmx2: 1.75x (170 vs. 97 cycles) - sse2: 2.47x (170 vs. 69 cycles) - avx: 2.47x (170 vs. 69 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	58ca2ef62e	whitespace changes after last commit	2016-12-07 00:29:13 +01:00
James Darnley	f33714a694	avcodec/h264: clean up and expand x86 function definitions	2016-12-07 00:29:13 +01:00
James Darnley	13d71c28cc	avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)	2016-11-30 22:58:28 +01:00
James Darnley	1dae7ffa0b	avcodec/h264: mmx 4:2:2 idct add8 function 2.87 times faster (1830 vs. 638 cycles)	2016-11-30 22:58:27 +01:00
James Darnley	815ea8c6cc	avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter 2.1 times faster (401 vs. 194 cycles)	2016-11-30 22:58:27 +01:00
James Almer	2de1c79b61	x86/vp9itxfm: add missing AVX2 guards Fixes compilation with Yasm 1.1.0 and older. Signed-off-by: James Almer <jamrial@gmail.com>	2016-11-18 17:01:11 -03:00
Ronald S. Bultje	83a139e3d8	vp9: add avx2 iadst16 implementations. Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).	2016-11-15 11:01:36 -05:00
Hendrik Leppkes	db854c6c4a	Merge commit '`4a081f224e`' * commit '`4a081f224e`': libavcodec: fix constness in clobber test avcodec_open2() wrappers Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-11-13 17:30:33 +01:00
Andreas Cadhalpun	c8a6eb58d7	doc: fix spelling errors Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-10-21 23:58:47 +02:00
Rostislav Pehlivanov	d2ae5f77c6	aacenc: add SIMD optimizations for abs_pow34 and quantization Performance improvements: quant_bands: with: 681 decicycles in quant_bands, 8388453 runs, 155 skips without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips Around 42% for the function Twoloop coder: abs_pow34: with/without: 7.82s/8.17s Around 4% for the entire encoder Both: with/without: 7.15s/8.17s Around 12% for the entire encoder Fast coder: abs_pow34: with/without: 3.40s/3.77s Around 10% for the entire encoder Both: with/without: 3.02s/3.77s Around 20% faster for the entire encoder Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: James Almer <jamrial@gmail.com>	2016-10-18 21:41:18 +01:00
James Almer	42111e8543	avcodec: fix arguments on xmm/neon clobber test wrappers Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-02 02:15:47 -03:00
James Almer	449f263f9f	avcodec: add missing xmm/neon clobber test wrappers for the new encode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-01 14:08:50 -03:00
Hendrik Leppkes	5ae0ad001a	x86/h264_weight: use appropriate register size for weight parameters Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 16:40:57 +02:00
Michael Niedermayer	bc26fe8927	avcodec/h264: Use ptrdiff_t for (bi)weight functions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 04:10:44 +02:00
James Almer	d950279cbf	avcodec/ttadsp: cosmetics Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-06 18:27:01 -03:00
James Almer	efc9d5c4bc	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-02 15:48:04 -03:00
Clément Bœsch	15b26e88cb	Merge commit '`9df889a5f1`' * commit '`9df889a5f1`': h264: rename h264.[ch] to h264dec.[ch] Merged-by: Clément Bœsch <u@pkh.me>	2016-07-29 11:01:36 +02:00

2211 commits