ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-10 12:09:53 +00:00

Author	SHA1	Message	Date
Michael Niedermayer	516c213f08	avcodec/x86/vp9dsp_init_16bpp: Fix linking to missing ff_vp9_ipred_dr_32x32_16_avx2() on 32bit Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-06-28 00:31:33 +02:00
Ilia Valiakhmetov	35a5d9715d	avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation vp9_diag_downright_32x32_12bpp_c: 429.7 vp9_diag_downright_32x32_12bpp_sse2: 158.9 vp9_diag_downright_32x32_12bpp_ssse3: 144.6 vp9_diag_downright_32x32_12bpp_avx: 141.0 vp9_diag_downright_32x32_12bpp_avx2: 73.8 Almost 50% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-27 16:10:50 -04:00
Paul B Mahol	4ed7c2bbc3	avcodec/utvideodec: add SIMD for restore_rgb_planes Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-06-27 09:54:10 +02:00
Matthieu Bouron	db5bf64b21	lavc/x86: clear r2 higher bits in ff_sbr_sum_square Suggested-by: James Almer <jamrial@gmail.com>	2017-06-26 09:55:23 +02:00
James Almer	349446e36f	x86/mdct15: use three operand form for some instructions Fixes compilation with old yasm	2017-06-24 01:44:49 -03:00
Rostislav Pehlivanov	e1120b1c54	mdct15: add assembly optimizations for the 15-point FFT c: 1802 decicycles in fft15,16774635 runs, 2581 skips avx: 865 decicycles in fft15,16776378 runs, 838 skips Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-06-23 23:45:37 +01:00
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit `39e208f4d4`) Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-21 17:00:29 -03:00
James Darnley	8221c71703	avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients	2017-06-20 16:12:25 +02:00
James Darnley	d2597fb0c1	avcodec/x86: modify simple_idct10 macros to add an action parameter	2017-06-20 13:35:01 +02:00
James Darnley	8781330d80	avcodec/x86: cleanup simple_idct10 Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register.	2017-06-20 13:34:38 +02:00
James Darnley	e3db94302c	avcodec/x86/mpegenc: support transpose permutation type	2017-06-20 12:12:13 +02:00
James Darnley	fa30a0a548	avcodec/x86/mpegenc: check IDCT permutation type is a valid value	2017-06-20 12:12:13 +02:00
Michael Niedermayer	ae6f6d4e34	avcodec/x86/mpegvideo: Use intra scantable in dct_unquantize_h263_intra_mmx() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-06-20 00:07:51 +02:00
James Almer	8bb59e6742	x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse About 2x faster than the c version.	2017-06-18 22:34:22 -03:00
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	2017-06-18 22:33:27 -03:00
James Almer	623d217ed1	avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-15 23:49:40 -03:00
James Almer	b3446862bf	x86/vorbisdsp: optimize ff_vorbis_inverse_coupling_sse About 7% faster.	2017-06-15 23:20:05 -03:00
Ronald S. Bultje	d35ff98e27	vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2. Fixes trac issue 6459.	2017-06-14 11:37:38 -04:00
Ilia Valiakhmetov	81fc617c12	avcodec/vp9: ipred_dr_16x16_16 avx2 implementation Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-12 12:40:58 -04:00
James Almer	497a4b554c	x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.	2017-06-07 13:53:51 -03:00
Ilia Valiakhmetov	73d9a9a6af	libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation vp9_diag_downleft_32x32_8bpp_c: 580.2 vp9_diag_downleft_32x32_8bpp_sse2: 75.6 vp9_diag_downleft_32x32_8bpp_ssse3: 73.7 vp9_diag_downleft_32x32_8bpp_avx: 72.7 vp9_diag_downleft_32x32_10bpp_c: 1101.2 vp9_diag_downleft_32x32_10bpp_sse2: 145.4 vp9_diag_downleft_32x32_10bpp_ssse3: 137.5 vp9_diag_downleft_32x32_10bpp_avx: 134.8 vp9_diag_downleft_32x32_10bpp_avx2: 94.0 vp9_diag_downleft_32x32_12bpp_c: 1108.5 vp9_diag_downleft_32x32_12bpp_sse2: 145.5 vp9_diag_downleft_32x32_12bpp_ssse3: 137.3 vp9_diag_downleft_32x32_12bpp_avx: 135.2 vp9_diag_downleft_32x32_12bpp_avx2: 94.0 ~30% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-06 08:05:03 -04:00
James Almer	933dd62288	x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse ~2% faster.	2017-06-04 23:29:56 -03:00
James Almer	be3809a521	x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 Move the unpacking outside of the loop. 5% to 10% faster. Suggested-by: ubitux Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-03 12:39:43 -03:00
James Almer	b5a0971ff0	x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() About 2x faster than the c version. Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-02 11:06:24 -03:00
James Darnley	0dea0114fb	avcodec/x86/idctdsp_init: reindent	2017-05-30 13:20:44 +02:00
James Darnley	8e89f6fd37	avcodec/x86: move simple_idct to external assembly	2017-05-30 13:20:42 +02:00
Clément Bœsch	584366a436	lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible	2017-05-19 11:17:58 +02:00
Clément Bœsch	19bb2cade5	Merge commit '`b4a911c189`' * commit '`b4a911c189`': mpegvideoenc: make a table const Merged-by: Clément Bœsch <u@pkh.me>	2017-05-19 11:15:16 +02:00
James Darnley	7aa90b4e94	avcodec/h264: add sse2 versions of previous idct functions Kaby Lake Pentium: - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext	2017-05-15 15:00:20 +02:00
James Darnley	27460dfebc	avcodec/h264: add avx 8-bit h264_idct_dc_add Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext	2017-05-15 15:00:19 +02:00
James Darnley	f61d454ca1	avcodec/h264: add avx 8-bit h264_idct_add Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext	2017-05-15 15:00:17 +02:00
James Darnley	b5325c6711	avcodec/h264: use some 3 operand forms	2017-05-15 15:00:16 +02:00
James Darnley	060ba9e5e3	avcodec/h264: change RETs into REP_RETs where appropriate	2017-05-15 15:00:15 +02:00
Michael Niedermayer	fa8fd0808f	avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions Build succeeds with this change, this was the only failure Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-04-27 04:25:31 +02:00
Clément Bœsch	5be1440c74	Merge commit '`0a35f128f3`' * commit '`0a35f128f3`': cabac: x86: Give optimizations header a more meaningful name Merged-by: Clément Bœsch <u@pkh.me>	2017-04-08 14:30:13 +02:00
Ronald S. Bultje	83ae7e6350	x86/idctdsp_init: reindent.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	e0c205677f	x86/simple_idct: add explicit sse2 simple_idct_put/add versions. These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. This way we don't need to use the ff_put/add_pixels_clamped function pointers.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	2f0591cfa3	cavs: add a sse2 idct implementation. This makes using the function pointer ff_add_pixels_clamped() unnecessary, since we always know what the best implementation is at compile-time.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	c9d98c5649	cavs: convert idct from inline asm to yasm.	2017-04-06 10:03:27 -04:00
Ronald S. Bultje	b51d7d89f8	x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer. Since there are separate SSE2 implementations of xvid_idct_put/add, this patch has no practical impact on performance.	2017-04-06 10:03:27 -04:00
James Almer	6171f178e7	x86/hevc_add_res: merge last remaining changes from `3d65359832` See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html	2017-03-31 20:49:45 -03:00
Clément Bœsch	1ea0df14c3	Merge commit '`0361e4dcb4`' * commit '`0361e4dcb4`': h264_qpel: x86: Move function with only one instance out of template macro Note: warning is present with clang. Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-03-31 09:44:04 +02:00
Ronald S. Bultje	f8c019944d	vp9: re-split the decoder/format/dsp interface header files. The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.	2017-03-28 18:04:26 -04:00
Clément Bœsch	1c9f4b5078	lavc/vp9: split into vp9{block,data,mvs} This is following Libav layout to ease merges.	2017-03-27 21:38:21 +02:00
Michael Niedermayer	73fb40dc87	avcodec/x86/idctdsp: Remove duplicate include Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-26 19:17:30 +02:00
James Almer	ac42f08099	x86/hevc_add_res: merge missing changes from `3d65359832` Unrolling the loops triplicates the size of the assembled output while not generating any gain in performance.	2017-03-24 11:24:18 -03:00
Clément Bœsch	3d65359832	Merge commit '`6d5636ad9a`' * commit '`6d5636ad9a`': hevc: x86: Add add_residual() SIMD optimizations See `a6af4bf64d` This merge is only cosmetics (renames, space shuffling, etc). The functional changes in the ASM are not merged: - unrolling with %rep is kept - ADD_RES_MMX_4_8 is left untouched: this needs investigation Merged-by: Clément Bœsch <u@pkh.me>	2017-03-24 12:33:25 +01:00
Clément Bœsch	40ac226014	lavc/x86/hevc: rename hevc_res_add to hevc_add_res This will simplify incoming merge.	2017-03-24 11:45:23 +01:00
James Almer	bac44a5020	Merge commit '`b89804da9b`' * commit '`b89804da9b`': x86: videodsp: Add parentheses to expression to work around warning Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 18:35:49 -03:00
James Almer	29db87af52	Merge commit '`6be7944ee2`' * commit '`6be7944ee2`': x86: Add missing colons after assembly labels Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 18:05:27 -03:00

2355 commits