Diego Biurrun
fd502f4f5f
build: Generalize yasm/nasm-related variable names
...
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4 )
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Darnley
7aa90b4e94
avcodec/h264: add sse2 versions of previous idct functions
...
Kaby Lake Pentium:
- ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext
- ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
2017-05-15 15:00:20 +02:00
James Darnley
27460dfebc
avcodec/h264: add avx 8-bit h264_idct_dc_add
...
Haswell:
- 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext
Skylake-U:
- 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
2017-05-15 15:00:19 +02:00
James Darnley
f61d454ca1
avcodec/h264: add avx 8-bit h264_idct_add
...
Haswell:
- 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
Skylake-U:
- 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
2017-05-15 15:00:17 +02:00
James Darnley
33de0fee2c
avcodec/h264: enable sse2 chroma deblock/loop filter functions
...
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307
avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
...
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be
avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
...
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d
avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
...
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec
avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
...
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d
avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
...
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843
avcodec/h264: add avx 8-bit chroma v deblock/loop filter
...
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867
avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
...
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
728651df06
avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
...
Yorkfield:
- mmx2: 2.53x (504 vs. 199 cycles)
- sse2: 3.83x (504 vs. 131 cycles)
Nehalem:
- mmx2: 2.42x (365 vs. 151 cycles)
- sse2: 3.56x (365 vs. 103 cycles)
Skylake:
- mmx2: 1.81x (308 vs. 170 cycles)
- sse2: 2.84x (308 vs. 108 cycles)
- avx: 2.93x (308 vs. 105 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
add21d0bb3
avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
...
Yorkfield:
- mmx2: 2.45x (279 vs. 114 cycles)
- sse2: 3.36x (279 vs. 83 cycles)
Nehalem:
- mmx2: 2.10x (192 vs. 92 cycles)
- sse2: 2.84x (192 vs. 68 cycles)
Skylake:
- mmx2: 1.75x (170 vs. 97 cycles)
- sse2: 2.47x (170 vs. 69 cycles)
- avx: 2.47x (170 vs. 69 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
58ca2ef62e
whitespace changes after last commit
2016-12-07 00:29:13 +01:00
James Darnley
f33714a694
avcodec/h264: clean up and expand x86 function definitions
2016-12-07 00:29:13 +01:00
James Darnley
13d71c28cc
avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
...
Yorkfield:
- sse2:
- complex: 4.13x faster (1514 vs. 367 cycles)
- simple: 4.38x faster (1836 vs. 419 cycles)
Skylake:
- sse2:
- complex: 3.61x faster ( 936 vs. 260 cycles)
- simple: 3.97x faster (1126 vs. 284 cycles)
- avx (versus sse2):
- complex: 1.07x faster (260 vs. 244 cycles)
- simple: 1.03x faster (284 vs. 274 cycles)
2016-11-30 22:58:28 +01:00
James Darnley
1dae7ffa0b
avcodec/h264: mmx 4:2:2 idct add8 function
...
2.87 times faster (1830 vs. 638 cycles)
2016-11-30 22:58:27 +01:00
James Darnley
815ea8c6cc
avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
...
2.1 times faster (401 vs. 194 cycles)
2016-11-30 22:58:27 +01:00
Michael Niedermayer
bc26fe8927
avcodec/h264: Use ptrdiff_t for (bi)weight functions
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 04:10:44 +02:00
James Darnley
7042a55c55
avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
...
2.6 times faster (366 vs. 142 cycles)
2016-02-05 17:26:04 +01:00
Michael Niedermayer
11ba0c8207
Merge commit ' 5ab03e41e5'
...
* commit '5ab03e41e5 ':
x86: h264dsp: Fix link failure with optimizations disabled
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-26 02:58:59 +02:00
Diego Biurrun
5ab03e41e5
x86: h264dsp: Fix link failure with optimizations disabled
...
With optimzations disabled compilers have trouble doing dead code
elimination on 'if (foo && 0)' expressions, while 'if (0 && foo)'
still works, so use the latter to avoid problems.
Bug-Id: 707
2014-06-25 15:24:51 -07:00
Michael Niedermayer
874f27a8f7
Merge commit ' b42f49e42f'
...
* commit 'b42f49e42f ':
x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:05:00 +02:00
Diego Biurrun
b42f49e42f
x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes
2014-04-04 19:08:05 +02:00
Michael Niedermayer
30056fd0be
Merge commit ' a03a642d5c'
...
* commit 'a03a642d5c ':
h264: do not use 422 functions for monochrome
See: 07abf13da4
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-06 16:51:23 +01:00
Anton Khirnov
a03a642d5c
h264: do not use 422 functions for monochrome
...
Fixes invalid memory access.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
2014-01-06 08:25:36 +01:00
Michael Niedermayer
62a6052974
Merge commit ' e998b56362'
...
* commit 'e998b56362 ':
x86: avcodec: Consistently structure CPU extension initialization
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 12:50:01 +02:00
Diego Biurrun
e998b56362
x86: avcodec: Consistently structure CPU extension initialization
2013-08-29 13:07:37 +02:00
Michael Niedermayer
9d01bf7d66
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
2013-07-18 00:31:35 +02:00
Michael Niedermayer
a887372109
Merge commit ' 1399931d07'
...
* commit '1399931d07 ':
x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h
Conflicts:
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-14 12:12:20 +02:00
Diego Biurrun
1399931d07
x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h
...
The header is not (anymore) MMX-specific.
2013-05-12 22:28:07 +02:00
Michael Niedermayer
dbcf7e9ef7
Merge commit ' 7f75f2f2bd'
...
* commit '7f75f2f2bd ':
ppc: Drop unnecessary ff_ name prefixes from static functions
x86: Drop unnecessary ff_ name prefixes from static functions
arm: Drop unnecessary ff_ name prefixes from static functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-01 18:21:35 +02:00
Diego Biurrun
f2e9d44a57
x86: Drop unnecessary ff_ name prefixes from static functions
2013-04-30 16:02:03 +02:00
Michael Niedermayer
6c38884876
Merge commit ' 620289a20e'
...
* commit '620289a20e ':
sh4: Fix silly type vs. variable name search and replace typo
configure: Group all hwaccels together in a separate variable
Add av_cold attributes to arch-specific init functions
Conflicts:
configure
libavcodec/arm/mpegvideo_armv5te.c
libavcodec/x86/mlpdsp.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideoenc.c
libavcodec/x86/videodsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-06 13:27:24 +01:00
Diego Biurrun
c9f933b5b6
Add av_cold attributes to arch-specific init functions
2013-02-05 17:01:05 +01:00
Michael Niedermayer
ac8987591f
Merge commit ' 88bd7fdc82'
...
* commit '88bd7fdc82 ':
Drop DCTELEM typedef
Conflicts:
libavcodec/alpha/dsputil_alpha.h
libavcodec/alpha/motion_est_alpha.c
libavcodec/arm/dsputil_init_armv6.c
libavcodec/bfin/dsputil_bfin.h
libavcodec/bfin/pixels_bfin.S
libavcodec/cavs.c
libavcodec/cavsdec.c
libavcodec/dct-test.c
libavcodec/dnxhdenc.c
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/dsputil_template.c
libavcodec/eamad.c
libavcodec/h264_cavlc.c
libavcodec/h264idct_template.c
libavcodec/mpeg12.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo.h
libavcodec/mpegvideo_enc.c
libavcodec/ppc/dsputil_altivec.c
libavcodec/proresdsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 17:44:56 +01:00
Diego Biurrun
88bd7fdc82
Drop DCTELEM typedef
...
It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Ronald S. Bultje
ce58642ed0
x86inc: support stack mem allocation and re-alignment in PROLOGUE.
...
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-12 10:37:52 +01:00
Ronald S. Bultje
6f40e9f070
x86inc: support stack mem allocation and re-alignment in PROLOGUE
...
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Michael Niedermayer
7dc0ed80e8
Merge commit ' 1f3f896564'
...
* commit '1f3f896564 ':
fate: Add dependencies for Vorbis, ProRes, QTRLE, utvideo tests
fate: real: Add dependencies
fate: lossless-audio: Add dependencies
x86: h264dsp: Fix linking with yasm and optimizations disabled
Conflicts:
libavcodec/x86/h264dsp_init.c
tests/fate/lossless-audio.mak
tests/fate/real.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-29 13:35:56 +01:00
Diego Biurrun
89145fbbfe
x86: h264dsp: Fix linking with yasm and optimizations disabled
...
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2012-11-28 14:45:28 +01:00
Michael Niedermayer
a1b5c9634e
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
x86: mmx2 ---> mmxext in asm constructs
Conflicts:
libavcodec/x86/h264_chromamc_10bit.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-14 12:34:30 +01:00
Diego Biurrun
26301caaa1
x86: mmx2 ---> mmxext in asm constructs
2012-11-14 00:58:51 +01:00
Michael Niedermayer
add7513e64
Merge commit ' fa8fcab1e0'
...
* commit 'fa8fcab1e0 ':
x86: h264_chromamc_10bit: drop pointless PAVG %define
x86: mmx2 ---> mmxext in function names
swscale: do not forget to swap data in formats with different endianness
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/gradfun.c
libswscale/input.c
libswscale/utils.c
libswscale/x86/swscale.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-01 13:11:51 +01:00
Diego Biurrun
d8eda37080
x86: mmx2 ---> mmxext in function names
2012-10-31 17:53:57 +01:00
Michael Niedermayer
6add8eb2ce
x86/h264dsp_init: put a HAVE_YASM back
...
Should fix compilation on open solaris
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 17:21:02 +02:00
Michael Niedermayer
77aedc77ab
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
swscale: Provide the right alignment for external mmx asm
x86: Replace checks for CPU extensions and flags by convenience macros
configure: msvc: fix/simplify setting of flags for hostcc
x86: mlpdsp: mlp_filter_channel_x86 requires inline asm
Conflicts:
libavcodec/x86/fft_init.c
libavcodec/x86/h264_intrapred_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/proresdsp_init.c
libavutil/x86/float_dsp_init.c
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 13:27:42 +02:00
Diego Biurrun
e0c6cce447
x86: Replace checks for CPU extensions and flags by convenience macros
...
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00