Commit graph

59 commits

Author SHA1 Message Date
Andreas Rheinhardt
afc95a10ac avcodec/h264dsp, h264idct: Fix lengths of array parameters
Fixes many -Warray-parameter warnings from GCC 11.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-08 17:44:57 +02:00
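
Note on the warning class named above: GCC 11 added -Warray-parameter, which reports a function redeclared with an array parameter whose bound differs from an earlier declaration. The sketch below shows the consistent form. The function name add_block_4x4 and its body are illustrative assumptions, not the FFmpeg prototypes touched by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prototype, as it might appear in a header. */
void add_block_4x4(uint8_t *dst, int16_t block[16], ptrdiff_t stride);

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* The definition repeats the bound [16] from the prototype. Spelling it
 * block[64] here would compile to the same code (the parameter is a plain
 * pointer either way), but GCC 11 would flag the mismatch. */
void add_block_4x4(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    assert(dst && block);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                clip_u8(dst[y * stride + x] + block[y * 4 + x]);
}
```

The bound in a parameter declaration is documentation only, so keeping it identical across header and definition costs nothing and lets the compiler cross-check it.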
James Almer
d5d699ab6e avcodec/h264dsp: change loop filter stride argument to ptrdiff_t 2019-02-20 15:27:43 -03:00
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Darnley
7aa90b4e94 avcodec/h264: add sse2 versions of previous idct functions
Kaby Lake Pentium:
 - ff_h264_idct_add_8_sse2:    ~1.18x faster than mmxext
 - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
2017-05-15 15:00:20 +02:00
James Darnley
27460dfebc avcodec/h264: add avx 8-bit h264_idct_dc_add
Haswell:
 - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext

Skylake-U:
 - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
2017-05-15 15:00:19 +02:00
James Darnley
f61d454ca1 avcodec/h264: add avx 8-bit h264_idct_add
Haswell:
 - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext

Skylake-U:
 - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
2017-05-15 15:00:17 +02:00
James Darnley
33de0fee2c avcodec/h264: enable sse2 chroma deblock/loop filter functions
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307 avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843 avcodec/h264: add avx 8-bit chroma v deblock/loop filter
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
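
Note on reading the figures above: the "Nx" factors in these messages appear to be the baseline cycle count divided by the new cycle count. A minimal sketch of that arithmetic follows, assuming this interpretation. The helper names are hypothetical.

```c
#include <assert.h>

/* Speedup factor: how many times faster the new code is than the baseline.
 * Assumes both arguments are cycle counts measured the same way. */
double speedup(double baseline_cycles, double new_cycles)
{
    assert(new_cycles > 0.0);
    return baseline_cycles / new_cycles;
}

/* Share of the baseline cycles that the new code no longer spends,
 * in percent. */
double percent_saved(double baseline_cycles, double new_cycles)
{
    assert(baseline_cycles > 0.0);
    return (1.0 - new_cycles / baseline_cycles) * 100.0;
}
```

For the Yorkfield line, speedup(434, 200) gives 2.17, matching the quoted factor.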
James Darnley
728651df06 avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
Yorkfield:
 - mmx2: 2.53x (504 vs. 199 cycles)
 - sse2: 3.83x (504 vs. 131 cycles)

Nehalem:
 - mmx2: 2.42x (365 vs. 151 cycles)
 - sse2: 3.56x (365 vs. 103 cycles)

Skylake:
 - mmx2: 1.81x (308 vs. 170 cycles)
 - sse2: 2.84x (308 vs. 108 cycles)
 - avx:  2.93x (308 vs. 105 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
add21d0bb3 avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
Yorkfield:
 - mmx2: 2.45x (279 vs. 114 cycles)
 - sse2: 3.36x (279 vs.  83 cycles)

Nehalem:
 - mmx2: 2.10x (192 vs.  92 cycles)
 - sse2: 2.84x (192 vs.  68 cycles)

Skylake:
 - mmx2: 1.75x (170 vs.  97 cycles)
 - sse2: 2.47x (170 vs.  69 cycles)
 - avx:  2.47x (170 vs.  69 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
58ca2ef62e whitespace changes after last commit 2016-12-07 00:29:13 +01:00
James Darnley
f33714a694 avcodec/h264: clean up and expand x86 function definitions 2016-12-07 00:29:13 +01:00
James Darnley
13d71c28cc avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
Yorkfield:
 - sse2:
   - complex: 4.13x faster (1514 vs. 367 cycles)
   - simple:  4.38x faster (1836 vs. 419 cycles)

Skylake:
 - sse2:
   - complex: 3.61x faster ( 936 vs. 260 cycles)
   - simple:  3.97x faster (1126 vs. 284 cycles)
 - avx (versus sse2):
   - complex: 1.07x faster (260 vs. 244 cycles)
   - simple:  1.03x faster (284 vs. 274 cycles)
2016-11-30 22:58:28 +01:00
James Darnley
1dae7ffa0b avcodec/h264: mmx 4:2:2 idct add8 function
2.87 times faster (1830 vs. 638 cycles)
2016-11-30 22:58:27 +01:00
James Darnley
815ea8c6cc avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
2.1 times faster (401 vs. 194 cycles)
2016-11-30 22:58:27 +01:00
Michael Niedermayer
bc26fe8927 avcodec/h264: Use ptrdiff_t for (bi)weight functions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 04:10:44 +02:00
James Darnley
7042a55c55 avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
2.6 times faster (366 vs. 142 cycles)
2016-02-05 17:26:04 +01:00
Michael Niedermayer
11ba0c8207 Merge commit '5ab03e41e5'
* commit '5ab03e41e5':
  x86: h264dsp: Fix link failure with optimizations disabled

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-26 02:58:59 +02:00
Diego Biurrun
5ab03e41e5 x86: h264dsp: Fix link failure with optimizations disabled
With optimizations disabled, compilers have trouble doing dead code
elimination on 'if (foo && 0)' expressions, while 'if (0 && foo)'
still works, so use the latter to avoid problems.

Bug-Id: 707
2014-06-25 15:24:51 -07:00
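
Note on the operand order described above: a minimal sketch of the pattern, assuming a build-time macro that is 0 when an optimized routine is unavailable. HAVE_FAST_PATH, fast_sum, plain_sum and sum are hypothetical names, not FFmpeg identifiers.

```c
#include <assert.h>

/* Hypothetical build-time switch, standing in for a configure-generated
 * macro; 0 means the optimized routine is not part of this build. */
#define HAVE_FAST_PATH 0

/* In the situation the commit describes, this symbol would be missing at
 * link time when HAVE_FAST_PATH is 0. It is defined here only so the
 * sketch links with any compiler and optimization level. */
static int fast_sum(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Portable fallback that is always available. */
static int plain_sum(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* cpu_ok is a runtime value, e.g. the result of a CPU capability check. */
int sum(const int *v, int n, int cpu_ok)
{
    assert(n >= 0);
    /* Constant first: with the macro at 0 the whole condition is a
     * compile-time false, which compilers tend to fold away even without
     * optimization. Written as (cpu_ok && HAVE_FAST_PATH), the call may
     * survive at -O0 and keep a reference to the absent symbol. */
    if (HAVE_FAST_PATH && cpu_ok)
        return fast_sum(v, n);
    return plain_sum(v, n);
}
```

Both orderings compute the same result. The difference is only in whether an unoptimized build still emits the reference to the optimized symbol, and that behavior is compiler-dependent rather than guaranteed by the C standard.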
Michael Niedermayer
874f27a8f7 Merge commit 'b42f49e42f'
* commit 'b42f49e42f':
  x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:05:00 +02:00
Diego Biurrun
b42f49e42f x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes 2014-04-04 19:08:05 +02:00
Michael Niedermayer
30056fd0be Merge commit 'a03a642d5c'
* commit 'a03a642d5c':
  h264: do not use 422 functions for monochrome

See: 07abf13da4
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-06 16:51:23 +01:00
Anton Khirnov
a03a642d5c h264: do not use 422 functions for monochrome
Fixes invalid memory access.

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
2014-01-06 08:25:36 +01:00
Michael Niedermayer
62a6052974 Merge commit 'e998b56362'
* commit 'e998b56362':
  x86: avcodec: Consistently structure CPU extension initialization

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 12:50:01 +02:00
Diego Biurrun
e998b56362 x86: avcodec: Consistently structure CPU extension initialization 2013-08-29 13:07:37 +02:00
Michael Niedermayer
9d01bf7d66 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Consistently use "cpu_flags" as variable/parameter name for CPU flags

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideo.c
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun
3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00
Michael Niedermayer
a887372109 Merge commit '1399931d07'
* commit '1399931d07':
  x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h

Conflicts:
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-14 12:12:20 +02:00
Diego Biurrun
1399931d07 x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h
The header is not (anymore) MMX-specific.
2013-05-12 22:28:07 +02:00
Michael Niedermayer
dbcf7e9ef7 Merge commit '7f75f2f2bd'
* commit '7f75f2f2bd':
  ppc: Drop unnecessary ff_ name prefixes from static functions
  x86: Drop unnecessary ff_ name prefixes from static functions
  arm: Drop unnecessary ff_ name prefixes from static functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-01 18:21:35 +02:00
Diego Biurrun
f2e9d44a57 x86: Drop unnecessary ff_ name prefixes from static functions 2013-04-30 16:02:03 +02:00
Michael Niedermayer
6c38884876 Merge commit '620289a20e'
* commit '620289a20e':
  sh4: Fix silly type vs. variable name search and replace typo
  configure: Group all hwaccels together in a separate variable
  Add av_cold attributes to arch-specific init functions

Conflicts:
	configure
	libavcodec/arm/mpegvideo_armv5te.c
	libavcodec/x86/mlpdsp.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideoenc.c
	libavcodec/x86/videodsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-06 13:27:24 +01:00
Diego Biurrun
c9f933b5b6 Add av_cold attributes to arch-specific init functions 2013-02-05 17:01:05 +01:00
Michael Niedermayer
ac8987591f Merge commit '88bd7fdc82'
* commit '88bd7fdc82':
  Drop DCTELEM typedef

Conflicts:
	libavcodec/alpha/dsputil_alpha.h
	libavcodec/alpha/motion_est_alpha.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/bfin/dsputil_bfin.h
	libavcodec/bfin/pixels_bfin.S
	libavcodec/cavs.c
	libavcodec/cavsdec.c
	libavcodec/dct-test.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/dsputil_template.c
	libavcodec/eamad.c
	libavcodec/h264_cavlc.c
	libavcodec/h264idct_template.c
	libavcodec/mpeg12.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo.h
	libavcodec/mpegvideo_enc.c
	libavcodec/ppc/dsputil_altivec.c
	libavcodec/proresdsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 17:44:56 +01:00
Diego Biurrun
88bd7fdc82 Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Ronald S. Bultje
ce58642ed0 x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-12 10:37:52 +01:00
Ronald S. Bultje
6f40e9f070 x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Michael Niedermayer
7dc0ed80e8 Merge commit '1f3f896564'
* commit '1f3f896564':
  fate: Add dependencies for Vorbis, ProRes, QTRLE, utvideo tests
  fate: real: Add dependencies
  fate: lossless-audio: Add dependencies
  x86: h264dsp: Fix linking with yasm and optimizations disabled

Conflicts:
	libavcodec/x86/h264dsp_init.c
	tests/fate/lossless-audio.mak
	tests/fate/real.mak

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-29 13:35:56 +01:00
Diego Biurrun
89145fbbfe x86: h264dsp: Fix linking with yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2012-11-28 14:45:28 +01:00
Michael Niedermayer
a1b5c9634e Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: mmx2 ---> mmxext in asm constructs

Conflicts:
	libavcodec/x86/h264_chromamc_10bit.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264dsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-14 12:34:30 +01:00
Diego Biurrun
26301caaa1 x86: mmx2 ---> mmxext in asm constructs 2012-11-14 00:58:51 +01:00
Michael Niedermayer
add7513e64 Merge commit 'fa8fcab1e0'
* commit 'fa8fcab1e0':
  x86: h264_chromamc_10bit: drop pointless PAVG %define
  x86: mmx2 ---> mmxext in function names
  swscale: do not forget to swap data in formats with different endianness

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavfilter/x86/gradfun.c
	libswscale/input.c
	libswscale/utils.c
	libswscale/x86/swscale.c
	tests/ref/lavfi/pixfmts_scale

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-01 13:11:51 +01:00
Diego Biurrun
d8eda37080 x86: mmx2 ---> mmxext in function names 2012-10-31 17:53:57 +01:00
Michael Niedermayer
6add8eb2ce x86/h264dsp_init: put a HAVE_YASM back
Should fix compilation on OpenSolaris

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 17:21:02 +02:00