
49 commits

Author SHA1 Message Date
Anton Khirnov
c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
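LOCAL_ALIGNED declares a stack array with a requested alignment so that SIMD code can use aligned loads and stores on it. The following is a minimal sketch of the effect using C11 alignas. It is not FFmpeg's macro definition, which also has fallbacks for compilers that cannot align stack variables. The function alignment_ok is hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Sketch only: approximates what a declaration such as
 * LOCAL_ALIGNED(32, int16_t, buf, [64]) provides, namely a stack
 * array whose address is a multiple of 32 bytes. */
int alignment_ok(void)
{
    alignas(32) int16_t buf[64];   /* 32-byte alignment suits AVX2 loads */
    return ((uintptr_t)buf % 32) == 0;
}
```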
Clément Bœsch
3d65359832 Merge commit '6d5636ad9a'
* commit '6d5636ad9a':
  hevc: x86: Add add_residual() SIMD optimizations

See a6af4bf64d

This merge is only cosmetics (renames, space shuffling, etc).

The functional changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 12:33:25 +01:00
Clément Bœsch
947230837c Merge commit '112cee0241'
* commit '112cee0241':
  hevc: Add SSE2 and AVX IDCT

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 15:58:46 +01:00
Clément Bœsch
7c300a8ed4 lavc/hevc: remove a few random spaces to reduce diff with libav
2017-01-31 17:02:24 +01:00
Clément Bœsch
78d16eb452 Merge commit 'fca3c3b619'
* commit 'fca3c3b619':
  hevc: Add AVX2 DC IDCT

Mostly noop as we already have that code.

In the ASM, code is merged with the exception of SECTION which is kept
uppercase for consistency with the rest of the codebase.

Still in the ASM, the prototype comment is fixed to honor the '_' added
from the original commit.

idct_dc_proto() is dropped as it's not used anymore here.

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31 16:53:37 +01:00
Clément Bœsch
d0e132bab6 Merge commit '1bd890ad17'
* commit '1bd890ad17':
  hevc: Separate adding residual to prediction from IDCT

This commit should be a noop but isn't because of the following renames:

- transform_add  → add_residual
- transform_skip → dequant
- idct_4x4_luma  → transform_4x4_luma

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31 15:31:34 +01:00
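After this split, the inverse transform produces a residual block, and a separate add_residual step adds it onto the prediction with saturation. The following is a scalar sketch for 8-bit samples, assuming a row-major residual buffer of size x size. The name add_residual_8 and its signature are illustrative and not necessarily FFmpeg's exact prototype.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical 8-bit add_residual: dst already holds the prediction;
 * res holds the inverse-transform output (size x size, row-major).
 * Each sample becomes clip(pred + residual). */
void add_residual_8(uint8_t *dst, const int16_t *res, ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_uint8(dst[x] + res[x]);
        dst += stride;   /* picture stride */
        res += size;     /* residual rows are packed */
    }
}
```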
Pierre Edouard Lepere
6d5636ad9a hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Alexandra Hájková
112cee0241 hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
James Almer
fca3c3b619 hevc: Add AVX2 DC IDCT
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Diego Biurrun
257b30af8e x86: hevc: Fix linking with both yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2016-02-23 11:47:54 +01:00
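The linking problem arises because an init function that merely references an assembly symbol pulls that symbol into the link, even if it is never called. The following sketch shows the guard pattern under stated assumptions. HAVE_ASM_SKETCH stands in for FFmpeg's configure-generated HAVE_* macros. The filter names are hypothetical.

```c
/* Hypothetical configure result: 0 means no assembler (or optimizations
 * disabled), so no asm object files are built or linked. */
#define HAVE_ASM_SKETCH 0

typedef void (*filter_fn)(int *v);

static void filter_c(int *v) { *v += 1; }   /* portable fallback */

#if HAVE_ASM_SKETCH
/* Declared and referenced only when the asm object exists; an
 * unguarded reference would leave an undefined symbol at link time. */
void filter_sse2(int *v);
#endif

filter_fn select_filter(int have_sse2)
{
    filter_fn fn = filter_c;
#if HAVE_ASM_SKETCH
    if (have_sse2)
        fn = filter_sse2;
#else
    (void)have_sse2;   /* CPU flag is irrelevant without asm */
#endif
    return fn;
}
```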
James Almer
70d685a77f x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
James Almer
d4c47333e1 x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12}
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 17:01:15 -03:00
Anton Khirnov
e7078e842d hevcdsp: add x86 SIMD for MC
2015-12-05 21:11:52 +01:00
Ganesh Ajjanagadde
38f4e973ef all: fix -Wextra-semi reported on clang
This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were triggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-24 17:58:17 -04:00
Christophe Gisquet
b533949813 x86: hevc: remove a parameter to WP internals
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-14 17:22:50 +01:00
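Because the weighted-prediction input always comes from the int16_t intermediate buffer, its row stride is a constant: MAX_PB_SIZE elements, or MAX_PB_SIZE * 2 bytes. The following is a scalar sketch of 8-bit uni-directional weighted prediction that hard-codes that stride instead of taking it as a parameter. The sketch assumes MAX_PB_SIZE is 64 and that the intermediate samples have 14-bit precision. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64   /* assumed width of the intermediate buffer */

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical 8-bit uni weighted prediction. src rows are always
 * MAX_PB_SIZE int16_t apart, so no source stride is passed in. */
void weighted_uni_8(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src,
                    int width, int height, int denom, int wx, int ox)
{
    const int shift  = denom + 14 - 8;     /* undo 14-bit scaling plus weight denom */
    const int offset = 1 << (shift - 1);   /* round to nearest */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = clip_uint8(((src[x] * wx + offset) >> shift) + ox);
        dst += dst_stride;
        src += MAX_PB_SIZE;                /* fixed internal stride */
    }
}
```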
James Almer
14b44c1614 x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12 x86/hevc_sao: make sao_band_filter work on x86_32
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
5eedd36df1 x86: hevc_mc: use epel_hv 16-wide function
The epel_hv functions were still relying on only epel_hv 8-wide
being the maximum width instantiated.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere
a0d1300f71 x86: hevc_mc: add AVX2 optimizations
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips

Conflicts:
	libavcodec/x86/hevc_mc.asm
	libavcodec/x86/hevcdsp_init.c

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:20:47 +01:00
James Almer
15574c505b x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips

Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips

Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
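The SAO edge offset kernels benchmarked in these entries classify each sample against its two neighbours along one direction and add a per-category offset. The following is a scalar sketch of the horizontal class for one row of 8-bit samples. The lookup table maps 2 + sign(p - a) + sign(p - b) to the HEVC edge categories (local minimum, concave corner, none, convex corner, local maximum). The function name is hypothetical. Border samples are deliberately left unmodified for brevity.

```c
#include <stdint.h>

static int sign3(int v) { return (v > 0) - (v < 0); }

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Index 0..4 = 2 + sign(p-a) + sign(p-b) -> edge category.
 * Category 0 (flat or monotonic) gets no offset. */
static const uint8_t edge_idx[5] = { 1, 2, 0, 3, 4 };

/* Hypothetical horizontal (class 0) edge offset for one row.
 * offset_val[0] is expected to be 0. */
void sao_edge_row_h_8(uint8_t *dst, const uint8_t *src, int width,
                      const int offset_val[5])
{
    for (int x = 1; x < width - 1; x++) {
        int diff = sign3(src[x] - src[x - 1]) + sign3(src[x] - src[x + 1]);
        dst[x] = clip_uint8(src[x] + offset_val[edge_idx[2 + diff]]);
    }
}
```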
James Almer
fa3eccb4f9 x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
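SAO band offset splits the sample range into 32 equal bands and adds an offset only to samples falling in four consecutive bands starting at a signalled position. The band positions wrap past band 31. The following is a scalar sketch for one row of 8-bit samples, where the band index is sample >> 3. The function and parameter names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical 8-bit band offset for one row. Bands left_class ..
 * left_class + 3 (mod 32) receive offsets[0..3]; others are unchanged. */
void sao_band_row_8(uint8_t *dst, const uint8_t *src, int width,
                    int left_class, const int offsets[4])
{
    int band_table[32] = { 0 };                 /* 0 = no offset */
    int offset_val[5] = { 0, offsets[0], offsets[1], offsets[2], offsets[3] };
    const int shift = 8 - 5;                    /* bit depth - 5 */

    for (int k = 0; k < 4; k++)
        band_table[(k + left_class) & 31] = k + 1;

    for (int x = 0; x < width; x++)
        dst[x] = clip_uint8(src[x] + offset_val[band_table[src[x] >> shift]]);
}
```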
James Almer
c3d2426cca x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2
~20% faster than AVX.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 20:21:29 -03:00
Christophe Gisquet
3e892b2bcd x86: hevc_mc: split differently calls
In some cases, 2 or 3 calls are performed to functions for unusual
widths. Instead, perform 2 calls for different widths to split the
workload.

The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
be processed that way without modifications: some calls use unaligned
buffers, and having branches to handle this was resulting in no
micro-benchmark benefit.

For block_w == 12 (around 1% of the pixels of the sequence):
Before:
12758 decicycles in epel_uni, 4093 runs, 3 skips
19389 decicycles in qpel_uni, 8187 runs, 5 skips
22699 decicycles in epel_bi, 32743 runs, 25 skips
34736 decicycles in qpel_bi, 32733 runs, 35 skips

After:
11929 decicycles in epel_uni, 4096 runs, 0 skips
18131 decicycles in qpel_uni, 8184 runs, 8 skips
20065 decicycles in epel_bi, 32750 runs, 18 skips
31458 decicycles in qpel_bi, 32753 runs, 15 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 12:05:33 +02:00
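The commit replaces several calls for an unusual width with two calls of different widths. The following sketch assumes 8-bit input and a fixed-width "put pixels" kernel (copy into the 14-bit int16_t intermediate buffer). Under those assumptions, block_w == 12 is handled as an 8-wide call followed by a 4-wide call at column 8. The 8 + 4 decomposition is an inference for illustration, MAX_PB_SIZE is assumed to be 64, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64   /* assumed intermediate buffer stride */

/* Generic stand-in for a fixed-width SIMD kernel: 8-bit samples are
 * scaled to 14-bit precision in the int16_t intermediate buffer. */
static void put_pixels_w(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                         int height, int width)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 6);
        dst += MAX_PB_SIZE;
        src += srcstride;
    }
}

static void put_pixels8(int16_t *dst, const uint8_t *src, ptrdiff_t s, int h)
{
    put_pixels_w(dst, src, s, h, 8);
}

static void put_pixels4(int16_t *dst, const uint8_t *src, ptrdiff_t s, int h)
{
    put_pixels_w(dst, src, s, h, 4);
}

/* Width 12 as two calls: columns 0-7, then columns 8-11. */
void put_pixels12(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height)
{
    put_pixels8(dst,     src,     srcstride, height);
    put_pixels4(dst + 8, src + 8, srcstride, height);
}
```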
Christophe Gisquet
dad7f15567 hevcdsp: remove more instances of compile-time-fixed parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 15:22:42 +02:00
Christophe Gisquet
d4f44b66d3 hevcdsp: remove compilation-time-fixed parameter
The dststride parameter is always MAX_PB_SIZE.

Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 14:57:37 +02:00
James Almer
54ca4dd43b x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8
* Reduced xmm register count to 7 (as such, they are now enabled on x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.

~5% faster.

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-21 15:01:33 -03:00
James Almer
76a99d467f x86/hecv_res_add: add ff_hevc_transform_add{8,16,32}_8_avx
~15% faster than sse2

Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-20 16:54:52 -03:00
Pierre Edouard Lepere
a6af4bf64d x86: hevc: adding transform_add
Reviewed-by: James Almer <jamrial@gmail.com>
Approved-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-20 01:28:56 +02:00
James Almer
73c4f63ba5 x86/hevc_deblock: add add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx
~5% faster than SSSE3

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 14:04:59 +02:00
James Almer
bfb3b2b7a6 x86/hevc_idct: add 12bit idct_dc
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:30:56 +02:00
Michael Niedermayer
d4a9e89b27 avcodec/x86/hevcdsp_init: make license header consistent
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:28:44 +02:00
Michael Niedermayer
706f81a2c2 Merge commit '1a880b2fb8'
* commit '1a880b2fb8':
  hevc: SSE2 and SSSE3 loop filters

Conflicts:
	libavcodec/hevcdsp.c
	libavcodec/hevcdsp.h
	libavcodec/x86/Makefile
	libavcodec/x86/hevc_deblock.asm
	libavcodec/x86/hevcdsp_init.c

See: de7b89fd43 and several others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:20:48 +02:00
James Almer
1ace9573dc x86/hevc_idct: replace old and unused idct functions
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00
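The idct_dc functions handle blocks where only the DC coefficient is non-zero. In that case, both 1-D passes of the inverse transform scale the DC term by the same basis value of 64, so every output sample equals one rounded constant. The following scalar sketch derives that constant: the first pass reduces to (c + 1) >> 1 and the second to a rounding shift by 14 - bit_depth. The function name and signature are hypothetical.

```c
#include <stdint.h>

/* Hypothetical DC-only inverse transform: overwrites the size x size
 * coefficient block with the single residual value implied by coeffs[0]. */
void idct_dc(int16_t *coeffs, int size, int bit_depth)
{
    const int shift = 14 - bit_depth;                 /* second-pass shift */
    const int add   = 1 << (shift - 1);               /* rounding */
    const int coeff = (((coeffs[0] + 1) >> 1) + add) >> shift;

    for (int i = 0; i < size * size; i++)
        coeffs[i] = (int16_t)coeff;
}
```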
Pierre Edouard Lepere
1a880b2fb8 hevc: SSE2 and SSSE3 loop filters
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-07-26 15:01:01 +00:00
Mickaël Raulet
bd0f2d316f x86/hevc: add 12bits support for MC
cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:55:20 +02:00
Mickaël Raulet
7bdcf5c934 x86/hevc: add 12bits support for deblocking filter
cherry picked from commit 97d46afe320c7d61d7b9525e5f5588355cde4bb0

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:19:42 +02:00
Christophe Gisquet
670b7f203a x86: hevcdsp: align
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-23 22:18:08 +02:00
Michael Niedermayer
ca6b33b8bd avcodec/x86/hevcdsp_init: Fix "warning: assignment from incompatible pointer type"
2014-07-22 16:36:12 +02:00
James Almer
276bef5340 x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10}_sse2
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 13:48:31 +02:00
plepere
942e22c651 avcodec/x86/hevc: add avx2 dc idct
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 14:49:44 +02:00
plepere
92cccb7bcd avcodec/hevc: new idct + asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-17 13:23:36 +02:00
Christophe Gisquet
09fc28aed1 x86: hevcdsp_init: fix macro usage
The macro was not using the parameter but unconditionally using sse4.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:20:07 +02:00
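The bug described here is a macro that accepts an instruction-set parameter but ignores it. The following is a minimal sketch of the failure and the fix using token pasting. The macro and kernel names are hypothetical stand-ins, not the actual hevcdsp_init.c macros.

```c
/* Hypothetical per-ISA kernels. */
static int kernel_sse2(void) { return 2; }
static int kernel_sse4(void) { return 4; }

typedef int (*kernel_fn)(void);

/* Buggy form: `opt` is ignored, so SSE4 is always selected. */
#define SELECT_BUGGY(opt) kernel_sse4
/* Fixed form: the parameter is pasted into the symbol name. */
#define SELECT_FIXED(opt) kernel_ ## opt

kernel_fn pick_sse2_buggy(void) { return SELECT_BUGGY(sse2); }
kernel_fn pick_sse2_fixed(void) { return SELECT_FIXED(sse2); }
```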
plepere
de7b89fd43 avcodec/x86/hevc: added DBF assembly functions
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 21:11:03 +02:00
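The deblocking filter (DBF) functions smooth block edges. The chroma part is the simplest: per line of samples across the edge, only p0 and q0 change, by a delta clipped to ±tc. The following is a scalar sketch of that HEVC chroma filter for one 8-bit line. The function name and the p[0]/p[1], q[0]/q[1] layout (index 0 nearest the edge) are hypothetical.

```c
#include <stdint.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : v > hi ? hi : v; }

/* Hypothetical HEVC chroma deblocking for one line across an edge.
 * p[0], q[0] are the samples adjacent to the edge; p[1], q[1] the next ones. */
void chroma_deblock_line_8(uint8_t p[2], uint8_t q[2], int tc)
{
    int delta = clip3(-tc, tc, (((q[0] - p[0]) * 4) + p[1] - q[1] + 4) >> 3);
    p[0] = (uint8_t)clip3(0, 255, p[0] + delta);
    q[0] = (uint8_t)clip3(0, 255, q[0] - delta);
}
```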
Michael Niedermayer
341cacb9ac avcodec/x86/hevcdsp_init: fix build failure with --disable-mmx
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 05:16:27 +02:00
plepere
63832e01c3 hvcodec/x86/hevcdsp: make macros more modular to support functions that are not sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 00:14:50 +02:00
Michael Niedermayer
fc7d0d8201 avcodec/x86/hevcdsp_init: fix SSE4 checks
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:27:49 +02:00
Michael Niedermayer
3b3db02f2e avcodec/x86/hevcdsp_init: fix build on 32bit
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:42 +02:00
plepere
7a2491c436 HEVC : added assembly MC functions
pretty print x86

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:36 +02:00