Commit graph

13 commits

Author SHA1 Message Date
James Darnley
651cb867b1 avcodec/v210enc: add new 10-bit function for avx512 avx512icl
avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2

avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
2022-12-01 18:19:03 +01:00
James Darnley
c3d36e1b3d avcodec/v210enc: add new function for avx2 avx512 avx512icl
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
    1690±4.3 decicycles vs. 1693±78.4
    1439±31.1 decicycles vs 1429±16.7

Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2

Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2

Co-authors:
    Henrik Gramner <henrik@gramner.com>
    Kieran Kunhya <kierank@obe.tv>
2022-11-04 19:37:46 +01:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Derek Buitenhuis
04e4166536 Merge commit 'e280fe1329'
* commit 'e280fe1329':
  v210: Use separate sample_factors

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 17:23:32 +00:00
James Almer
70d685a77f x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
Luca Barbato
e280fe1329 v210: Use separate sample_factors
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.

And while at it simplify a little the code.
2016-02-01 13:40:07 +01:00
James Darnley
15ec7aa417 v210: Add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
James Darnley
d29237e557 v210: Add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
James Darnley
2cba1825f7 avcodec/v210: add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.
2016-01-17 16:03:43 +01:00
James Darnley
3836f404a8 avcodec/v210: add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
2016-01-17 16:03:43 +01:00
Michael Niedermayer
1d048f762d Merge commit '9a738c27dc'
* commit '9a738c27dc':
  v210enc: Add SIMD optimised 8-bit and 10-bit encoders

Conflicts:
	libavcodec/v210enc.c
	libavcodec/v210enc.h
	libavcodec/x86/Makefile
	libavcodec/x86/v210enc.asm
	libavcodec/x86/v210enc_init.c
	tests/ref/vsynth/vsynth1-v210
	tests/ref/vsynth/vsynth2-v210

See: 36091742d1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-06 01:54:10 +01:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00