avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2
avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
1690±4.3 decicycles vs. 1693±78.4
1439±31.1 decicycles vs 1429±16.7
Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2
Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2
Co-authors:
Henrik Gramner <henrik@gramner.com>
Kieran Kunhya <kierank@obe.tv>
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>