Commit graph

5 commits

Author SHA1 Message Date
Andreas Rheinhardt
84f16bb5e6 avutil/avassert: Don't include avutil.h
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Rostislav Pehlivanov
29eb1c51d7 mdct15: simplify x86 exptab permutation
Removes an unneeded copy and does the 5-point permute in-place.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:44:40 +01:00
Rostislav Pehlivanov
70eb77b34e mdct15: add inverse transform postrotation SIMD
2.5ms frames:
Before   (c):  2638 decicycles in postrotate, 2097040 runs,    112 skips
After (sse3):  1467 decicycles in postrotate, 2097083 runs,     69 skips
After (avx2):  1244 decicycles in postrotate, 2097085 runs,     67 skips

5ms frames:
Before   (c):  4987 decicycles in postrotate, 1048371 runs,    205 skips
After (sse3):  2644 decicycles in postrotate, 1048509 runs,     67 skips
After (avx2):  2031 decicycles in postrotate, 1048523 runs,     53 skips

10ms frames:
Before   (c):  9153 decicycles in postrotate,  523575 runs,    713 skips
After (sse3):  5110 decicycles in postrotate,  523726 runs,    562 skips
After (avx2):  3738 decicycles in postrotate,  524223 runs,     65 skips

20ms frames:
Before   (c): 17857 decicycles in postrotate,  261866 runs,    278 skips
After (sse3): 10041 decicycles in postrotate,  261746 runs,    398 skips
After (avx2):  7050 decicycles in postrotate,  262116 runs,     28 skips

Improves total decoding performance for real world content by 9% with avx2.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-07-30 07:38:39 +01:00
Rostislav Pehlivanov
e1120b1c54 mdct15: add assembly optimizations for the 15-point FFT
c:    1802 decicycles in fft15,16774635 runs,   2581 skips
avx:   865 decicycles in fft15,16776378 runs,    838 skips

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-06-23 23:45:37 +01:00
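
The decicycle figures quoted in the two mdct15 SIMD commit messages above can be turned into speedup ratios with simple division. The following is a minimal sketch of that arithmetic, not part of any commit. The names POSTROTATE, FFT15 and speedups are made up for illustration; the numbers are copied from the messages as-is.

```python
# Decicycle counts copied from the commit messages above.
# The dictionary and function names are illustrative, not FFmpeg identifiers.
POSTROTATE = {
    "2.5ms": {"c": 2638, "sse3": 1467, "avx2": 1244},
    "5ms": {"c": 4987, "sse3": 2644, "avx2": 2031},
    "10ms": {"c": 9153, "sse3": 5110, "avx2": 3738},
    "20ms": {"c": 17857, "sse3": 10041, "avx2": 7050},
}
FFT15 = {"c": 1802, "avx": 865}


def speedups(timings, baseline="c"):
    """Return baseline_cycles / variant_cycles for every non-baseline entry."""
    base = timings[baseline]
    return {
        name: base / cycles
        for name, cycles in timings.items()
        if name != baseline
    }


if __name__ == "__main__":
    for frame, timings in POSTROTATE.items():
        for name, ratio in speedups(timings).items():
            print(f"postrotate {frame:>5} {name:>4}: {ratio:.2f}x")
    for name, ratio in speedups(FFT15).items():
        print(f"fft15 {name}: {ratio:.2f}x")
```

These ratios describe the individual functions only. The 9% figure in the postrotation commit refers to total decoding time and cannot be derived from the per-function counts.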