ffmpeg/libavcodec/aarch64
Lynne 4d2f62150d aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis
153372 UNITS in postfilter_c,   65536 runs,      0 skips
73164 UNITS in postfilter_neon,   65536 runs,      0 skips -> 2.1x speedup

80591 UNITS in deemphasis_c,  131072 runs,      0 skips
43969 UNITS in deemphasis_neon,  131072 runs,      0 skips -> 1.83x speedup

Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}

Unlike the x86 version, duplication is used instead of pslldq, so
the structure and tables are different.
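
The unrolling above can be checked against the plain one-tap recurrence y[n] = x[n] + c*y[n-1]. The following is a minimal C sketch of that check, not the NEON code from this commit. The names deemphasis_scalar, deemphasis_unrolled, max_abs_diff and EMPH_COEFF are hypothetical, and the value 0.85f is an assumption standing in for CELT_EMPH_COEFF.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Assumed stand-in for CELT_EMPH_COEFF; the real value lives in the decoder sources. */
#define EMPH_COEFF 0.85f

/* Reference form: y[n] = x[n] + c * y[n-1]. Returns the final filter state. */
static float deemphasis_scalar(float *y, const float *x, float state, size_t len)
{
    for (size_t i = 0; i < len; i++)
        state = y[i] = x[i] + EMPH_COEFF * state;
    return state;
}

/* Four outputs per iteration, each written only in terms of the inputs and
 * the state carried in from the previous block, as in the commit message. */
static float deemphasis_unrolled(float *y, const float *x, float state, size_t len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    assert(len % 4 == 0); /* this sketch does not handle a tail */

    for (size_t i = 0; i < len; i += 4) {
        y[i + 0] = x[i + 0] + c1 * state;
        y[i + 1] = x[i + 1] + c2 * state + c1 * x[i + 0];
        y[i + 2] = x[i + 2] + c3 * state + c1 * x[i + 1] + c2 * x[i + 0];
        y[i + 3] = x[i + 3] + c4 * state + c1 * x[i + 2] + c2 * x[i + 1] + c3 * x[i + 0];
        state = y[i + 3];
    }
    return state;
}

/* Largest absolute element-wise difference between two buffers. */
static float max_abs_diff(const float *a, const float *b, size_t len)
{
    float worst = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float d = fabsf(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Because the unrolled form multiplies by precomputed powers of the coefficient, its rounding differs slightly from the recursive form, so the two outputs should be compared with a small tolerance rather than for bit-exact equality.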
2019-04-10 01:08:54 +02:00
aacpsdsp_init_aarch64.c lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis 2017-06-28 12:22:39 +02:00
aacpsdsp_neon.S lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis 2017-06-28 12:22:39 +02:00
asm-offsets.h Merge commit '705f5e5e15' 2016-01-02 11:14:28 +01:00
cabac.h
fft_init_aarch64.c Merge commit '97aec6e75e' 2016-04-12 15:43:09 +01:00
fft_neon.S
fmtconvert_init.c Merge commit 'a0fc780a20' 2016-01-02 11:21:16 +01:00
fmtconvert_neon.S Merge commit 'a0fc780a20' 2016-01-02 11:21:16 +01:00
h264chroma_init_aarch64.c Merge commit 'e4a94d8b36' 2017-03-21 15:20:45 -03:00
h264cmc_neon.S Merge commit 'e4a94d8b36' 2017-03-21 15:20:45 -03:00
h264dsp_init_aarch64.c Merge commit '186bd30aa3' 2019-03-14 16:29:41 -03:00
h264dsp_neon.S Merge commit '186bd30aa3' 2019-03-14 16:29:41 -03:00
h264idct_neon.S libavcodec: Remove dynamic relocs from aarch64/h264idct_neon.S 2019-01-03 20:12:07 +01:00
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
idct.h Merge commit '2ec9fa5ec6' 2017-03-21 14:29:52 -03:00
idctdsp_init_aarch64.c lavc/aarch64: add ff_simple_idct{,_add,_put}_neon functions 2017-03-16 12:00:41 +01:00
Makefile aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
mdct_neon.S
mpegaudiodsp_init.c Merge commit '72a19f4013' 2017-03-31 14:43:37 -03:00
mpegaudiodsp_neon.S Merge commit '732510636e' 2017-11-11 17:47:10 -03:00
neon.S Merge commit 'cdb1665f70' 2016-04-24 12:51:42 +01:00
neontest.c Merge commit 'de2ae3c1fa' 2017-03-21 14:43:53 +01:00
opusdsp_init.c aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
opusdsp_neon.S aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
rv40dsp_init_aarch64.c Merge commit 'e4a94d8b36' 2017-03-21 15:20:45 -03:00
sbrdsp_init_aarch64.c lavc/aarch64: add sbrdsp neon implementation 2017-07-03 14:29:22 +02:00
sbrdsp_neon.S lavc/aarch64/sbrdsp_neon: fix build on old binutils 2018-01-26 02:42:01 -06:00
simple_idct_neon.S lavc/aarch64/simple_idct: fix build with Xcode 7.2 2017-06-14 23:20:58 +02:00
synth_filter_init.c avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_neon.S Merge commit '2425d7329f' 2017-04-26 16:28:57 +02:00
vc1dsp_init_aarch64.c Merge commit 'e4a94d8b36' 2017-03-21 15:20:45 -03:00
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h Merge commit 'e39a9212ab' 2019-03-14 16:18:42 -03:00
vp8dsp_init_aarch64.c Merge commit 'e39a9212ab' 2019-03-14 16:18:42 -03:00
vp8dsp_neon.S Merge commit '7e42d5f0ab' 2019-03-14 16:22:29 -03:00
vp9dsp_init.h vp9: re-split the decoder/format/dsp interface header files. 2017-03-28 18:04:26 -04:00
vp9dsp_init_10bpp_aarch64.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9dsp_init_12bpp_aarch64.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9dsp_init_16bpp_aarch64_template.c aarch64/vp9dsp: add missing header includes 2017-03-28 23:02:09 -03:00
vp9dsp_init_aarch64.c aarch64/vp9dsp: add missing header includes 2017-03-28 23:02:09 -03:00
vp9itxfm_16bpp_neon.S aarch64: vp9 16bpp: Fix assembling with Xcode 6.2 and older 2017-06-21 09:08:14 +03:00
vp9itxfm_neon.S aarch64: vp9: Fix assembling with Xcode 6.2 and older 2017-06-21 09:08:13 +03:00
vp9lpf_16bpp_neon.S aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:36:11 +02:00
vp9lpf_neon.S aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 2017-03-11 13:14:50 +02:00
vp9mc_16bpp_neon.S aarch64: vp9 16bpp: Fix assembling with Xcode 6.2 and older 2017-06-21 09:08:14 +03:00
vp9mc_neon.S aarch64: vp9: Fix assembling with Xcode 6.2 and older 2017-06-21 09:08:13 +03:00