ffmpeg/libavcodec/arm
Martin Storsjö 1e5d87eec3 arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter
This work is sponsored by, and copyright, Google.

This is very similar to the 8 bpp version, but in some respects
simpler. All input pixels are 16 bits, and all intermediates also fit
in 16 bits, so there's no lengthening/narrowing in the filter at all.
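
As a rough illustration of why 16 bits suffice, the sketch below bounds
the largest intermediate of a rounded weighted average. It assumes the
widest smoothing filter uses weights that sum to 16 with a rounding term
of 8 (an assumption taken from the general shape of the VP9 wide filter,
not from this commit); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest intermediate of a rounded weighted average whose weights sum
 * to weight_sum, when every input pixel has the maximum value for the
 * given bit depth. The rounding term is assumed to be weight_sum / 2. */
static uint32_t max_filter_sum(int bit_depth, uint32_t weight_sum)
{
    uint32_t max_pixel = (1u << bit_depth) - 1;
    return weight_sum * max_pixel + weight_sum / 2;
}

/* Returns 1 if that worst-case sum fits in an unsigned lane that is
 * lane_bits wide, 0 otherwise. */
static int fits_in_lane(int bit_depth, uint32_t weight_sum, int lane_bits)
{
    return max_filter_sum(bit_depth, weight_sum) < (1u << lane_bits);
}
```

Under these assumptions, 12 bit input peaks at 16 * 4095 + 8 = 65528,
just below 65536, whereas 8 bit input peaks at 4088, which does not fit
in an 8 bit lane and so needs lengthening in the 8 bpp version.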

For the full 16 pixel wide filter, we can only process 4 pixels at a time
(using an implementation very similar to the one for 8 bpp), but for the
4 and 8 pixel wide filters we can do 8 pixels at a time, using
a different implementation of the core filter.
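
One plausible way to read the 4 versus 8 pixel split is as register
arithmetic. The sketch below is an assumption, not something stated in
the commit message: it counts how many 128 bit q registers are needed
just to hold the input rows, given that 32-bit ARM NEON has 16 q
registers. The helper names are hypothetical.

```c
#include <assert.h>

/* 32-bit ARM NEON: 16 q registers of 128 bits, each aliasing two
 * 64 bit d registers. */
enum { NEON_D_BITS = 64, NEON_Q_BITS = 128, NEON_NUM_Q_REGS = 16 };

/* Number of pixels of pixel_bits each that fit in one register. */
static int lanes_per_register(int register_bits, int pixel_bits)
{
    return register_bits / pixel_bits;
}

/* q registers needed to keep `rows` input rows live when filtering
 * `pixels_per_pass` pixels of `pixel_bits` each (temporaries excluded). */
static int q_regs_for_inputs(int rows, int pixels_per_pass, int pixel_bits)
{
    return rows * pixels_per_pass * pixel_bits / NEON_Q_BITS;
}
```

With 16 bit pixels, 16 input rows at 8 pixels per pass would occupy all
16 q registers and leave none for temporaries, while 4 pixels per pass
needs 8. The 8 row filter at 8 pixels per pass also needs only 8.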

Examples of relative speedup compared to the C version, from checkasm:
                                   Cortex    A7     A8     A9    A53
vp9_loop_filter_h_4_8_10bpp_neon:          1.83   2.16   1.40   2.09
vp9_loop_filter_h_8_8_10bpp_neon:          1.39   1.67   1.24   1.70
vp9_loop_filter_h_16_8_10bpp_neon:         1.56   1.47   1.10   1.81
vp9_loop_filter_h_16_16_10bpp_neon:        1.94   1.69   1.33   2.24
vp9_loop_filter_mix2_h_44_16_10bpp_neon:   2.01   2.27   1.67   2.39
vp9_loop_filter_mix2_h_48_16_10bpp_neon:   1.84   2.06   1.45   2.19
vp9_loop_filter_mix2_h_84_16_10bpp_neon:   1.89   2.20   1.47   2.29
vp9_loop_filter_mix2_h_88_16_10bpp_neon:   1.69   2.12   1.47   2.08
vp9_loop_filter_mix2_v_44_16_10bpp_neon:   3.16   3.98   2.50   4.05
vp9_loop_filter_mix2_v_48_16_10bpp_neon:   2.84   3.64   2.25   3.77
vp9_loop_filter_mix2_v_84_16_10bpp_neon:   2.65   3.45   2.16   3.54
vp9_loop_filter_mix2_v_88_16_10bpp_neon:   2.55   3.30   2.16   3.55
vp9_loop_filter_v_4_8_10bpp_neon:          2.85   3.97   2.24   3.68
vp9_loop_filter_v_8_8_10bpp_neon:          2.27   3.19   1.96   3.08
vp9_loop_filter_v_16_8_10bpp_neon:         3.42   2.74   2.26   4.40
vp9_loop_filter_v_16_16_10bpp_neon:        2.86   2.44   1.93   3.88

The speedup vs C code measured in checkasm ranges from about 1.1x to 4.4x.
These numbers are not conclusive, though, since the checkasm test
runs multiple filterings on top of each other, so later rounds might
end up taking different codepaths (different decisions on which filter
to apply, based on input pixel differences).

With START_TIMER/STOP_TIMER wrapped around a few individual
functions, the speedup vs C code is around 2-4x.
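
The measurement idea can be sketched outside the FFmpeg tree as follows.
This is a portable stand-in built on POSIX clock_gettime, not FFmpeg's
START_TIMER/STOP_TIMER macros from libavutil/timer.h; all names in it
are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Function under test; ctx carries whatever state it needs. */
typedef void (*bench_fn)(void *ctx);

/* Average wall-clock time per call, in nanoseconds, over `runs` calls. */
static double avg_ns_per_call(bench_fn fn, void *ctx, int runs)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < runs; i++)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / runs;
}

/* Relative speedup of an optimized function over a reference one. */
static double speedup(double ref_ns, double opt_ns)
{
    return ref_ns / opt_ns;
}

/* Trivial placeholder workload: increments a counter through ctx. */
static void count_call(void *ctx)
{
    volatile int *counter = ctx;
    (*counter)++;
}
```

Timing one function in isolation with fixed input avoids the checkasm
effect where repeated filtering changes which codepath later rounds take.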

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24 22:35:59 +02:00
aac.h Merge commit '637606de2d' 2012-12-08 14:19:55 +01:00
aacpsdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
aacpsdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_arm.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_init_arm.c Merge commit '4958f35a2e' 2013-12-09 04:12:40 +01:00
ac3dsp_neon.S Merge commit '4958f35a2e' 2013-12-09 04:12:40 +01:00
asm-offsets.h Merge commit '6a13505c06' 2014-04-30 00:23:01 +02:00
audiodsp_arm.h Merge commit '9a9e2f1c8a' 2014-06-22 17:58:28 +02:00
audiodsp_init_arm.c Merge commit '9a9e2f1c8a' 2014-06-22 17:58:28 +02:00
audiodsp_init_neon.c Merge commit '9a9e2f1c8a' 2014-06-22 17:58:28 +02:00
audiodsp_neon.S Merge commit '9a9e2f1c8a' 2014-06-22 17:58:28 +02:00
blockdsp_arm.h blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_arm.c blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_neon.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
blockdsp_neon.S Merge commit 'e74433a8e6' 2014-06-19 04:54:38 +02:00
cabac.h avcodec/arm/cabac: fix inline cabac reader with the UNCHECKED bitstream reader 2014-03-15 01:08:45 +01:00
dca.h avcodec/dca: remove old decoder 2016-01-31 17:09:38 +01:00
fft_fixed_init_arm.c Merge commit '97aec6e75e' 2016-04-12 15:43:09 +01:00
fft_fixed_neon.S Merge commit 'f963f80399' 2014-12-09 11:58:13 +01:00
fft_init_arm.c Merge commit '4c297249ac' 2016-04-12 15:43:34 +01:00
fft_neon.S Merge commit 'f963f80399' 2014-12-09 11:58:13 +01:00
fft_vfp.S Merge commit 'f963f80399' 2014-12-09 11:58:13 +01:00
flacdsp_arm.S Merge remote-tracking branch 'qatar/master' 2012-09-16 14:55:00 +02:00
flacdsp_init_arm.c lavc/flac: Fix encoding and decoding with high lpc. 2015-05-17 02:08:58 +02:00
fmtconvert_init_arm.c Merge commit '90b1b9350c' 2016-01-02 11:21:36 +01:00
fmtconvert_neon.S Merge commit '90b1b9350c' 2016-01-02 11:21:36 +01:00
fmtconvert_vfp.S Merge commit 'f0389eb777' 2013-08-29 16:10:39 +02:00
g722dsp_init_arm.c Merge commit '702458538d' 2015-02-16 02:16:29 +01:00
g722dsp_neon.S Merge commit '702458538d' 2015-02-16 02:16:29 +01:00
h264chroma_init_arm.c Merge commit '79dad2a932' 2013-02-07 13:09:35 +01:00
h264cmc_neon.S avcodec: fix vc1dsp dependencies 2016-09-25 13:11:45 +02:00
h264dsp_init_arm.c lavc/arm: Use the neon vertical chroma loop filter also for H.264 4:2:2. 2015-01-31 10:05:24 +01:00
h264dsp_neon.S Merge remote-tracking branch 'qatar/master' 2013-01-24 15:47:47 +01:00
h264idct_neon.S Merge commit '5bcbb516f2' 2014-02-08 00:48:26 +01:00
h264pred_init_arm.c Merge commit '256ef19844' 2015-07-18 02:13:22 +02:00
h264pred_neon.S
h264qpel_init_arm.c Merge commit '7fb993d338' 2014-07-25 13:05:08 +02:00
h264qpel_neon.S Merge remote-tracking branch 'qatar/master' 2013-01-24 15:47:47 +01:00
hevcdsp_arm.h hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_deblock_neon.S hevcdsp: HEVC deblocking ARM NEON register clobber fix 2015-02-16 13:27:41 +01:00
hevcdsp_idct_neon.S avcodec/arm/hevcdsp_idct_neon: drop ".code 32" 2015-02-25 02:30:35 +01:00
hevcdsp_init_arm.c hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_init_neon.c hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_qpel_neon.S avcodec/hevcdsp: ARM NEON optimized qpel functions 2015-02-25 18:39:51 +01:00
hpeldsp_arm.h Merge commit '7151c5d04a' 2014-01-14 14:38:10 +01:00
hpeldsp_arm.S Merge commit '831a118078' 2014-03-13 23:59:56 +01:00
hpeldsp_armv6.S Merge commit '61985ad72c' 2014-03-09 01:16:21 +01:00
hpeldsp_init_arm.c Merge commit '322a1dda97' 2014-03-22 22:53:33 +01:00
hpeldsp_init_armv6.c Merge commit '7384b7a713' 2013-04-20 14:19:08 +02:00
hpeldsp_init_neon.c Merge commit '7384b7a713' 2013-04-20 14:19:08 +02:00
hpeldsp_neon.S arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp 2013-04-19 23:19:08 +03:00
idct.h Merge commit '4de8b60684' 2014-07-21 01:56:22 +02:00
idctdsp_arm.h Merge commit 'e3fcb14347' 2014-07-01 15:22:11 +02:00
idctdsp_arm.S avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_armv6.S Merge commit 'e3fcb14347' 2014-07-01 15:22:11 +02:00
idctdsp_init_arm.c Merge commit '7c6eb0a1b7' 2015-07-27 22:10:35 +02:00
idctdsp_init_armv5te.c Merge commit '4de8b60684' 2014-07-21 01:56:22 +02:00
idctdsp_init_armv6.c Merge commit '7c6eb0a1b7' 2015-07-27 22:10:35 +02:00
idctdsp_init_neon.c avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_neon.S Merge commit 'e3fcb14347' 2014-07-01 15:22:11 +02:00
int_neon.S Merge commit '054013a0fc' 2014-05-30 00:59:15 +02:00
jrevdct_arm.S Drop DCTELEM typedef 2013-01-22 18:32:56 -08:00
lossless_audiodsp_init_arm.c apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
lossless_audiodsp_neon.S apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
Makefile arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
mathops.h Merge commit '637606de2d' 2012-12-08 14:19:55 +01:00
mdct_fixed_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mdct_neon.S Merge commit '5bcbb516f2' 2014-02-08 00:48:26 +01:00
mdct_vfp.S armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) 2014-07-18 01:34:08 +03:00
me_cmp_armv6.S Merge commit '2d60444331' 2014-07-17 23:27:40 +02:00
me_cmp_init_arm.c Merge commit '9c12c6ff95' 2014-11-24 12:13:00 +01:00
mlpdsp_armv5te.S Merge commit '4c81613df4' 2014-12-10 00:51:26 +01:00
mlpdsp_armv6.S Merge commit '41ed7ab45f' 2016-06-21 21:55:34 +02:00
mlpdsp_init_arm.c Merge remote-tracking branch 'qatar/master' 2014-03-26 21:23:09 +01:00
mpegaudiodsp_fixed_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegaudiodsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegvideo_arm.c Merge commit '835f798c7d' 2014-08-15 20:11:56 +02:00
mpegvideo_arm.h Merge commit '835f798c7d' 2014-08-15 20:11:56 +02:00
mpegvideo_armv5te.c Merge commit '41ed7ab45f' 2016-06-21 21:55:34 +02:00
mpegvideo_armv5te_s.S Merge remote-tracking branch 'qatar/master' 2012-08-01 23:33:06 +02:00
mpegvideo_neon.S Merge commit '5bcbb516f2' 2014-02-08 00:48:26 +01:00
mpegvideoencdsp_armv6.S Merge commit 'c166148409' 2014-07-07 15:36:58 +02:00
mpegvideoencdsp_init_arm.c Merge commit 'c166148409' 2014-07-07 15:36:58 +02:00
neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
neontest.c avcodec: fix arguments on xmm/neon clobber test wrappers 2016-10-02 02:15:47 -03:00
pixblockdsp_armv6.S Merge commit 'f46bb608d9' 2014-07-10 01:22:14 +02:00
pixblockdsp_init_arm.c avcodec: Change get_pixels() to ptrdiff_t linesize 2014-08-06 15:50:54 +02:00
rdft_init_arm.c arm/rdft_init: fix license header 2016-04-12 15:01:19 -03:00
rdft_neon.S Merge remote-tracking branch 'qatar/master' 2012-10-03 13:35:02 +02:00
rv34dsp_init_arm.c Merge commit 'a846dccb29' 2013-02-07 13:35:49 +01:00
rv34dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
rv40dsp_init_arm.c Merge commit '7fb993d338' 2014-07-25 13:05:08 +02:00
rv40dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
simple_idct_arm.S Merge commit '41ed7ab45f' 2016-06-21 21:55:34 +02:00
simple_idct_armv5te.S Merge remote-tracking branch 'qatar/master' 2012-08-01 23:33:06 +02:00
simple_idct_armv6.S Merge commit '88bd7fdc82' 2013-01-23 17:44:56 +01:00
simple_idct_neon.S Merge commit '88bd7fdc82' 2013-01-23 17:44:56 +01:00
startcode.h Merge commit 'db7f1c7c5a' 2014-08-05 12:46:10 +02:00
startcode_armv6.S h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
synth_filter_init_arm.c avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_neon.S Merge remote-tracking branch 'qatar/master' 2012-10-03 13:35:02 +02:00
synth_filter_vfp.S Merge commit '7e18a727d2' 2014-07-18 13:17:29 +02:00
vc1dsp.h Merge commit '832e190632' 2013-12-20 23:12:16 +01:00
vc1dsp_init_arm.c Fix compile error on arm4/arm5 platform 2014-09-23 21:11:05 +02:00
vc1dsp_init_neon.c Merge commit '896a5bff64' 2014-06-03 18:19:21 +02:00
vc1dsp_neon.S Merge commit '896a5bff64' 2014-06-03 18:19:21 +02:00
videodsp_arm.h videodsp: Fix project name 2012-12-22 00:58:08 +01:00
videodsp_armv5te.S arm: use a local label instead of the function symbol in ff_prefetch_arm 2015-07-20 23:10:29 +02:00
videodsp_init_arm.c Merge commit '620289a20e' 2013-02-06 13:27:24 +01:00
videodsp_init_armv5te.c Merge commit '620289a20e' 2013-02-06 13:27:24 +01:00
vorbisdsp_init_arm.c Merge commit '620289a20e' 2013-02-06 13:27:24 +01:00
vorbisdsp_neon.S Merge commit 'fef906c77c' 2013-01-20 14:13:16 +01:00
vp3dsp_init_arm.c Merge commit '3dc6272bed' 2014-04-05 18:54:15 +02:00
vp3dsp_neon.S Merge remote-tracking branch 'qatar/master' 2014-01-08 05:44:56 +01:00
vp6dsp_init_arm.c Merge commit '8506ff97c9' 2013-08-24 11:04:11 +02:00
vp6dsp_neon.S Merge commit '8506ff97c9' 2013-08-24 11:04:11 +02:00
vp8.h arm: asm decode_block_coeffs_internal is vp8 specific 2014-04-04 10:39:29 +02:00
vp8_armv6.S Merge remote-tracking branch 'qatar/master' 2012-09-21 14:44:32 +02:00
vp8dsp.h Merge commit 'ac4b32df71' 2014-04-04 14:46:10 +02:00
vp8dsp_armv6.S Merge commit '5f74bd31a9' 2016-11-17 15:05:07 +01:00
vp8dsp_init_arm.c Merge commit 'ac4b32df71' 2014-04-04 14:46:10 +02:00
vp8dsp_init_armv6.c Merge commit 'ac4b32df71' 2014-04-04 14:46:10 +02:00
vp8dsp_init_neon.c Merge commit 'ac4b32df71' 2014-04-04 14:46:10 +02:00
vp8dsp_neon.S Merge commit 'e8b96a7701' 2016-11-14 15:21:49 +01:00
vp9dsp_init.h arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_10bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_12bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_16bpp_arm_template.c arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9dsp_init_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9itxfm_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm 2017-01-24 22:35:56 +02:00
vp9itxfm_neon.S arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 2017-01-14 21:13:30 +01:00
vp9lpf_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9lpf_neon.S arm: vp9: Add NEON loop filters 2016-11-15 15:10:03 -05:00
vp9mc_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9mc_neon.S arm: vp9mc: Fix vertical alignment of operands 2017-01-14 21:13:37 +01:00
vp56_arith.h Merge commit '637606de2d' 2012-12-08 14:19:55 +01:00