James Almer
42111e8543
avcodec: fix arguments on xmm/neon clobber test wrappers
...
Signed-off-by: James Almer <jamrial@gmail.com>
2016-10-02 02:15:47 -03:00
James Almer
449f263f9f
avcodec: add missing xmm/neon clobber test wrappers for the new encode API
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-10-01 14:08:50 -03:00
Xiaolei Yu
5a70e56f2f
avcodec: fix vc1dsp dependencies
2016-09-25 13:11:45 +02:00
James Almer
293484fa5e
avcodec: add missing xmm/neon clobber test wrappers for the new decode API
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-03 18:04:30 -03:00
Clément Bœsch
dfd0c0f981
lavc/neontest: fix constness in arm/aarch64 avcodec_open2() wrappers
2016-06-25 13:41:13 +02:00
Clément Bœsch
8ef57a0d61
Merge commit '41ed7ab45f'
...
* commit '41ed7ab45f':
cosmetics: Fix spelling mistakes
Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
James Almer
c8c14d0ffc
aarch64/synth_filter: fix compilation
...
Signed-off-by: James Almer <jamrial@gmail.com>
2016-05-10 23:33:12 -03:00
Derek Buitenhuis
ca5ec2bf51
Merge commit '01621202aa'
...
* commit '01621202aa':
build: miscellaneous cosmetics
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-05-09 16:25:28 +01:00
Vittorio Giovara
41ed7ab45f
cosmetics: Fix spelling mistakes
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Derek Buitenhuis
87b8e95008
Merge commit 'cdb1665f70'
...
* commit 'cdb1665f70':
aarch64: Make transpose_4x4H do a regular transpose
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-04-24 12:51:42 +01:00
Derek Buitenhuis
197fa698c6
Merge commit '97aec6e75e'
...
* commit '97aec6e75e':
fft: arm: Drop unnecessary #include, add missing ones
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-04-12 15:43:09 +01:00
Diego Biurrun
01621202aa
build: miscellaneous cosmetics
...
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
2016-04-07 15:26:08 +02:00
Martin Storsjö
cdb1665f70
aarch64: Make transpose_4x4H do a regular transpose
...
Previously, ff_h264_idct_add_neon (originally in the arm version) used
a non-regular transpose in order to be able to use more instructions
that deal with registers as 128 bit register pairs. The aarch64
translation doesn't do it to the same extent, but brought along the
same structure since it was a straight translation.
This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.
Previously, the third and fourth output from transpose_4x4H were
swapped, and prior to cc29d96d5a, the same inputs as well. In
addition to just swapping the outputs, also renumber the intermediate
registers for better readability (making the register order match
transpose_4x8B).
This runs with the same number of cycles as before.
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-03-26 21:25:56 +02:00
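For readers unfamiliar with the term used in cdb1665f70, a regular 4x4 transpose maps the element at (row, column) to (column, row). The following is a minimal C sketch of that mapping on 16-bit elements. The function name transpose_4x4_u16 and the flat row-major layout are assumptions made for illustration; this is not the NEON macro from the commit.

```c
#include <stdint.h>

/*
 * Illustrative only: a scalar 4x4 transpose of 16-bit elements stored
 * row-major in flat arrays. The name and layout are hypothetical; the
 * real transpose_4x4H is an aarch64 NEON assembler macro.
 */
static void transpose_4x4_u16(uint16_t out[16], const uint16_t in[16])
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            out[col * 4 + row] = in[row * 4 + col];
}
```

Applying the function twice returns the original matrix, which is a quick sanity check for any transpose implementation.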
Diego Biurrun
1a094af638
fft: Split MDCT bits off from FFT
2016-03-01 10:18:28 +01:00
Diego Biurrun
97aec6e75e
fft: arm: Drop unnecessary #include, add missing ones
2016-02-26 14:34:58 +01:00
foo86
ae5b2c5250
avcodec/dca: add new decoder based on libdcadec
2016-01-31 17:09:38 +01:00
foo86
4608996772
avcodec/dca: remove old decoder
...
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
2016-01-31 17:09:38 +01:00
James Almer
209f50e16b
avcodec/synth_filter: split off remaining code from dcadec files
...
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-25 14:57:38 -03:00
Hendrik Leppkes
d03da3e240
Merge commit '2008f76054'
...
* commit '2008f76054':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:17:48 +01:00
Hendrik Leppkes
e97e2588ca
Merge commit 'a0fc780a20'
...
* commit 'a0fc780a20':
arm64: int32_to_float_fmul neon asm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:21:16 +01:00
Hendrik Leppkes
10e075c138
Merge commit '705f5e5e15'
...
* commit '705f5e5e15':
arm64: port synth_filter_float_neon from arm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:14:28 +01:00
Hendrik Leppkes
de3a33784c
Merge commit 'c33c1fa8af'
...
* commit 'c33c1fa8af':
arm64: convert dcadsp neon asm from arm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:10:24 +01:00
Alexandra Hájková
2008f76054
dca: remove unused decode_hf function and quant_d tables
...
They were superseded by their integer equivalents. Rename integer
decode_hf to decode_hf.
2015-12-24 13:58:18 +01:00
Janne Grunau
cc29d96d5a
arm64: fix inverted register order in transpose_4x4H
...
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
2015-12-21 13:44:20 +01:00
Janne Grunau
2dba0407fd
avcodec/arm64: fix inverted register order in transpose_4x4H
...
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-12-19 03:58:46 +01:00
Michael Niedermayer
95b59bfb9d
Revert "avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H"
...
The change was not correct and broke H264.
This reverts commit cd83f899c9.
2015-12-17 21:26:37 +01:00
Janne Grunau
a0fc780a20
arm64: int32_to_float_fmul neon asm
...
3% faster dts decoding on a cortex-a57.
                                  cortex-a57  cortex-a53
int32_to_float_fmul_array8_c:         1270.9      4475.6
int32_to_float_fmul_array8_neon:       328.6       569.2
int32_to_float_fmul_scalar_c:          928.5      4119.6
int32_to_float_fmul_scalar_neon:       309.1       524.1
2015-12-14 16:45:02 +01:00
Janne Grunau
705f5e5e15
arm64: port synth_filter_float_neon from arm
...
~25% faster dts decoding overall. The checkasm CPU cycles numbers are
not that useful since synth_filter_float() calls FFTContext.imdct_half().
                          cortex-a57  cortex-a53
synth_filter_float_c:         1866.2      3490.9
synth_filter_float_neon:       915.0      1531.5
With fftc.imdct_half forced to imdct_half_neon:
                          cortex-a57  cortex-a53
synth_filter_float_c:         1718.4      3025.3
synth_filter_float_neon:       926.2      1530.1
2015-12-14 16:45:01 +01:00
Janne Grunau
c33c1fa8af
arm64: convert dcadsp neon asm from arm
...
~2% faster dts decoding overall.
                     cortex-a57  cortex-a53
dca_decode_hf_c:          474.8      1659.9
dca_decode_hf_neon:       225.2       301.1
dca_lfe_fir0_c:           913.2      1537.7
dca_lfe_fir0_neon:        286.8       451.9
dca_lfe_fir1_c:           848.7      1711.5
dca_lfe_fir1_neon:        387.1       506.4
2015-12-14 16:45:01 +01:00
zjh8890
c18176bd55
avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H
...
transpose_4x4H is wrong: the order of r2 and r3 is swapped. This bug cost me
a lot of time to track down while writing aarch64 assembly that uses the function.
2015-12-12 14:20:01 +01:00
Michael Niedermayer
5d5f8b29b4
Merge commit 'f56d8d8dd7'
...
* commit 'f56d8d8dd7':
h264: aarch64: intra prediction optimisations
Conflicts:
libavcodec/h264pred.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-21 01:39:30 +02:00
Janne Grunau
f56d8d8dd7
h264: aarch64: intra prediction optimisations
2015-07-20 23:10:29 +02:00
Janne Grunau
c2de2cf0d2
arm64: constify src in h264qpel dsp function definitions
2015-06-24 08:41:32 +02:00
Michael Niedermayer
7b32b35bf5
Merge commit '3d5d46233c'
...
* commit '3d5d46233c':
opus: Factor out imdct15 into a standalone component
Conflicts:
configure
libavcodec/opus_celt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-02 20:43:13 +01:00
Diego Biurrun
3d5d46233c
opus: Factor out imdct15 into a standalone component
...
It will be reused by the AAC decoder.
2015-02-02 16:07:33 +01:00
Carl Eugen Hoyos
4faea46bd9
lavc/aarch64: Do not use the neon horizontal chroma loop filter for H.264 4:2:2.
2015-01-31 10:05:10 +01:00
Michael Niedermayer
92d47e2aa3
Merge commit '780cd20b00'
...
* commit '780cd20b00':
aarch64: Use .data.rel.ro for const data with relocations
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-09 12:08:29 +01:00
Martin Storsjö
780cd20b00
aarch64: Use .data.rel.ro for const data with relocations
...
This reverts commit c00365b46d
in addition to using a different section.
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-12-09 11:43:31 +02:00
Michael Niedermayer
f3cba01cce
Merge commit 'c00365b46d'
...
* commit 'c00365b46d':
aarch64: Make the function pointer tables position independent
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-16 01:05:31 +01:00
Martin Storsjö
c00365b46d
aarch64: Make the function pointer tables position independent
...
This allows running the code on android, where 64 bit binaries with
text relocations aren't allowed to be loaded.
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-11-16 01:07:24 +02:00
Michael Niedermayer
e16b7338d8
avcodec/aarch64/h264qpel_init_aarch64: mark src as const
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-30 12:48:31 +02:00
Michael Niedermayer
7fd60d1e7a
Merge commit 'ac6b95dbc0'
...
* commit 'ac6b95dbc0':
aarch64: add ',' between assembler macro arguments where missing
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 04:06:13 +02:00
Janne Grunau
ac6b95dbc0
aarch64: add ',' between assembler macro arguments where missing
...
llvm's integrated assembler does not accept spaces as macro argument
delimiters when targeting darwin. Using an explicit delimiter is a good
idea in principle since it makes cases like 'macro 4 -2' vs 'macro 4 - 2'
clear.
2014-08-04 00:17:21 +02:00
Michael Niedermayer
32cf26cc6a
Merge commit 'f23d26a686'
...
* commit 'f23d26a686':
h264: avoid using uninitialized memory in NEON chroma mc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 20:35:33 +02:00
Janne Grunau
f23d26a686
h264: avoid using uninitialized memory in NEON chroma mc
...
Adapt commit 982b596ea6 for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
2014-06-23 16:32:15 +02:00
Michael Niedermayer
30cdf384d1
Merge commit 'd3f5b94762'
...
* commit 'd3f5b94762':
aarch64: opus NEON iMDCT and FFT
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-15 21:13:53 +02:00
Janne Grunau
d3f5b94762
aarch64: opus NEON iMDCT and FFT
...
Opus celt decoding 11% faster and the iMDCT over 2.5 times faster on
Apple's A7.
2014-05-15 18:17:02 +02:00
Michael Niedermayer
76581ab833
Merge commit '9aa4592076'
...
* commit '9aa4592076':
aarch64: assembler in clang-3.4 ignores the division by two
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-13 20:34:32 +02:00
Janne Grunau
9aa4592076
aarch64: assembler in clang-3.4 ignores the division by two
...
Values are positive powers of two, so just replace it with right shift.
2014-05-13 19:44:09 +02:00
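The reasoning in 9aa4592076 rests on the fact that, for positive powers of two, dividing by two and shifting right by one bit give the same result. A minimal C sketch of that equivalence follows. The helper names are hypothetical and chosen for illustration; the actual change is in assembler expressions, not C.

```c
#include <stdint.h>

/*
 * Illustrative only: halve a value with a right shift instead of a
 * division. The helper names are hypothetical and do not appear in
 * the FFmpeg sources.
 */
static uint32_t half_by_shift(uint32_t v)
{
    return v >> 1;
}

/* Returns 1 if v >> 1 equals v / 2 for every power of two from 2 to 2^16. */
static int shift_matches_division(void)
{
    for (uint32_t v = 2; v <= (UINT32_C(1) << 16); v <<= 1)
        if (half_by_shift(v) != v / 2)
            return 0;
    return 1;
}
```

The check only covers positive powers of two, matching the condition stated in the commit message; it says nothing about negative or signed operands.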
Michael Niedermayer
cc17ff8826
Merge commit '3956a5e0ea'
...
* commit '3956a5e0ea':
aarch64: NEON vorbis_inverse_coupling
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-22 23:51:19 +02:00