Christopher Degawa
8990c5869e
get_cabac_inline_x86: Don't inline if 32-bit clang on windows
Fixes https://trac.ffmpeg.org/ticket/8903
relevant https://github.com/msys2/MINGW-packages/discussions/9258
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-19 22:29:23 +03:00
Andreas Rheinhardt
afc95a10ac
avcodec/h264dsp, h264idct: Fix lengths of array parameters
Fixes many -Warray-parameter warnings from GCC 11.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-08 17:44:57 +02:00
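GCC 11's -Warray-parameter fires when redeclarations of a function disagree on the bound of an array parameter. A minimal sketch of the consistent form follows; `idct4_add_sketch` is a hypothetical function, not an FFmpeg one, and its body is a clipped add rather than a real inverse transform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declaration (e.g. in a header). The bound [16] documents that a 4x4
 * block of coefficients is expected. Declaring it here as [16] but
 * defining it below with a different bound such as [64] is the kind of
 * mismatch GCC 11 reports under -Warray-parameter. */
void idct4_add_sketch(uint8_t *dst, int16_t block[16], ptrdiff_t stride);

/* Definition with the same bound as the declaration: no warning.
 * Hypothetical stand-in: adds each coefficient to its pixel with clipping. */
void idct4_add_sketch(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int v = dst[y * stride + x] + block[y * 4 + x];
            dst[y * stride + x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
        }
    }
}
```

Keeping the bound identical in every declaration also lets the compiler check callers that pass a visibly smaller array.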
Andreas Rheinhardt
25c8507818
Remove/replace some unnecessary avcodec.h inclusions
Also remove other unnecessary headers and include headers directly while
at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 15:29:46 +02:00
Andreas Rheinhardt
4608f7cc6a
Remove unnecessary mem.h inclusions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:47:57 +02:00
Andreas Rheinhardt
2c05ee092b
avutil/internal, swresample/audioconvert: Remove cpu.h inclusions
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:33:45 +02:00
Andreas Rheinhardt
7c1f347b18
avcodec: Remove deprecated old encode/decode APIs
Deprecated in commits 7fc329e2dd
and 31f6a4b4b8.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:12 -03:00
Andreas Rheinhardt
f3c197b129
Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Paul B Mahol
b69c91bbee
avcodec/x86: add cfhdenc SIMD
2021-02-27 17:09:44 +01:00
James Almer
f1a894f9d3
avcodec: add missing FF_API_OLD_ENCDEC wrappers to xmm clobber functions
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-26 19:26:31 -03:00
Andreas Rheinhardt
585b764f95
avcodec/x86/constants: Remove unused ff_pw_17
Unused since 80944df720.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:49:03 +01:00
Andreas Rheinhardt
7825cc392a
avcodec/x86/diracdsp_init: Reuse macro
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:38:12 +01:00
Andreas Rheinhardt
0f317eb8e7
avcodec/x86/diracdsp_init: Simplify macro
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:36:13 +01:00
Andreas Rheinhardt
68bd6c7dac
avcodec/x86/diracdsp_init: Make functions only used here static
This allowed removing the forward declarations. Because compilers expect
declarations for all functions they encounter, even within blocks
disabled via "if (0 && foo)", one has to use a real #if in
ff_diracdsp_init_x86.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:17:40 +01:00
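The point about "if (0 && foo)" can be seen in a small sketch: the body of a dead `if` is still parsed and type-checked, so every function it names must be declared, while code under a preprocessor `#if` is discarded before compilation. `HAVE_FANCY_ASM`, `fancy_asm_impl` and `select_impl` are hypothetical names used only for illustration.

```c
#include <assert.h>

#define HAVE_FANCY_ASM 0  /* hypothetical feature macro, disabled here */

static int plain_impl(int x) { return x + 1; }

#if HAVE_FANCY_ASM
/* Only declared (and only linkable) when the feature is enabled. */
int fancy_asm_impl(int x);
#endif

/* Returns the implementation to use. */
static int (*select_impl(void))(int)
{
    int (*fn)(int) = plain_impl;
    /* Writing "if (0 && HAVE_FANCY_ASM) fn = fancy_asm_impl;" here would
     * still require a declaration of fancy_asm_impl, because the dead
     * branch is compiled. #if removes the reference entirely when the
     * feature is off. */
#if HAVE_FANCY_ASM
    fn = fancy_asm_impl;
#endif
    return fn;
}
```

With the functions made static, the #if also lets the compiler warn about (and drop) any that end up unused.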
Andreas Rheinhardt
3a80b1ac12
avcodec/x86/diracdsp_init: Remove unused MMX functions
Unused since a1f3b18bf5, yet because they are non-static, the
compiler can't detect this, so these functions aren't stripped and
no warning is emitted.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 08:58:57 +01:00
Andreas Rheinhardt
4f3d8cb554
avcodec/cabac_functions, x86/cabac: Include stddef.h
Fixes checkheaders after 8c01eb0a31.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-04 05:17:33 +01:00
Lynne
9e05421dbe
ac3enc_fixed: drop unnecessary fixed-point DSP code
2021-01-14 01:44:20 +01:00
Anton Khirnov
e15371061d
lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
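For context, the DECLARE_ALIGNED family declares variables with a requested alignment, which aligned SIMD loads depend on. A portable C11 sketch of the same effect uses `alignas`; the macro `DECLARE_ALIGNED_SKETCH` is hypothetical and is not FFmpeg's definition, which is compiler-specific.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a DECLARE_ALIGNED-style macro:
 * declares variable v of type t aligned to n bytes. */
#define DECLARE_ALIGNED_SKETCH(n, t, v) alignas(n) t v

/* A 16-byte aligned block, as aligned SSE loads such as movdqa require. */
static DECLARE_ALIGNED_SKETCH(16, int16_t, coeffs)[64];

/* Returns nonzero if p is a multiple of n bytes. */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

The array suffix follows the macro invocation, mirroring the `DECLARE_ALIGNED(16, int16_t, block)[64];` usage pattern.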
Anton Khirnov
c8c2dfbc37
lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Andreas Rheinhardt
ead3134150
avcodec/mpegaudiodsp: Make ff_mpadsp_init() thread-safe
The only thing missing for this is to make ff_mpadsp_init_x86()
thread-safe; it currently isn't because a static table is initialized
every time ff_mpadsp_init() is called (when ARCH_X86 is true). Solve
this by initializing this table only once, namely together with the
ordinary non-arch-specific tables. This also allows reusing their AVOnce.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-11-24 11:35:03 +01:00
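The once-only table initialization described above can be sketched with POSIX `pthread_once`, which AVOnce maps to on pthread builds. This assumes a pthread platform; the table, its contents and `dsp_init_sketch` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define TABLE_SIZE 64

/* Hypothetical static table that would otherwise be rebuilt on every init call. */
static float window_table[TABLE_SIZE];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int init_count;  /* demonstration only: counts real initializations */

static void init_static_tables(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        window_table[i] = (float)i / TABLE_SIZE;
    init_count++;
}

/* Safe to call from several threads at once: pthread_once runs the
 * initializer exactly once, and later callers see the finished table. */
void dsp_init_sketch(void)
{
    pthread_once(&table_once, init_static_tables);
}
```

Rewriting a shared table on every init call is a data race even if the values written are identical; funnelling it through one once-guard removes the race.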
James Almer
1a35fffaf2
x86/cfhddsp: zero extend int arguments
If taken from the stack, they may otherwise have garbage in the upper bits.
Also, there are only 8 arguments, so don't attempt to load 11.
Fixes SIGSEGV crashes in some targets.
Reviewed-by: durandal_1707
Signed-off-by: James Almer <jamrial@gmail.com>
2020-08-28 20:09:25 -03:00
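Why the zero extension matters: on x86-64, a 32-bit int argument occupies a 64-bit register or stack slot, and the upper 32 bits are not guaranteed to be zero. A C simulation of that situation follows; the garbage value and the helper names are made up for illustration, and zero extension is only the right fix for non-negative values such as widths and strides.

```c
#include <assert.h>
#include <stdint.h>

/* Simulates a 64-bit slot holding a 32-bit int argument whose upper
 * half contains leftover data from the caller. */
static uint64_t make_slot(uint32_t arg, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | arg;
}

/* Wrong: using the whole 64-bit slot as the argument. */
static uint64_t read_full_slot(uint64_t slot)
{
    return slot;
}

/* Right for non-negative values: keep only the low 32 bits. A 32-bit
 * "mov r32, r32" does this on x86-64, since it clears the upper half. */
static uint64_t read_zero_extended(uint64_t slot)
{
    return (uint32_t)slot;
}
```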
Paul B Mahol
4aac742505
avcodec/x86/cfhddsp: try to fix build on x32
2020-08-26 23:39:58 +02:00
Paul B Mahol
389cc142fb
avcodec/cfhd: add x86 SIMD
Overall speed change for 1920x1080 yuv422p10le at 60 fps: from 0.19x to 0.343x.
2020-08-26 21:13:38 +02:00
James Almer
2c844c9828
x86/h264_deblock: fix warning about trailing empty parameter
Fixes part of ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
2020-07-12 11:30:23 -03:00
Martin Storsjö
353aecbb28
pixblockdsp, avdct: Add get_pixels_unaligned
Use this in vf_spp.c, where the get_pixels operation is done on
unaligned source addresses.
Hook up the x86 (mmx and sse) versions of get_pixels to this
function pointer, as those implementations seem to support unaligned
use.
This fixes fate-filter-spp on armv7.
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-13 13:20:08 +03:00
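For reference, the operation in question copies an 8x8 block of 8-bit pixels into 16-bit coefficients. The plain C sketch below reads byte by byte, so it has no alignment requirement on the source. `get_pixels_sketch` is a hypothetical name; its parameter shape is modeled on the pixblockdsp callback, not copied from FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies an 8x8 block of 8-bit pixels into 16-bit coefficients.
 * Byte-wise loads impose no alignment requirement on pixels, unlike SIMD
 * versions that may assume an aligned source address. */
static void get_pixels_sketch(int16_t *restrict block, const uint8_t *pixels,
                              ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[x];
        pixels += stride;
    }
}
```

A separate unaligned function pointer lets callers such as vf_spp ask for this guarantee explicitly instead of relying on the aligned variant happening to work.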
Linjie Fu
8b8492452d
lavc/x86/hevc_add_res: Fix coeff overflow in ADD_RES_SSE_16_32_8
Fix overflow for coeff -32768 in function ADD_RES_SSE_16_32_8 with no
performance drop (SSE2/AVX/AVX2).
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_32x32_8_sse2: 127.5
hevc_add_res_32x32_8_avx: 127.0
hevc_add_res_32x32_8_avx2: 86.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_32x32_8_sse2: 126.8
hevc_add_res_32x32_8_avx: 128.3
hevc_add_res_32x32_8_avx2: 86.8
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2020-03-27 10:57:40 +01:00
Linjie Fu
e9abef437f
lavc/x86/hevc_add_res: Fix overflow in ADD_RES_SSE_8_8
Fix overflow for coeff -32768 in function ADD_RES_SSE_8_8 with
no performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_8x8_8_sse2: 15.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_8x8_8_sse2: 15.5
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2020-03-27 10:57:40 +01:00
Linjie Fu
0da14ed09e
lavc/x86/hevc_add_res: Fix overflow in ADD_RES_MMX_4_8
Fix overflow for coeff -32768 in function ADD_RES_MMX_4_8 with no
performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_4x4_8_mmxext: 15.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_4x4_8_mmxext: 15.0
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2020-03-27 10:57:40 +01:00
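The -32768 case can be reproduced in scalar C under an assumption the commits do not state: that the SIMD code adds a residual by splitting it into a saturated positive part and a negated, saturated negative part, with the negation done in 16-bit arithmetic. Negating -32768 wraps back to -32768 in 16 bits, so both parts saturate to 0 and the pixel is left unchanged instead of being clipped to 0. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* 16-bit two's-complement negation, as a SIMD word subtraction from zero
 * computes it: -(-32768) wraps to -32768. */
static int16_t neg16_wrap(int16_t x)
{
    int32_t v = -(int32_t)x;
    if (v > INT16_MAX)
        v -= 65536;
    return (int16_t)v;
}

/* Reference behaviour: add the residual and clip to [0, 255]. */
static uint8_t add_res_ref(uint8_t dst, int16_t coeff)
{
    return clip_u8(dst + coeff);
}

/* Assumed split approach: both parts saturated to unsigned bytes,
 * then a saturating add of pos and a saturating subtract of neg. */
static uint8_t add_res_split(uint8_t dst, int16_t coeff)
{
    uint8_t pos = clip_u8(coeff);
    uint8_t neg = clip_u8(neg16_wrap(coeff));
    int v = dst + pos;
    v = v > 255 ? 255 : v;           /* saturating add */
    v = v - neg;
    return (uint8_t)(v < 0 ? 0 : v); /* saturating subtract */
}
```

For every other input the two functions agree, which is why an ordinary checkasm run passes until a test case with -32768 is added.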
Michael Niedermayer
24af459d1e
avcodec/x86/diracdsp: Fix high bits on Windows x86_64
Found-by: james
2020-01-31 00:04:22 +01:00
Michael Niedermayer
0694b60b7b
avcodec/x86/diracdsp: Fix incorrect src addressing in dequant_subband_32()
Fixes: Segfault (not reproducible with asm, which made this hard to debug)
Fixes: decoding errors
Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-30 18:47:21 +01:00
Peter Ross
fd17218558
vp4: prevent unaligned memory access in loop filter
VP4 applies a loop filter during motion compensation, so the block offset
will often be unaligned. This produces a bus error on some platforms, namely
ARMv7 NEON.
This patch adds an unaligned version of the loop filter function pointer
to VP3DSPContext.
Reported-by: Mike Melanson <mike@multimedia.cx>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-30 10:06:38 +01:00
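As general background (a common C idiom, not the VP4 patch itself), the portable way to read a multi-byte value from a possibly unaligned address is to `memcpy` it into a local, rather than dereferencing a cast pointer, which is undefined behaviour when misaligned and can fault on strict-alignment targets. `load_u32_unaligned` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any address. Optimizing compilers typically
 * turn this memcpy into a single load where the target permits unaligned
 * access, and into byte loads where it does not. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```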
James Almer
1faedb9a11
x86/opusdsp: enable the functions on all FMA3 CPUs
It's not using ymm registers, so limiting it to CPUs with fast AVX
is not necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-11 20:50:45 -03:00
James Almer
80444e23ac
x86/opusdsp: clear the high bits from some gprs
Fixes checkasm on systems like win64.
Reviewed-by: Lynne
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-11 20:42:31 -03:00
James Almer
58d167bcd5
avcodec/Makefile: add missing pngdsp dependency to the lscr decoder
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-14 16:47:56 -03:00
James Almer
b41d8ab2e6
x86/v210dec: use named registers
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:20:18 -03:00
James Almer
abf1aa87ab
x86/v210dec: don't reserve more xmm regs than needed
Prevents pointless register saving on win64 for the sse3 and avx
versions of the function.
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:09:50 -03:00
James Almer
b0e29357ba
x86/v210dec: remove duplicate load instruction
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:08:34 -03:00
James Darnley
46f1718cd9
avcodec/x86/v210: fix operands of vpblendd used in new avx2 code
Assembly failed when using yasm rather than nasm.
2019-05-02 21:20:54 +02:00
Michael Stoner
ebd6fb23c5
libavcodec: add ff_v210_planar_unpack AVX2
Replaced VSHUFPS with VPBLENDD to relieve the port 5 bottleneck.
AVX2 is 1.4x faster than AVX.
2019-05-02 19:21:37 +02:00
Lynne
4b7166c9d5
x86/opusdsp: replace loads with shuffles
...
Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-26 20:39:38 -03:00
Lynne
b43b8d337d
x86/opusdsp: fix WIN64 return value
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-01 11:06:34 -03:00
Lynne
605e330310
x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
    const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
    float state = coeff;
    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1*state;
        y[1] = x[1] + c2*state + c1*x[0];
        y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
        y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
2019-04-01 00:22:00 +02:00
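The deemphasis unrolling can be checked against the plain recurrence y[i] = x[i] + c1*y[i-1], with y[-1] = state: expanding y[1] = x[1] + c1*(x[0] + c1*state) gives the c2 term, and so on. A C sketch follows; the value used for CELT_EMPH_COEFF (0.85f) is an assumption for illustration, the function names are hypothetical, and `len` is assumed to be a multiple of 4.

```c
#include <assert.h>

#define EMPH_COEFF 0.85f  /* assumed value, standing in for CELT_EMPH_COEFF */

/* Scalar recurrence: y[i] = x[i] + c1 * y[i-1], with y[-1] = state.
 * Returns the final state for the next call. */
static float deemphasis_scalar(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++) {
        y[i] = x[i] + EMPH_COEFF * state;
        state = y[i];
    }
    return state;
}

/* Four outputs per iteration; each depends only on the previous block's
 * last output and the current four inputs, so the lanes are independent. */
static float deemphasis_unrolled(float *y, const float *x, float state, int len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;
    assert(len % 4 == 0);
    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

The two versions differ only by floating-point rounding, since the unrolled form evaluates the same sums in a different order.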
Lynne
5468c1d075
celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled
The entire function was defined away before.
2019-03-31 23:36:43 +02:00
Lynne
4a2c651620
x86/opus_dsp: rename to celt_pvq
It's only used in the encoder and in CELT's PVQ.
2019-03-31 23:35:00 +02:00
James Almer
d5d699ab6e
avcodec/h264dsp: change loop filter stride argument to ptrdiff_t
2019-02-20 15:27:43 -03:00
Martin Vignali
9a22e6fa1d
avcodec/proresdsp: indent after previous commit
2018-12-02 12:55:35 +01:00
Martin Vignali
c097a32e93
avcodec/proresdec: rename dsp part for 10b and check dspinit for supported bits per raw sample
Based on a patch by Kieran Kunhya.
2018-12-02 12:55:31 +01:00
Rostislav Pehlivanov
29eb1c51d7
mdct15: simplify x86 exptab permutation
Removes an unneeded copy and does the 5-point permute in-place.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:44:40 +01:00
Rostislav Pehlivanov
a72d0fb973
mdct15: simplify the fft15 x86 SIMD
Saves 1 gpr and 2 instructions and simplifies the macros a bit.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:27:41 +01:00
Kieran Kunhya
f9d3841ae6
mpeg4video: Add support for MPEG-4 Simple Studio Profile.
This is a profile supporting >8-bit video, with a higher-quality DCT.
2018-04-02 13:06:23 +01:00
Aurelien Jacobs
f1e490b1ad
sbcenc: add MMX optimizations
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
2018-03-07 22:26:53 +01:00