Commit graph

2600 commits

Author SHA1 Message Date
Andreas Rheinhardt
f52b4a6e69 avcodec/snow: Move dsp helper functions to snow_dwt.h
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-02 12:23:16 +02:00
Andreas Rheinhardt
6f7bf64dbc avcodec: Remove DCT, FFT, MDCT and RDFT
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.

Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.

Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 02:25:09 +02:00
Andreas Rheinhardt
d9464f3e34 avcodec/mpegaudiodsp: Init dct32 directly
This avoids using dct.c and will allow removing it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 01:53:32 +02:00
Andreas Rheinhardt
27562bf022 avcodec/x86/mpegvideoenc_template: Disable dead code
Since bfb28b5ce8 the permutation
type FF_IDCT_PERM_SIMPLE is ARCH_X86_32-only. So use this
knowledge to disable code for it when not on ARCH_X86_32.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-15 13:08:55 +02:00
Andreas Rheinhardt
7b0b9a25ed avcodec/proresdsp: Pass necessary parameter directly
Only avctx->bits_per_raw_sample is used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-11 00:26:34 +02:00
Andreas Rheinhardt
947d51f32a avcodec/x86/hpeldsp_vp3: Merge into hpeldsp
Once upon a time, 413abbe164
added versions of some put_no_rnd_pixels functions for use
in VP3 and Theora (with an explicit check so that they are
only used for VP3 and Theora). When this was moved to hpeldsp
(from dsputil) in 3ced55d51c,
the check was replaced by a check for the bitexact flag
(and a CONFIG_VP3_DECODER compile-time check), so that
these functions were now used for other codecs as well.

Later commit 1dfc3cf89d
split off the "VP3-specific bits into a separate file",
yet these bits were not really VP3-specific bits at all
any more. (The error was repeated in commit
0a39c9ac0b.) This commit
has not been reverted, because this would make future
changes from Libav (from where it originated) harder,
yet Libav is no more, so this commit effectively reverts
1dfc3cf89d.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-07 00:24:39 +02:00
Rémi Denis-Courmont
effadce6c7 avcodec/x86/mathops: clip constants used with shift instructions within inline assembly
Fixes assembling with binutil as >= 2.41

Signed-off-by: James Almer <jamrial@gmail.com>
2023-07-20 16:51:53 -03:00
Christopher Degawa
182663a58a get_cabac_inline_x86: Don't inline the assembly function on 32 bit
While the inline cabac assembly has worked correctly in i386 builds
historically, modern compiler updates has started showing issues
with it, when the function gets inlined into larger contexts that
fail to provide the amount of free registers as this function
requires.

This was an issue with Clang on Windows on i386, which was fixed
in c6d284b945324a7bc70ea8b9056040c8148aa835. However, recently
the same issues also have started showing up with GCC (both for
Windows and Linux). Whether the issue appears seems dependent on
a lot of optimizer tuning (e.g. the issue appears or goes away
depenent on the combinaton of -march= and -mtune= options),
potentially due to the compiler making different decisions on
how much to inline.

Fixes: https://trac.ffmpeg.org/ticket/8903

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-04-02 00:34:10 +03:00
Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
James Darnley
6af453ca38 avcodec/x86: add avx512icl function for v210dec
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
2022-12-20 15:02:45 +01:00
James Darnley
f30b4c2f47 avcodec/x86/v210: add some comments to the improved avx2 function 2022-12-20 15:02:45 +01:00
Andreas Rheinhardt
262e7439c6 avcodec/x86/Makefile: Don't build empty files
simple_idct.asm is 32 bit-only since
bfb28b5ce8,
whereas simple_idct10.asm is x64-only. So don't build
the ultimately unneeded and empty files, as some linkers
complain about this: "ranlib: file:
libavcodec/libavcodec.a(simple_idct.o) has no symbols"
(this is from an Xcode toolchain as reported by Ronald S. Bultje).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-12-13 16:16:40 +01:00
James Darnley
5dfb4f9690 avcodec/x86/v210enc: change '0b' binary constant prefix to 'b' suffix
For compatability with yasm from 0.7.0
2022-12-03 16:44:24 +01:00
James Darnley
690b7890f0 avcodec/x86/v210enc: remove unneeded instruction 2022-12-01 18:19:03 +01:00
James Darnley
c67a2b14a2 avcodec/x86/v210enc: expand and correct comments 2022-12-01 18:19:03 +01:00
James Darnley
651cb867b1 avcodec/v210enc: add new 10-bit function for avx512 avx512icl
avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2

avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
2022-12-01 18:19:03 +01:00
James Darnley
bda53d2dde avcodec/x86/v210enc: replace register use with named register 2022-12-01 18:19:03 +01:00
Andreas Rheinhardt
4228f8ad49 avcodec/x86/cavsdsp: Remove unused 3DNow-macro
Forgotten in 3221aba879.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-09 17:39:00 +01:00
Lynne
b85e106d5f
libavcodec: remove mdct15
It's not needed nor used by anything anymore, lavu/tx is faster,
and better in every way. RIP.
2022-11-06 14:39:41 +01:00
Lynne
e0661fc805
dca_core: convert to lavu/tx
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2022-11-06 14:39:36 +01:00
James Darnley
c3d36e1b3d avcodec/v210enc: add new function for avx2 avx512 avx512icl
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
    1690±4.3 decicycles vs. 1693±78.4
    1439±31.1 decicycles vs 1429±16.7

Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2

Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2

Co-authors:
    Henrik Gramner <henrik@gramner.com>
    Kieran Kunhya <kierank@obe.tv>
2022-11-04 19:37:46 +01:00
Andreas Rheinhardt
4209216ee8 avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 only
It is only used by gmc/gmc1 which is only used by the MPEG-4
decoder, so move it to Mpeg4DecContext and rename it
to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-20 07:56:17 +02:00
Andreas Rheinhardt
e84348a8ab avcodec/svq1enc: Add SVQ1EncDSPContext, make codec context private
Currently, SVQ1EncContext is defined in a header that is also
included by the arch-specific code that initializes the one
and only dsp function that this encoder uses directly.

But the arch-specific functions to set this dsp function
do not need anything from SVQ1EncContext. This commit therefore
adds a small SVQ1EncDSPContext whose only member is said
function pointer and renames svq1enc.h to svq1encdsp.h
to avoid exposing unnecessary internals to these init
functions (and the whole mpegvideo with it).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-14 16:14:24 +02:00
Carl Eugen Hoyos
60e87faf7f lavc/x86/simple_idct: Fix linking shared libavcodec with MS link.exe
link.exe hangs on empty simple_idct.o

Fixes ticket #9909.
2022-10-10 02:42:44 +02:00
Andreas Rheinhardt
1741adb1c7 avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dsp
It is the only thing that is actually used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
76d8f0dd14 avcodec/ac3dsp: Remove unused parameter
Forgotten in fd98594a88.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-29 23:37:13 +02:00
Andreas Rheinhardt
4393331250 avcodec/dirac_dwt: Avoid conversions between function pointers and void*
Pointers to void can be converted to any pointer to incomplete or
object type and back; but they are nevertheless not completely generic
pointers: There is no provision in the C standard that guarantees their
convertibility with function pointers. C90 lacks a generic function
pointer, C99 made every function pointer a generic function pointer and
still disallows the convertibility with void *. Both GCC as well as
Clang warn about this when using -pedantic.

Therefore use unions to avoid these conversions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-28 23:37:12 +02:00
James Almer
0922c6b01b x86/lpc: use fused negative multiply-add instructions where useful
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
James Almer
0627e6d74c avcodec/lpc: zero the middle odd sample in the output
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
James Almer
c8c4a162fc avcodec/lpc: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
James Almer
48615f0a78 x86/aacpsdsp: add ps_hybrid_analysis_fma3
This replace the sse3 version, which was not really faster than the sse one.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 13:27:43 -03:00
James Almer
2bcf86d53d x86/aacpsdsp: precompute constant factors
Inspired by the optimization done to the C version by Rémi Denis-Courmont.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 13:27:43 -03:00
Martin Storsjö
c9aa6164d4 x86/lpc: Fix parameter sign extension, unbreaking checkasm-lpc on x86_64 windows
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-22 11:13:33 +03:00
Lynne
b67776e12f
x86/lpc: fix even scalar loop overreads/writes
Passes checkasm with valgrind, tested to sizes of more than 4000 samples.
2022-09-22 04:27:19 +02:00
Lynne
dea944b838
x86/lpc: fix odd scalar loop overreads/writes 2022-09-22 03:07:41 +02:00
Andreas Rheinhardt
9beba05311 avcodec/fmtconvert: Remove unused AVCodecContext parameter
Unused since d74a8cb7e4.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:26:40 +02:00
Andreas Rheinhardt
fd72d8aea3 avcodec/blockdsp: Remove unused AVCodecContext parameter
Possible since be95df12bb.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:24:40 +02:00
Andreas Rheinhardt
57f3ca20dc avcodec/cavsdsp: Remove unused function parameter
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:24:22 +02:00
Lynne
3ade6a8644
x86/lpc: implement a new Welch windowing function
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.

Also checkasm is provided to make sure this monstrosity works.

This fixes some FATE tests.
2022-09-21 07:12:39 +02:00
Rémi Denis-Courmont
b52034270a lavc/vorbisdsp: use ptrdiff_t rather than intptr_t
... for a difference between pointers.
2022-09-19 13:51:00 -03:00
Paul B Mahol
37a503ac87 avcodec/x86/audiodsp: add scalarproduct avx2 2022-09-13 17:43:16 +02:00
Andreas Rheinhardt
a54e53a1c4 avcodec/vp8dsp: Constify src in vp8_mc_func
Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:57:51 +02:00
Andreas Rheinhardt
ad12e31b03 avcodec/x86/flacdsp_init: Remove double ';'
Inside a function, the second ';' in ";;" is just a null statement,
but it is actually illegal outside of functions. Compilers
nevertheless accept it without warning, except when in -pedantic
mode when e.g. Clang emits a -Wextra-semi warning. Therefore
remove the unnecessary ';'.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-05 20:54:57 +02:00
Paul B Mahol
1e202d89c9 avcodec/x86/flacdsp: fix bug in decorrelation
Fixes #9297
2022-09-05 12:27:49 +02:00
Andreas Rheinhardt
0bb0c26799 avutil/mem_internal: Fix headers
Including avassert.h is unnecessary since commit
786be70e28.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-24 03:43:52 +02:00
Martin Storsjö
dc55e63578 x86: Don't hardcode the height to 8 in sad8_xy2_mmx
The height is hardcoded in some of the me_cmp functions, but not
in all of them. But in the case of all other functions, it's hardcoded
in the same place in SIMD functions as in the C reference functions,
while this one function differs from the behaviour of the C code.

(Before 542765ce3e, there were a
couple other sad8_*_mmx functions with similar hardcoded height.)

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-17 00:00:50 +03:00
Andreas Rheinhardt
6c4595190e avcodec/flacdsp: Split encoder-only parts into a ctx of its own
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Andreas Rheinhardt
3a869cd5cd avcodec/flacdsp: Remove unused function parameter
Forgotten in e609cfd697.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Andreas Rheinhardt
333b32af8e avcodec/h264chroma: Constify src in h264_chroma_mc_func
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:02:13 +02:00
Andreas Rheinhardt
b3bbbb14d0 avcodec/hevcdsp: Constify src pointers
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 02:54:04 +02:00