352 commits

Author SHA1 Message Date
Andreas Rheinhardt
eccf130fdb {lib{avcodec,swscale}/x86/,}Makefile: Kill MMX-OBJS
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-30 22:20:13 +01:00
Andreas Rheinhardt
ba94177242 avcodec/x86/Makefile: Only compile ASM init files when X86ASM is enabled
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows removing HAVE_X86ASM checks
in these init files.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-30 22:20:13 +01:00
Andreas Rheinhardt
65b4feb782 avcodec/x86/Makefile: Remove redundant WebP decoder->vp8dsp dependencies
Redundant since 35b02732b9.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-30 22:20:13 +01:00
Andreas Rheinhardt
18e08101eb avcodec/x86/Makefile: Don't use MMX-OBJS for fdct.o
MMX has been removed in d402ec6be9.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-04 11:41:29 +01:00
Andreas Rheinhardt
74a88c0c11 avcodec/x86/cavsdsp: Add SSE2 mc20 horizontal motion compensation
Basically a direct port of the MMXEXT one. The main difference
is of course that one can process eight pixels (unpacked to words)
at a time, leading to speedups.

avg_cavs_qpel_pixels_tab[0][2]_c:                      700.1 ( 1.00x)
avg_cavs_qpel_pixels_tab[0][2]_mmxext:                 158.1 ( 4.43x)
avg_cavs_qpel_pixels_tab[0][2]_sse2:                    86.0 ( 8.14x)
avg_cavs_qpel_pixels_tab[1][2]_c:                      171.9 ( 1.00x)
avg_cavs_qpel_pixels_tab[1][2]_mmxext:                  39.4 ( 4.36x)
avg_cavs_qpel_pixels_tab[1][2]_sse2:                    21.7 ( 7.92x)
put_cavs_qpel_pixels_tab[0][2]_c:                      525.7 ( 1.00x)
put_cavs_qpel_pixels_tab[0][2]_mmxext:                 148.5 ( 3.54x)
put_cavs_qpel_pixels_tab[0][2]_sse2:                    75.2 ( 6.99x)
put_cavs_qpel_pixels_tab[1][2]_c:                      129.5 ( 1.00x)
put_cavs_qpel_pixels_tab[1][2]_mmxext:                  36.7 ( 3.53x)
put_cavs_qpel_pixels_tab[1][2]_sse2:                    19.0 ( 6.81x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-08 20:40:08 +02:00
Andreas Rheinhardt
92ae9d1ffc configure: Remove vc1dsp->qpeldsp dependency
It only needs it for some x86 fpel functions; instead
add a direct dependency for that.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-04 07:06:32 +02:00
Henrik Gramner
0b5d46ee1c vp9: Add 8bpc AVX2 asm for inverse transforms 2025-09-19 23:12:59 +00:00
Kacper Michajłow
5ff2500514 avcodec/x86/Makefile: add missing x86/proresdsp.o for prores raw 2025-08-15 20:45:20 +02:00
Kacper Michajłow
a9e7b5aa07 avcodec/Makefile: add missing dependency for prores raw decoder
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-08-14 04:43:16 +02:00
Andreas Rheinhardt
9b409ea1e6 configure: Factor mpegvideoencdsp out of mpegvideoenc
This will allow relaxing the dependency on mpegvideoenc
for several codecs.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-06-21 22:08:52 +02:00
Henrik Gramner
eda0ac7e5f avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms 2025-05-26 15:26:11 +02:00
Henrik Gramner
fd18ae88ae avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms 2025-05-19 15:56:27 +02:00
Mark Thompson
d03c99441d lavc/apv: AVX2 transquant for x86-64
Typical checkasm result on Alder Lake:

decode_transquant_8_c:                                 464.2 ( 1.00x)
decode_transquant_8_avx2:                               86.2 ( 5.38x)
decode_transquant_10_c:                                481.6 ( 1.00x)
decode_transquant_10_avx2:                              83.5 ( 5.77x)
2025-04-27 15:52:30 +01:00
Nuo Mi
0a6388d1da avcodec/hevcdec: remove hevc prefix for x86 asm files 2024-12-22 21:00:06 +08:00
Andreas Rheinhardt
df2416ca97 Remove remnants of prores_lgpl decoder
Forgotten in 5c6a3604f0.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-07 23:53:26 +02:00
James Almer
6b6eb7d74e x86/Makefile: fix hevc and vvc dependency of h2656dsp.o
And remove tabs while at it.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 16:02:50 -03:00
Wu Jianhua
7d9f1f5485 avcodec/x86/hevc_mc: move put/put_uni to h26x/h2656_inter.asm
This enables the asm optimizations to be reused by VVC.

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
Andreas Rheinhardt
6f7bf64dbc avcodec: Remove DCT, FFT, MDCT and RDFT
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.

Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.

Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 02:25:09 +02:00
Andreas Rheinhardt
d9464f3e34 avcodec/mpegaudiodsp: Init dct32 directly
This avoids using dct.c and will allow removing it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 01:53:32 +02:00
Andreas Rheinhardt
947d51f32a avcodec/x86/hpeldsp_vp3: Merge into hpeldsp
Once upon a time, 413abbe164
added versions of some put_no_rnd_pixels functions for use
in VP3 and Theora (with an explicit check so that they are
only used for VP3 and Theora). When this was moved to hpeldsp
(from dsputil) in 3ced55d51c,
the check was replaced by a check for the bitexact flag
(and a CONFIG_VP3_DECODER compile-time check), so that
these functions were now used for other codecs as well.

Later commit 1dfc3cf89d
split off the "VP3-specific bits into a separate file",
yet these bits were no longer VP3-specific at all.
(The error was repeated in commit 0a39c9ac0b.) That split was
never reverted, because doing so would have made merging future
changes from Libav (where it originated) harder; but Libav is
no more, so this commit effectively reverts 1dfc3cf89d.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-07 00:24:39 +02:00
Andreas Rheinhardt
262e7439c6 avcodec/x86/Makefile: Don't build empty files
simple_idct.asm is 32 bit-only since
bfb28b5ce8,
whereas simple_idct10.asm is x64-only. So don't build
the ultimately unneeded and empty files, as some linkers
complain about this: "ranlib: file:
libavcodec/libavcodec.a(simple_idct.o) has no symbols"
(this is from an Xcode toolchain as reported by Ronald S. Bultje).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-12-13 16:16:40 +01:00
Lynne
b85e106d5f libavcodec: remove mdct15
It's not needed nor used by anything anymore, lavu/tx is faster,
and better in every way. RIP.
2022-11-06 14:39:41 +01:00
Andreas Rheinhardt
4209216ee8 avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 only
It is only used by gmc/gmc1 which is only used by the MPEG-4
decoder, so move it to Mpeg4DecContext and rename it
to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-20 07:56:17 +02:00
Lynne
3ade6a8644 x86/lpc: implement a new Welch windowing function
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.

Also checkasm is provided to make sure this monstrosity works.

This fixes some FATE tests.
2022-09-21 07:12:39 +02:00
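As a rough illustration of what "supports even and odd inputs" means for a Welch window, here is a minimal scalar sketch in C. It uses the textbook definition w(n) = 1 - ((n - (N-1)/2) / ((N-1)/2))^2 rather than the code from the commit; the function name `apply_welch_window` and the handling of lengths 0 and 1 are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the FFmpeg function): multiplies each input
 * sample by the textbook Welch window and stores the result as double.
 * The same formula covers even and odd lengths, because the centre
 * (len - 1) / 2 is allowed to fall between two samples. */
static void apply_welch_window(const int32_t *data, size_t len, double *w_data)
{
    if (len == 0)
        return;
    if (len == 1) {          /* assumption: a lone sample is an endpoint */
        w_data[0] = 0.0;
        return;
    }

    const double half = (double)(len - 1) / 2.0;
    for (size_t i = 0; i < len; i++) {
        const double t = ((double)i - half) / half;  /* runs from -1 to +1 */
        w_data[i] = (double)data[i] * (1.0 - t * t);
    }
}
```

For an odd length the centre sample gets weight 1; for an even length the two middle samples share the same weight slightly below 1. A SIMD version would presumably process several samples per iteration and treat the leftover tail separately, which is where the even/odd distinction becomes awkward.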
Andreas Rheinhardt
6c4595190e avcodec/flacdsp: Split encoder-only parts into a ctx of its own
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Paul B Mahol
b69c91bbee avcodec/x86: add cfhdenc SIMD 2021-02-27 17:09:44 +01:00
Paul B Mahol
389cc142fb avcodec/cfhd: add x86 SIMD
Overall speed for 1920x1080, yuv422p10le, 60fps changes from 0.19x to 0.343x
2020-08-26 21:13:38 +02:00
James Almer
58d167bcd5 avcodec/Makefile: add missing pngdsp dependency to the lscr decoder
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-14 16:47:56 -03:00
Lynne
605e330310 x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
58893 decicycles in deemphasis_c,  130548 runs,    524 skips
9475 decicycles in deemphasis_fma3,  130686 runs,    386 skips -> 6.21x speedup

24866 decicycles in postfilter_c,   65386 runs,    150 skips
5268 decicycles in postfilter_fma3,   65505 runs,     31 skips -> 4.72x speedup

Total decoder speedup: ~14%

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}
2019-04-01 00:22:00 +02:00
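The unrolling quoted in the message above can be checked against the plain one-tap recurrence y[n] = x[n] + c*y[n-1]. The sketch below is a self-contained C version of both forms; the function names are hypothetical, the coefficient is passed in as a parameter instead of using CELT_EMPH_COEFF, and `len` is assumed to be a multiple of 4 for the unrolled variant.

```c
#include <stddef.h>

/* Scalar reference: y[n] = x[n] + c * y[n-1], seeded with `state`.
 * Returns the last output, which is the state for the next call. */
static float deemphasis_scalar(float *y, const float *x, float state,
                               float c, size_t len)
{
    for (size_t i = 0; i < len; i++)
        state = y[i] = x[i] + c * state;
    return state;
}

/* Four-way unrolled form from the commit message. Each output depends
 * only on the incoming state and on inputs of the current group, so the
 * four results of one group no longer depend on each other.
 * Assumption: len is a multiple of 4. */
static float deemphasis_unrolled4(float *y, const float *x, float state,
                                  float c, size_t len)
{
    const float c1 = c, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];

        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

Both functions should agree up to floating-point rounding, since the unrolled form only substitutes the recurrence into itself. The gain is that the only serial dependency left is one state value per group of four samples, which suits a SIMD implementation.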
Lynne
5468c1d075 celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled
The entire function was defined away before.
2019-03-31 23:36:43 +02:00
Lynne
4a2c651620 x86/opus_dsp: rename to celt_pvq
It's only used in the encoder and in CELT's PVQ.
2019-03-31 23:35:00 +02:00
Aurelien Jacobs
f1e490b1ad sbcenc: add MMX optimizations
This was originally based on libsbc, and was fully integrated into ffmpeg.

Rough speed test:
C version:    speed= 592x
MMX version:  speed= 785x
2018-03-07 22:26:53 +01:00
Martin Vignali
9b8c1224d7 libavcodec/exr : add X86 SIMD for reorder_pixels
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-17 17:53:57 -03:00
Ivan Kalvachev
7205513f8f SIMD opus pvq_search implementation
An explanation of the workings and methods used by the
Pyramid Vector Quantization Search function
can be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-08-18 17:18:32 +01:00
Paul B Mahol
4ed7c2bbc3 avcodec/utvideodec: add SIMD for restore_rgb_planes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-27 09:54:10 +02:00
Rostislav Pehlivanov
e1120b1c54 mdct15: add assembly optimizations for the 15-point FFT
c:    1802 decicycles in fft15,16774635 runs,   2581 skips
avx:   865 decicycles in fft15,16776378 runs,    838 skips

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-06-23 23:45:37 +01:00
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Darnley
8e89f6fd37 avcodec/x86: move simple_idct to external assembly 2017-05-30 13:20:42 +02:00
Ronald S. Bultje
c9d98c5649 cavs: convert idct from inline asm to yasm. 2017-04-06 10:03:27 -04:00
Clément Bœsch
40ac226014 lavc/x86/hevc: rename hevc_res_add to hevc_add_res
This will simplify incoming merge.
2017-03-24 11:45:23 +01:00
Clément Bœsch
c66bd8f3ff Merge commit 'b57e38f52c'
* commit 'b57e38f52c':
  ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 12:49:29 +01:00
James Almer
ca8a3978e5 Merge commit '1dfc3cf89d'
* commit '1dfc3cf89d':
  x86: hpeldsp: Split off VP3-specific bits into a separate file

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:49:29 -03:00
James Almer
cf9ef83960 huffyuvencdsp: move shared functions to a new lossless_videoencdsp context
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:04 -03:00
Rostislav Pehlivanov
d2ae5f77c6 aacenc: add SIMD optimizations for abs_pow34 and quantization
Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs,    155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,    222 skips
Around 42% for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
2016-10-18 21:41:18 +01:00
Justin Ruggles
b57e38f52c ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.

Simplification and porting to current x86inc infrastructure by Diego Biurrun.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:46:25 +02:00
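The wrapper described above, which re-selects the downmix routine whenever the channel count changes, might look roughly like the following C sketch. All names here (`DownmixCtx`, `downmix`, `downmix_generic`, `downmix_stereo_to_mono`) and the flat row-major matrix layout are assumptions for illustration, not the ac3dsp API.

```c
#include <stddef.h>

/* Hypothetical signature: planar samples, flat row-major mixing matrix
 * with matrix[o * in_ch + c] being the gain of input c in output o. */
typedef void (*downmix_fn)(float **samples, const float *matrix,
                           int out_ch, int in_ch, size_t len);

typedef struct DownmixCtx {
    int in_channels;    /* layout the cached selection was made for */
    int out_channels;
    downmix_fn fn;      /* NULL until the first call */
} DownmixCtx;

/* Generic fallback; assumes out_ch <= 8 and out_ch <= in_ch. */
static void downmix_generic(float **samples, const float *matrix,
                            int out_ch, int in_ch, size_t len)
{
    float tmp[8];
    if (out_ch > 8)
        return;
    for (size_t i = 0; i < len; i++) {
        for (int o = 0; o < out_ch; o++) {
            float acc = 0.0f;
            for (int c = 0; c < in_ch; c++)
                acc += samples[c][i] * matrix[o * in_ch + c];
            tmp[o] = acc;
        }
        for (int o = 0; o < out_ch; o++)   /* write back in place */
            samples[o][i] = tmp[o];
    }
}

/* Stand-in for a specialised (e.g. SIMD) routine. */
static void downmix_stereo_to_mono(float **samples, const float *matrix,
                                   int out_ch, int in_ch, size_t len)
{
    (void)out_ch; (void)in_ch;
    for (size_t i = 0; i < len; i++)
        samples[0][i] = samples[0][i] * matrix[0] + samples[1][i] * matrix[1];
}

/* Wrapper: re-select only when the channel counts differ from the
 * ones the cached function pointer was chosen for. */
static void downmix(DownmixCtx *ctx, float **samples, const float *matrix,
                    int out_ch, int in_ch, size_t len)
{
    if (!ctx->fn || ctx->in_channels != in_ch || ctx->out_channels != out_ch) {
        ctx->in_channels  = in_ch;
        ctx->out_channels = out_ch;
        ctx->fn = (in_ch == 2 && out_ch == 1) ? downmix_stereo_to_mono
                                              : downmix_generic;
    }
    ctx->fn(samples, matrix, out_ch, in_ch, len);
}
```

Caching the selection keeps the per-call cost to a pair of integer comparisons, while still coping with streams whose channel layout changes mid-decode.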
Anton Khirnov
12004a9a7f audiodsp/x86: yasmify vector_clipf_sse 2016-09-22 09:47:52 +02:00
Anton Khirnov
89466de4ae vp9/x86: rename vp9dsp to vp9mc
It only contains the MC SIMD, other SIMD will go into different files.
2016-08-03 10:57:50 +02:00
James Almer
efc9d5c4bc x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-02 15:48:04 -03:00
Diego Biurrun
1dfc3cf89d x86: hpeldsp: Split off VP3-specific bits into a separate file 2016-07-20 18:33:25 +02:00
James Almer
fca3c3b619 hevc: Add AVX2 DC IDCT
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00