Commit graph

23 commits

Author SHA1 Message Date
Henrik Gramner
10a061ba99 vp9: Add AVX-512ICL asm for 8bpc subpel mc 2025-08-28 12:45:52 +00:00
Ronald S. Bultje
6e13c30a8f vp9: don't overread by 4 pixels in ff_vp9_avg4_mmxext().
If the block is at the end of the allocated buffer and there is no
padding, this will over-read, which may cause crashes. Reported by
Firefox.
2022-06-01 14:31:32 -04:00
Clément Bœsch
8286c359ad Merge commit 'e99ecda550'
* commit 'e99ecda550':
  checkasm: add vp9 MC tests.
  vp9mc/x86: sse2 MC assembly.
  vp9mc/x86: add AVX and AVX2 MC
  vp9mc/x86: rename ff_* to ff_vp9_*
  vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
  vp9mc/x86: simplify a few inits.
  vp9mc/x86: add 16px functions (64bit only).

Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
when the dust comes down.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:25:39 +01:00
Clément Bœsch
a4f5e79f7c Merge commit '89466de4ae'
* commit '89466de4ae':
  vp9/x86: rename vp9dsp to vp9mc

File was already renamed, only the top description is updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:10:47 +01:00
Ronald S. Bultje
9790b44a89 vp9mc/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:04:51 +02:00
James Almer
67922b4ee4 vp9mc/x86: add AVX and AVX2 MC
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:00:08 +02:00
Clément Bœsch
3cda179f18 vp9mc/x86: rename ff_* to ff_vp9_*
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
James Almer
8be8444d01 vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
pavgb is an sse integer instruction, so the mmxext flag is enough

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Clément Bœsch
6ab642d69d vp9mc/x86: simplify a few inits.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Ronald S. Bultje
3a09494939 vp9mc/x86: add 16px functions (64bit only).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Anton Khirnov
89466de4ae vp9/x86: rename vp9dsp to vp9mc
It only contains the MC SIMD, other SIMD will go into different files.
2016-08-03 10:57:50 +02:00
James Almer
4bb6cb4c7d x86/vp9mc: fix string concatenation of fullpel function names
Fixes compilation with NASM

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-20 12:32:27 -03:00
Ronald S. Bultje
344d519040 vp9: add subpel MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
77f359670f vp9: add fullpel (avg) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
6354ff0383 vp9: add fullpel (put) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
cae893f692 vp9/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
James Almer
6b2caa321f x86/vp9: add AVX and AVX2 MC
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-22 22:35:03 -03:00
Christophe Gisquet
4e128ab0b1 x86: vpx/h264/hevc/mpeg2: share constants
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 18:36:31 +02:00
Clément Bœsch
c4148a6668 x86/vp9mc: add vp9 namespace. 2014-03-29 18:13:15 +01:00
James Almer
26800e3864 vp9/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
pavgb is an sse integer instruction, so the mmxext flag is enough

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-18 17:08:25 +01:00
Clément Bœsch
9cc8fa63dd vp9/x86: simplify a few mc inits. 2014-01-16 07:48:27 +01:00
Ronald S. Bultje
18175baa54 vp9/x86: 16px MC functions (64bit only).
Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0):
16x8:    876 ->   870  (0.7%)
16x16:  1444 ->  1435  (0.7%)
16x32:  2784 ->  2748  (1.3%)
32x16:  2455 ->  2349  (4.5%)
32x32:  4641 ->  4084 (13.6%)
32x64:  9200 ->  7834 (17.4%)
64x32:  8980 ->  7197 (24.8%)
64x64: 17330 -> 13796 (25.6%)
Total decoding time goes from 9.326sec to 9.182sec.
2013-12-26 21:05:10 -05:00
Ronald S. Bultje
8729964b99 vp9: split x86 assembly in two files.
(And in future, loopfilter or intra pred could be put in their own
respective files also.)
2013-12-07 12:39:35 -05:00