ff_{avg,put}_h264_qpel8or16_hv2_lowpass_ssse3()
currently is almost the disjoint union of the codepaths
for sizes 8 and 16. This size is a compile-time constant
at every callsite. So split the function and avoid
the runtime branch.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
tmpstride is unused. This also allows to remove said parameter
from lots of functions in h264_qpel.c.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are constant since the size 16 version is no longer emulated
via the size 8 version.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by h264_qpel.c and only with height four
(which is unrolled) and uses a loop in order to handle
multiples of four as height. Remove the loop and the height
parameter and move the function to h264_qpel_8bit.asm.
This leads to a bit of code duplication, but this is simpler
than all the %if checks necessary to achieve the same outcome
in fpel.asm.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The repetition count is always one since
2cf9e733c6.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avg_h264_qpel only supports 16x16,8x8 and 4x4 blocksizes,
so it is currently unnecessarily large.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The 2x2 put functions are only used by Snow and Snow uses
only the eight bit versions. The rest is dead code. Disabling
it saved 41277B here.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only needs it for some x86 fpel functions; instead
add a direct dependency for that.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Affected standalone builds of the VC-1 parser.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This can be easily achieved by moving code only used by the MPEG-4
decoder behind #if CONFIG_MPEG4_DECODER.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_avg_pixels{4,8,16}_l2_mmxext() are always called with height
equal to their blocksize. And ff_{put,avg}_pixels4_l2_mmxext()
are furthermore always called with both strides being equal.
So remove these redundant function parameters.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is more correct given that qpel_mc_func already uses ptrdiff_t;
it also allows to avoid movsxdifnidn.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ff_avg_pixels{4,8,16}_l2_mmxext() functions are only ever
used in the last step (the one that actually writes to the dst buffer)
where the number of lines to process is always equal to the
dimensions of the block, whereas ff_put_pixels{8,16}_mmxext()
are also used in intermediate calculations where the number of
lines can be 9 or 17.
The code in qpel.asm uses common macros for both and processes
more than one line per loop iteration; it therefore checks
for whether the number of lines is odd and treats this line separately;
yet this special handling is only needed for the put functions,
not the avg functions. It has therefore been %if'ed away for these.
The check is also not needed for ff_put_pixels4_l2_mmxext() which
is only used by H.264 which always processes four lines. Because
ff_{avg,put}_pixels4_l2_mmxext() processes four lines in a single loop
iteration, not only the odd-height handling, but the whole loop
could be removed.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
liblc3 supports arbitrary strides, so one can simply use a stride
of zero to make it read the same zero value again and again.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The code optimizes throughput by letting the encoder work on frame N
until frame N+1 is ready for submission, but this hurts low-delay uses
by delaying output by one frame. Don't delay output beyond what is
necessary when AV_CODEC_FLAG_LOW_DELAY is used.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
This is required placement by standard [[maybe_unused]] attribute, works
the same for __attribute__((unused)).
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This affects the {avg,put}_no_rnd_pixels16_{x,y}2 MMX and
(put-only) MMXEXT versions. Removing these functions saved
1184B here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX versions.
The added functions occupy 320B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX and (not bitexact) MMXEXT versions.
The added functions occupy 288B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the blocksizes 16 and 8 are implemented, yet the motion estimation
code touches the blocksize 4 entries. But really nothing touches
the blocksize 2 entries, so that we can reduce the put_no_rnd_pixels_tab
array size to [3][4].
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX and 3DNOW functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Also merge the mpegvideoenc_qns_template.c file into the main file.
The 3DNOW functions removed in this commit were the last in the
codebase.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The mc00 versions (i.e. the qdsp functions with no subpixel
interpolation) are just wrappers around their fpel versions.
There are SSE2 versions of these, yet the qpel code only
uses the MMX(EXT) versions. This commit changes this and
also removes the MMX(EXT) versions.
This also allowed to remove ff_avg_pixels16_mmxext,
ff_put_pixels16_mmx.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
CPUs which support SSE2, but not in a fast way (so that
they get the additional AV_CPU_FLAG_SSE2SLOW) are ancient
nowadays (2007 and older), so ignore the distinction between
the two and remove MMX and MMXEXT functions that are now
overridden by SSE2 functions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
CPUs which support SSE2, but not in a fast way (so that
they get the additional AV_CPU_FLAG_SSE2SLOW) are ancient
nowadays (2007 and older), so ignore the distinction between
the two and remove MMX and MMXEXT functions that are now
overridden by SSE2 functions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in a51279bbde because
I only looked for MMX(EXT) functions overridden by SSE2.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The value of sizeof() is of type size_t which means that
an expression like
src1[i * src_stride1 + 4 * (int)sizeof(pixel)]
will use a very large offset if src_stride1 is sufficiently negative.
It works in practice (because it is correct modulo SIZE_MAX),
but UBSan treats it as error:
libavcodec/hpel_template.c:104:1: runtime error: addition of unsigned offset to 0x7ffdfa0391d8 overflowed to 0x7ffdfa0391cc
Fix this by casting sizeof(pixel) to int.
(This has been uncovered by a checkasm test for the hpeldsp
which will be added in a later commit.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit fixes two issues in the documentation:
a) The documentation for {put,avg}_pixels_tab only mentions
widths 16 and 8, although it explicitly mentions that there
are four horizontal blocksizes. This part of the patch
basically reverts e5771f4f37.
b) The restrictions on height don't match the reality. While
most users abide by it, some do not:
i) vp56.c copies a 16x12 block.
ii) indeo3 can copy an arbitrary multiple of four lines
for block widths 4, 8 and 16.
iii) SVQ3 can use block sizes luma block sizes 16x16, 8x16,
16x8, 8x8, 4x8, 8x4 and 4x4 and the corresponding
8x8, 4x8, 8x4, 4x4, 2x4, 4x2 and 2x2 chroma block sizes.
This implies that for widths 2 and 4 height can be two
and is guaranteed to be at least even. For all other widths,
height can be a multiple of four.
Furthermore, a comment for the SVQ3 blocksizes has been added.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>