Before this commit, the (32-bit only) simple idct came in three
versions: A pure MMX IDCT and idct-put and idct-add versions
which use SSE2 at the put and add stage, but still use pure MMX
for the actual IDCT.
This commit ports said IDCT to SSE2; this was entirely trivial
for the IDCT1-5 and IDCT7 parts (where one can directly use
the full register width) and was easy for IDCT6 and IDCT8
(involving a few movhps and pshufds). Unfortunately, DC_COND_INIT
and Z_COND_INIT still use only the lower half of the registers.
This saved 4658B here; the benchmarking option of the dct test tool
showed a 15% speedup.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Unnecessary since the Xvid IDCT no longer uses MMX registers at all.
(Notice that the simple MMX IDCT issues emms and is therefore ABI
compliant.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The test has been removed in bfb28b5ce8
when MMX idctdsp functions overridden by SSE2 were removed;
ff_simple_idct_mmx() has been completely disabled in this patch
for x64 and so the test should have been disabled on x64 instead
of removing it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
E.g. check that the ordinary entries are valid (e.g. no negative sample
rates, must have pix fmt descriptor) and also that the sentinels
are valid where they contain redundant information.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For decoders, the pixel format is negotiated via the get_format callback
(if there is a choice at all), so decoders should not set pix_fmts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add av_free() to free s.temp_dwt_buffer and s.temp_idwt_buffer at the end of the function to avoid memory leak.
Fixes: 5d48e4eafa ("Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196'")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add check for the return value of avcodec_alloc_context3() to avoid potential NULL pointer dereference.
Fixes: 5d48e4eafa ("Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196'")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add av_packet_free() to free avpkt_clone and avpkt in the error paths to avoid potential memory leak.
Fixes: da3c69a5a9 ("Added test for libavcodec/avpacket.c")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add av_free() to free extra_data if av_packet_add_side_data() fails.
Fixes: da3c69a5a9 ("Added test for libavcodec/avpacket.c")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The iirfilter is only used in its test tool since
01ecb7172b which
stopped using it in AAC, its only user.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary, as we already have the entries sorted by
probability and therefore implicitly by length. All we need
on top of that to build the tree is the number of entries
of a given length.
Doing so gives a 3.6% speedup of ff_mjpeg_encode_huffman_close()
here; it also saves about 640B of .text here.
The new code puts values with higher probability to the left
of the tree. The old code did not and therefore
the FATE checksums needed to be updated. Due to MJPEG's
0xFF unescaping file sizes as well as file checksums
needed to be updated; the decoded picture hashes stayed
the same. Given that codes on the left of the tree have
on average fewer bits set than codes on the right, the
file sizes mostly improve (all except vsynth3-mjpeg-444).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Not every function will be set, so zero the context
to initialize everything.
This also allows to remove an initialization in dvenc.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Found while reviewing: CID1500309 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This issue cannot happen with the current function parameters
Fixes: CID1500309 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Helps: CID1518967 Unchecked return value
Helps: CID1518968 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.
Reviewed-by: Martin Storsjö <martin@martin.st>
Before commit f025b8e110,
every frame-threaded decoder used ThreadFrames, even when
they did not have any inter-frame dependencies at all.
In order to distinguish those decoders that need the AVBuffer
for progress communication from those that do not (to avoid
the allocation for the latter), the former decoders were marked
with the FF_CODEC_CAP_ALLOCATE_PROGRESS internal codec cap.
Yet distinguishing these two can be done in a more natural way:
Don't use ThreadFrames when not needed and split ff_thread_get_buffer()
into a core function that calls the user's get_buffer2 callback
and a wrapper around it that also allocates the progress AVBuffer.
This has been done in 02220b88fc
and since that commit the ALLOCATE_PROGRESS cap was nearly redundant.
The only exception was WebP and VP8. WebP can contain VP8
and uses the VP8 decoder directly (i.e. they share the same
AVCodecContext). Both decoders are frame-threaded and VP8
has inter-frame dependencies (in general, not in valid WebP)
and therefore the ALLOCATE_PROGRESS cap. In order to avoid
allocating progress in case of a frame-threaded WebP decoder
the cap and the check for the cap has been kept in place.
Yet now the VP8 decoder has been switched to use ProgressFrames
and therefore there is just no reason any more for this check
and the cap. This commit therefore removes both.
Also change the value of FF_CODEC_CAP_USES_PROGRESSFRAMES
to leave no gaps.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Frame-threaded decoders with inter-frame dependencies
use the ThreadFrame API for syncing. It works as follows:
During init each thread allocates an AVFrame for every
ThreadFrame.
Thread A reads the header of its packet and allocates
a buffer for an AVFrame with ff_thread_get_ext_buffer()
(which also allocates a small structure that is shared
with other references to this frame) and sets its fields,
including side data. Then said thread calls ff_thread_finish_setup().
From that moment onward it is not allowed to change any
of the AVFrame fields at all any more, but it may change
fields which are an indirection away, like the content of
AVFrame.data or already existing side data.
After thread A has called ff_thread_finish_setup(),
another thread (the user one) calls the codec's update_thread_context
callback which in turn calls ff_thread_ref_frame() which
calls av_frame_ref() which reads every field of A's
AVFrame; hence the above restriction on modifications
of the AVFrame (as any modification of the AVFrame by A after
ff_thread_finish_setup() would be a data race). Of course,
this av_frame_ref() also incurs allocations and therefore
needs to be checked. ff_thread_ref_frame() also references
the small structure used for communicating progress.
This av_frame_ref() makes it awkward to propagate values that
only become known during decoding to later threads (in case of
frame reordering or other mechanisms of delayed output (like
show-existing-frames) it's not the decoding thread, but a later
thread that returns the AVFrame). E.g. for VP9 when exporting video
encoding parameters as side data the number of blocks only
becomes known during decoding, so one can't allocate the side data
before ff_thread_finish_setup(). It is currently being done afterwards
and this leads to a data race in the vp9-encparams test when using
frame-threading. Returning decode_error_flags is also complicated
by this.
To perform this exchange a buffer shared between the references
is needed (notice that simply giving the later threads a pointer
to the original AVFrame does not work, because said AVFrame will
be reused lateron when thread A decodes the next packet given to it).
One could extend the buffer already used for progress for this
or use a new one (requiring yet another allocation), yet both
of these approaches have the drawback of being unnatural, ugly
and requiring quite a lot of ad-hoc code. E.g. in case of the VP9
side data mentioned above one could not simply use the helper
that allocates and adds the side data to an AVFrame in one go.
The ProgressFrame API meanwhile offers a different solution to all
of this. It is based around the idea that the most natural
shared object for sharing information about an AVFrame between
decoding threads is the AVFrame itself. To actually implement this
the AVFrame needs to be reference counted. This is achieved by
putting a (ownership) pointer into a shared (and opaque) structure
that is managed by the RefStruct API and which also contains
the stuff necessary for progress reporting.
The users get a pointer to this AVFrame with the understanding
that the owner may set all the fields until it has indicated
that it has finished decoding this AVFrame; then the users are
allowed to read everything. Every decoder may of course employ
a different contract than the one outlined above.
Given that there is no underlying av_frame_ref(), creating
references to a ProgressFrame can't fail. Only
ff_thread_progress_get_buffer() can fail, but given that
it will replace calls to ff_thread_get_ext_buffer() it is
at places where errors are already expected and properly
taken care of.
The ProgressFrames are empty (i.e. the AVFrame pointer is NULL
and the AVFrames are not allocated during init at all)
while not being in use; ff_thread_progress_get_buffer() both
sets up the actual ProgressFrame and already calls
ff_thread_get_buffer(). So instead of checking for
ThreadFrame.f->data[0] or ThreadFrame.f->buf[0] being NULL
for "this reference frame is non-existing" one should check for
ProgressFrame.f.
This also implies that one can only set AVFrame properties
after having allocated the buffer. This restriction is not deep:
if it becomes onerous for any codec, ff_thread_progress_get_buffer()
can be broken up. The user would then have to get a buffer
himself.
In order to avoid unnecessary allocations, the shared structure
is pooled, so that both the structure as well as the AVFrame
itself are reused. This means that there won't be lots of
unnecessary allocations in case of non-frame-threaded decoding.
It might even turn out to have fewer than the current code
(the current code allocates AVFrames for every DPB slot, but
these are often excessively large and not completely used;
the new code allocates them on demand). Pooling relies on the
reset function of the RefStruct pool API, it would be impossible
to implement with the AVBufferPool API.
Finally, ProgressFrames have no notion of owner; they are built
on top of the ThreadProgress API which also lacks such a concept.
Instead every ThreadProgress and every ProgressFrame contains
its own mutex and condition variable, making it completely independent
of pthread_frame.c. Just like the ThreadFrame API it is simply
presumed that only the actual owner/producer of a frame reports
progress on said frame.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead, use forward declarations; and in order not to affect
any user include these headers for them, but not internally.
This has the advantage of removing implicit inclusions of these
headers from almost all files providing codecs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.
Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.
Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not necessary at all. So remove it.
This also breaks an inclusion cycle mem.h->avutil.h->common.h->mem.h.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is the more proper place for them given that this is
the only API using them.
Also use a forward-declaration of AVCodecContext in fdctdsp.h
to avoid including avcodec.h in jfdct(fst|int).c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>