This also means that if a plane*slice has only 1 color, nothing
is stored after the remap table.
This also corrects the RCT offset to the exact value after remap,
instead of a fixed 65536.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This reduces the memory needed and also removes the 65536 maximum for remap
on the decoder side.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
That is, instead of a fixed 65536 entries, we now allocate only as many
as there are pixels.
We also allocate only for the encoder, only when remapping is enabled,
and only for 32 bits per sample.
This should reduce memory consumption; the 2nd array will be
dealt with in a future commit.
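The allocation rule described above can be sketched as a small helper. This is only an illustration of the stated conditions: the function name and its parameters are hypothetical and are not taken from the encoder source.

```c
#include <stddef.h>

/* Hypothetical helper: number of remap entries to allocate.
 * Mirrors the conditions in the message: encoder only, remap enabled,
 * 32 bits per sample, and one entry per pixel instead of a fixed 65536. */
static size_t remap_entries_needed(int is_encoder, int remap_enabled,
                                   int bits_per_sample,
                                   int width, int height)
{
    if (!is_encoder || !remap_enabled || bits_per_sample != 32)
        return 0; /* nothing to allocate in all other configurations */
    return (size_t)width * (size_t)height;
}
```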
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes failures in some systems since 171060d5dc.
This can be further improved by only allocating the arrays when needed.
Signed-off-by: James Almer <jamrial@gmail.com>
This commit adds support for hardware accelerated decoding to
the decoder.
The previous commits already refactored the decoder; this commit
simply adds the calls to the decoding hooks.
This also makes remap optional (which is a good idea even if we decide to keep flip fixed).
Effect on compression (using 2 rawlsb, golomb rice, large context model with ACES_OT_VWG_SampleFrames):
-rw-r----- 1 michael michael 499101306 Mär 11 14:58 float-303503-try3d-m2.nut
-rw-r----- 1 michael michael 503700199 Mär 11 14:57 float-303503-try3d-m1.nut
-rw-r----- 1 michael michael 518150578 Mär 11 14:57 float-303503-try3d-m0.nut
(the test above used the rawlsb patch, which is not applied yet)
Reviewed-by: Jerome Martinez <jerome@mediaarea.net>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This allows switching it on conditionally and also for non-float data.
It may improve compression for RGB data that was paletted
or for other synthetic images.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
float16 (and more so float32) have many odd values:
half the values are negative, many are larger than "1.0",
and many values are very close to 0.
Storing the 16 bits as is loses compression because of the mixture
of dense and sparse regions and also many completely unused ones.
This simply remaps the 65536 values so no unused values remain.
This improves compression by about 1.5% for the ACES_OT_VWG_SampleFrames testset
(this testset contains all kinds of funny values, including many images
with negative rgb values).
The space needed for the map is insignificant compared to the
compression gained.
This patch also flips half the float range as it can be done
using the same table.
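A minimal sketch of the idea, assuming 16-bit float bit patterns as input. The helper names flip_key() and build_remap() are hypothetical, and the flip shown here is one common way to make float bit patterns sort in numeric order; it is not necessarily the exact transform used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a float16 bit pattern into a key whose
 * unsigned order matches numeric order. Negative values (sign bit set)
 * are inverted, non-negative values are moved to the upper half. */
static uint16_t flip_key(uint16_t v)
{
    return (v & 0x8000) ? (uint16_t)~v : (uint16_t)(v | 0x8000);
}

/* Hypothetical helper: fill fwd[] so that every bit pattern that occurs
 * in samples[] maps to a dense index 0..count-1, in flipped order.
 * Entries for values that never occur are not meaningful.
 * Returns the number of distinct values found. */
static int build_remap(const uint16_t *samples, size_t n,
                       uint16_t fwd[65536])
{
    uint8_t  used[65536] = { 0 };
    uint16_t rank[65536];
    int count = 0;

    /* Mark every key that actually occurs. */
    for (size_t i = 0; i < n; i++)
        used[flip_key(samples[i])] = 1;

    /* rank[k] = number of used keys smaller than k. */
    for (int k = 0; k < 65536; k++) {
        rank[k] = (uint16_t)count;
        count  += used[k];
    }

    /* Express the table in terms of the original bit patterns. */
    for (int v = 0; v < 65536; v++)
        fwd[v] = rank[flip_key((uint16_t)v)];

    return count;
}
```

With the samples -2.0, -1.0, 0 and 1.0 (bit patterns 0xC000, 0xBC00, 0x0000, 0x3C00), the table maps them to 0, 1, 2 and 3, so the coded values are dense and ordered even though the raw bit patterns are far apart.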
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This fixes corner cases (requires version 4 or a spec update)
Fixes: Ticket5548
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
That variable is shared between frame threads in the same defective way
described in the previous commit. Fix it by adding a RefStruct-managed
array of flags that is propagated across frame threads in the standard
manner.
Remove the now-unused FFV1Context.fsrc.
Frame threading in the FFV1 decoder works in a very unusual way - the
state that needs to be propagated from the previous frame is not decoded
pixels(¹), but each slice's entropy coder state after decoding the slice.
For that purpose, the decoder's update_thread_context() callback stores
a pointer to the previous frame thread's private data. Then, when
decoding each slice, the frame thread uses the standard progress
mechanism to wait for the corresponding slice in the previous frame to
be completed, then copies the entropy coder state from the
previously-stored pointer.
This approach is highly dubious, as update_thread_context() should be
the only point where frame-thread contexts come into direct contact.
There are no guarantees that the stored pointer will be valid at all, or
will contain any particular data after update_thread_context() finishes.
More specifically, this code can break due to the fact that keyframes
reset entropy coder state and thus do not need to wait for the previous
frame. As an example, consider a decoder process with 2 frame threads -
thread 0 with its context 0, and thread 1 with context 1 - decoding a
previous frame P, current frame F, followed by a keyframe K. Then
consider concurrent execution consistent with the following sequence of
events:
* thread 0 starts decoding P
* thread 0 reads P's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
* main thread calls update_thread_context() to transfer state from
  context 0 to context 1; context 1 stores a pointer to context 0's private
  data
* thread 1 starts decoding F
* thread 1 reads F's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
  decoding
* thread 0 finishes decoding P
* thread 0 starts decoding K; since K is a keyframe, it does not
  wait for F and reallocates the arrays holding entropy coder state
* thread 0 finishes decoding K
* thread 1 reads entropy coder state from its stored pointer to context
  0, however it finds state from K rather than from P
This execution is currently prevented by special-casing FFV1 in the
generic frame threading code, however that is supremely ugly. It also
involves unnecessary copies of the state arrays, when in fact they can
only be used by one thread at a time.
This commit addresses these deficiencies by changing the array of
PlaneContext (each of which contains the allocated state arrays)
embedded in FFV1SliceContext into a RefStruct object. This object can
then be propagated across frame threads in the standard manner. Since the
code structure guarantees only one thread accesses it at a time, no
copies are necessary. It is also re-created for keyframes, solving the
above issue cleanly.
Special-casing of FFV1 in the generic frame threading code will be
removed in a later commit.
(¹) except in the case of a damaged slice, when previous frame's pixels
are used directly
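The effect of the change on the P/F/K sequence above can be illustrated with a small single-threaded replay. This is a sketch under stated assumptions: CoderState, state_new(), state_ref() and state_unref() are hypothetical stand-ins with a plain (non-atomic) counter, not FFmpeg's RefStruct API, and the frame_id field merely stands in for the entropy coder state arrays.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the state arrays. */
typedef struct CoderState {
    int refcount;
    int frame_id; /* which frame's entropy coder state this holds */
} CoderState;

static CoderState *state_new(int frame_id)
{
    CoderState *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->refcount = 1;
    s->frame_id = frame_id;
    return s;
}

static CoderState *state_ref(CoderState *s)
{
    s->refcount++;
    return s;
}

static void state_unref(CoderState **ps)
{
    CoderState *s = *ps;
    *ps = NULL;
    if (s && --s->refcount == 0)
        free(s);
}

/* Replays the problematic ordering; returns the frame whose state the
 * thread decoding F ends up reading, or -1 on allocation failure. */
static int replay_with_refs(void)
{
    CoderState *ctx0 = state_new('P'); /* thread 0 decodes P */
    CoderState *ctx1;
    int seen;

    if (!ctx0)
        return -1;

    /* update_thread_context(): context 1 takes its own reference. */
    ctx1 = state_ref(ctx0);

    /* Thread 0 moves on to keyframe K: it drops its reference and
     * creates a fresh object instead of reallocating in place. */
    state_unref(&ctx0);
    ctx0 = state_new('K');

    /* Thread 1 now reads the state it was handed. */
    seen = ctx1->frame_id;

    state_unref(&ctx1);
    state_unref(&ctx0);
    return seen;
}
```

In this replay the thread decoding F still sees the state from P, because the keyframe replaces the object rather than overwriting the memory that the other context refers to.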
In all cases except decoding version 1 it's either not used, or contains
a copy of a table from quant_tables, which we can just as well use
directly.
When decoding version 1, we can just as well decode into
quant_tables[0], which would otherwise be unused.
FFV1 decoder and encoder currently use the same struct - FFV1Context -
both as codec private data and per-slice context. For this purpose
FFV1Context contains an array of pointers to per-slice FFV1Context
instances.
This pattern is highly confusing, as it is not clear which fields are
per-slice and which per-codec.
Address this by adding a new struct storing only per-slice data. Start
by moving slice_{x,y,width,height} to it.
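The resulting layout might look roughly like the sketch below. Only FFV1SliceContext and the slice_{x,y,width,height} fields come from the text; CodecContextSketch, MAX_SLICES_SKETCH and split_into_rows() are hypothetical names introduced for illustration, and the row-wise split is a simplification.

```c
#define MAX_SLICES_SKETCH 4 /* hypothetical limit for this sketch */

/* Per-slice data only. */
typedef struct FFV1SliceContext {
    int slice_x;
    int slice_y;
    int slice_width;
    int slice_height;
} FFV1SliceContext;

/* Hypothetical per-codec context: owns the slice contexts instead of
 * pointing at further instances of its own type. */
typedef struct CodecContextSketch {
    int width, height;
    int nb_slices;
    FFV1SliceContext slices[MAX_SLICES_SKETCH];
} CodecContextSketch;

/* Split the frame into nb_slices horizontal bands.
 * Returns 0 on success, -1 if nb_slices is out of range. */
static int split_into_rows(CodecContextSketch *c, int nb_slices)
{
    if (nb_slices < 1 || nb_slices > MAX_SLICES_SKETCH)
        return -1;

    c->nb_slices = nb_slices;
    for (int i = 0; i < nb_slices; i++) {
        FFV1SliceContext *sc = &c->slices[i];
        int y0 = c->height *  i      / nb_slices;
        int y1 = c->height * (i + 1) / nb_slices;

        sc->slice_x      = 0;
        sc->slice_y      = y0;
        sc->slice_width  = c->width;
        sc->slice_height = y1 - y0;
    }
    return 0;
}
```

With the types separated, a function that takes an FFV1SliceContext pointer can only touch per-slice fields, which makes the per-slice versus per-codec distinction visible in the signatures.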
Avoids implicit av_frame_ref() and therefore allocations
and error checks. It also avoids explicitly allocating
the AVFrames (done implicitly when getting the buffer).
It also fixes a data race: The AVFrame's sample_aspect_ratio
is currently updated after ff_thread_finish_setup()
and this write is unsynchronized with the read in av_frame_ref().
Removing the implicit av_frame_ref() fixes this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Both the FFV1 decoder and encoder use a template of their own
to generate code multiple times. They also use a common template,
used by both the decoder and encoder templates, which is currently
instantiated in ffv1.h (and therefore also in ffv1.c, which
doesn't need it at all).
All these templates have the prerequisite that two macros
are defined, namely RENAME() and TYPE. The codec-specific
templates call the functions generated via the common template
via the RENAME() macro and therefore the macros used for
the common template must coincide with the macros used for
the codec-specific templates. But then it is better to not
instantiate the common template in ffv1.h, but in the codec
specific templates.
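The dependency described above can be shown in a toy form. Since a single listing cannot include a separate template file, the two templates are written here as macros that take RENAME and TYPE as parameters; in the layout the message describes they are files that get #included after RENAME() and TYPE are defined. All names below (NAME16, NAME32, COMMON_TEMPLATE, CODEC_TEMPLATE, predict, reconstruct_line) are hypothetical.

```c
#include <stdint.h>

/* Two naming schemes, one per instantiation. */
#define NAME16(x) x
#define NAME32(x) x ## 32

/* Stand-in for the common template: a trivial left predictor. */
#define COMMON_TEMPLATE(RENAME, TYPE)                                  \
static inline int RENAME(predict)(const TYPE *line, int x)             \
{                                                                      \
    return x > 0 ? line[x - 1] : 0;                                    \
}

/* Stand-in for a codec-specific template. It instantiates the common
 * template itself, so both are guaranteed to use the same RENAME and
 * TYPE, and the call through RENAME(predict) always resolves. */
#define CODEC_TEMPLATE(RENAME, TYPE)                                   \
COMMON_TEMPLATE(RENAME, TYPE)                                          \
static inline void RENAME(reconstruct_line)(TYPE *line,                \
                                            const int *residual,       \
                                            int w)                     \
{                                                                      \
    for (int x = 0; x < w; x++)                                        \
        line[x] = (TYPE)(RENAME(predict)(line, x) + residual[x]);      \
}

CODEC_TEMPLATE(NAME16, int16_t) /* predict(), reconstruct_line()      */
CODEC_TEMPLATE(NAME32, int32_t) /* predict32(), reconstruct_line32()  */
```

If the common template were instantiated somewhere else with a different RENAME, the call inside the codec-specific template would refer to a function that was never generated, which is why instantiating it from the codec-specific template is the more robust arrangement.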
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>