SwsContext.{src,dst}_format is declared as int (but holds enum AVPixelFormat)
values, so the accesses need to be performed using an int
for -fshort-enums support.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The 'sws_dither' and 'alphablend' options access 'SwsDither' and
'SwsAlphaBlend' enum fields as integers. This is unsafe when the
code is compiled with -fshort-enums, as the enum size might be
smaller than an int.
Since the 'dither' and 'alpha_blend' struct members are part of the
public API, their types cannot be easily changed.
To ensure safe integer access and maintain ABI compatibility across
different compiler settings, a MAX_ENUM value is added to force the
enums to a 32-bit underlying type.
This makes the code compatible with forcing VEX encodings
for the SSE4 codepath and is also more correct, because avx_enabled
would be set for plain AVX, although the instructions used in these
codepaths require AVX2.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: multiple integer overflows
Fixes: out of array access
The PoC modifies filter parameters generally inaccessible to an attacker
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: multiple integer overflows
Fixes: out of array access
Regression since: a408d03ee6
The PoC modifies filter parameters generally inaccessible to an attacker
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The current logic implicitly pulled the new value range out of SwsComps using
ff_sws_apply_op_q(), but this was quite ill-formed and not very robust. In
particular, it only worked because of the implicit assumption that the value
range was always set to 0b1111...111.
This actually poses a serious problem for 32-bit packed formats, whose
value range actually does not fit into AVRational. In the past, it only
worked because the value would implicitly overflow to -1, which SWS_OP_UNPACK
would then correctly extract the bits out from again.
In general, it's cleaner (and sufficient) to just explicitly reset the value
range on SWS_OP_UNPACK again.
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignores the effects of SWS_OP_RSHIFT, such as for p010 and related
  MSB-aligned formats. (This is actually a bug.)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explicitly provided one.
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly supported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
Fixes: signed integer overflow: -1159988356 + -1082982400 cannot be represented in type 'int'
Fixes: 461519938/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4777382021234688
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned both in practice and in the actual backend.
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
Instead of blindly interleaving re-ordering and minimizing optimizations,
separate this loop into several passes - the first pass will minimize the
operation list in-place as much as possible, and the second pass will apply any
desired re-orderings. (We also want to try pushing clear back before any other
re-orderings, as this can trigger more phase 1 optimizations.)
This restructuring leads to significantly more predictable and stable behavior,
especially when introducing more operation types going forwards. Does not
actually affect the current results, but matters with some upcoming changes
I have planned.
This requires updating the order of the dither matrix offsets as well. In
particular, this can only be done cleanly if the dither matrix offsets are
compatible between duplicated channels; but this should be guaranteed after
the previous commits.
As a side note, this also fixes a bug where we pushed SWS_OP_SWIZZLE past
SWS_OP_DITHER even for very low bit depth output (e.g. rgb4), which led to
a big loss in accuracy due to loss of per-channel dither noise.
Improves loss of e.g. gray -> rgb444 from 0.00358659 to 0.00261414,
now beating legacy swscale in these cases as well.
Otherwise, this is invalid, as the result of applying the next operation
after channel duplication may not commute with the result of duplicating
the result of applying the next operation on only one channel.
In practice, this was using the last seen channel's coefficients.
Note that this did not actually affect anything in practice, because the only
relevant ops (MIN/MAX) were always generated with identical coefficients for
identical channel ranges.
However, it will matter moving forwards, as e.g. dither ops may not be
commuted freely if their matrix offsets differ per channel.
The current usage is ambiguous between "affects each component equally" and
"affects each component independently" - and arguably, the current behavior
was a bug (since SWS_OP_DITHER should not commute with a SWIZZLE, at least
from a bit-exactness PoV).
However, when trying to define cleaner replacements for these concepts, I
realized there are too many special cases; and given that we only have two
use sites, I decided to just split them directly into "commute" functions
for those particular usage cases.
As an added benefit, this moves the commutation logic out of the already-long
ff_sws_ops_list_optimize().
On the surface, this trades a tiny bit of PSNR for not introducing chroma
noise into grayscale images. However, the main reason for this change is
actually motivated by a desire to avoid regressing the status quo of
duplicating swizzles being able to be commuted past dither ops.
Instead of computing y + N with a hard-coded index offset, calculate the
relative offset as a 16-bit integer in C and add that to the pointer directly.
Since we no longer mask the resulting combined address, this may result in
overread, but that's fine since we over-provisioned the array in the previous
commit.
I want to change the way offsets are calculated inside the dither asm;
and the cleanest way to solve that problem is to just over-allocate the
entire dither matrix based on the maximum offset range we expected.
This is an exceptionally unlikely (in fact, currently impossible) case to
actually hit, and not worth micro-optimizing for. More specifically, having
this special case prevents me from easily adding per-row offsets.
This check was wrong; 1 << 0 = 1. The intent was to skip allocating a 1x1
matrix by assuming it is a constant 0.5. But as written, the check never
actually executed.
This did not affect the runtime performance, nor did it leak memory; but it
did mean we didn't hit the intended `assert`.
Since we only need 8 bytes to store the dither matrix pointer, we actually
still have 8 bytes left over. That means we could either store the 8-bit
row offset directly, or alternatively compute a 16-bit pointer offset.
I have chosen to do the former for the C backend, in the interest of
simplicity.
The one downside of this approach is that it would fail on hypothetical
128-bit platforms; although I seriously hope that this code does not live
long enough to see the need for 128-bit addressable memory.
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.
However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).
To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.
This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
Unfortunately, this is exceptionally difficult to handle in the general case,
when packed/bitstream formats come into play - the actual interpretation of
the offset, shift etc. are so difficult to deal with in a general case that
I think it's simpler to continue falling back to a static table of variants
for these exceptions. They are fortunately small in number.
This is the only function that actually has the ability to return an
error, so just move the pixel type assignment here and add a check to
ensure a valid pixel type is found.
Turns out these are not, in fact, purely informative - but the optimizer
can take them into account. This should be documented properly.
I tried to think of a way to avoid needing this in the optimizer, but any
way I could think of would require shoving this to SwsReadWriteOp, which I
am particularly unwilling to do.