Check only on arches that need said check.
(Btw: I do not see how h_loop_filter benefits from alignment
at all and why h_loop_filter_unaligned exists.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
No longer necessary now that the x86 loop filter functions are
bitexact.
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old code operated on bytes and did lots of tricks
due to their limited range; it did not completely succeed,
which is why the old versions were not used when bitexact
output was requested.
In contrast, the new version is much simpler: It operates
on signed 16 bit words whose range is more than sufficient.
This means that these functions don't need a check for bitexactness
(and can be used in FATE).
Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been
removed from checkasm):
h_loop_filter_c: 29.8 ( 1.00x)
h_loop_filter_mmxext: 32.2 ( 0.93x)
h_loop_filter_unaligned_c: 29.9 ( 1.00x)
h_loop_filter_unaligned_mmxext: 31.4 ( 0.95x)
v_loop_filter_c: 39.3 ( 1.00x)
v_loop_filter_mmxext: 14.2 ( 2.78x)
v_loop_filter_unaligned_c: 38.9 ( 1.00x)
v_loop_filter_unaligned_mmxext: 14.3 ( 2.72x)
New benchmarks:
h_loop_filter_c: 29.2 ( 1.00x)
h_loop_filter_sse2: 28.6 ( 1.02x)
h_loop_filter_unaligned_c: 29.0 ( 1.00x)
h_loop_filter_unaligned_sse2: 26.9 ( 1.08x)
v_loop_filter_c: 38.3 ( 1.00x)
v_loop_filter_sse2: 11.0 ( 3.47x)
v_loop_filter_unaligned_c: 35.5 ( 1.00x)
v_loop_filter_unaligned_sse2: 11.2 ( 3.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This SSSE3 function uses MMX registers (of course without emms
at the end) and processes eight bytes of input by unpacking
it into two MMX registers. This is very suboptimal given
that one can just use XMM registers to process eight words.
This commit switches them to using XMM registers.
Old benchmarks:
avg_pixels_tab[1][3]_c: 114.5 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 43.6 ( 2.62x)
put_pixels_tab[1][3]_c: 83.6 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 34.0 ( 2.46x)
New benchmarks:
avg_pixels_tab[1][3]_c: 115.3 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 24.6 ( 4.69x)
put_pixels_tab[1][3]_c: 83.8 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 19.7 ( 4.24x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that one has to deal with 16 byte intermediates it is
unsurprising that SSE2 wins against MMX; the MMX version has
therefore been removed (as well as the now unused inline_asm.h).
The new function is even 32B smaller than the old MMX one.
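As a rough scalar illustration of why the intermediates are 16 bytes wide: each of the eight output pixels is built from a sum of four bytes, which needs a 16-bit word, and eight such words fill one XMM register. The function name below is hypothetical and this is a plain C sketch of the no-rounding xy2 interpolation, not the assembly from this commit:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of an 8-pixel-wide "xy2" half-pel interpolation without
 * rounding. Each sum of four bytes can reach 4 * 255 + 1 = 1021, so it
 * needs a 16-bit intermediate; eight of them occupy 16 bytes. */
static void put_no_rnd_xy2_8_sketch(uint8_t *dst, const uint8_t *src,
                                    ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            uint16_t sum = (uint16_t)(src[x] + src[x + 1] +
                                      src[x + stride] + src[x + stride + 1]);
            /* "no_rnd" biases by 1; the rounding variant would add 2. */
            dst[x] = (uint8_t)((sum + 1) >> 2);
        }
        src += stride;
        dst += stride;
    }
}
```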
Old benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.1 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_mmx: 41.1 ( 2.05x)
New benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.0 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_ssse3: 22.1 ( 3.80x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove the now superseded MMX versions (the new functions have the
exact same codesize as the removed ones).
Old benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.7 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_mmx: 121.5 ( 1.92x)
put_no_rnd_pixels_tab[0][3]_c: 171.4 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_mmx: 82.6 ( 2.08x)
New benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.3 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_sse2: 45.0 ( 5.18x)
put_no_rnd_pixels_tab[0][3]_c: 172.1 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_sse2: 40.9 ( 4.21x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Hint: The parts of this patch in decode_block_progressive()
and decode_block_refinement() rely on the fact that GET_VLC
returns -1 on error, so that it enters the codepaths for
actually coded block coefficients.
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The v_lowpass wrappers (which are instantiated by this macro)
are only used in the put (and not the avg) form for SSSE3
(the avg form is only used for mc02, which doesn't exist
for SSSE3). Clang warns about the unused functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When we parse a MakerNote, we first try to parse it as an IFD and if
that fails, we try to re-parse it as a binary blob. This is because
MakerNote is not well-documented in its nature.
However, if we fail to parse it the first time, we should not av_log
error messages about the parse failure, so instead we log these as
AV_LOG_DEBUG.
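The try-then-fall-back flow can be sketched roughly as follows. All names here (log_msg, parse_ifd_sketch, parse_makernote_sketch, the LOG_* levels) are hypothetical stand-ins and the IFD check is a toy; the point is only that the first attempt reports failures at debug level:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { LOG_ERROR, LOG_DEBUG };      /* hypothetical log levels */
static int error_messages;          /* counts error-level messages */

static void log_msg(int level, const char *msg)
{
    if (level == LOG_ERROR) {
        error_messages++;
        fprintf(stderr, "error: %s\n", msg);
    }
    /* debug-level messages are silent in this sketch */
}

/* Toy IFD check: a little-endian 16-bit entry count followed by
 * 12-byte entries. Failures are logged at the caller-chosen level. */
static int parse_ifd_sketch(const uint8_t *buf, size_t size, int fail_level)
{
    if (size < 2) {
        log_msg(fail_level, "truncated IFD");
        return -1;
    }
    unsigned count = buf[0] | (buf[1] << 8);
    if (2 + 12 * (size_t)count > size) {
        log_msg(fail_level, "IFD entries exceed buffer");
        return -1;
    }
    return (int)count;
}

/* Try the IFD interpretation quietly; fall back to a binary blob. */
static int parse_makernote_sketch(const uint8_t *buf, size_t size, int *is_blob)
{
    int ret = parse_ifd_sketch(buf, size, LOG_DEBUG);
    *is_blob = ret < 0;
    return ret < 0 ? 0 : ret;       /* number of IFD entries, 0 for a blob */
}
```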
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
Fixes: 439711052/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4956250308935680
Fixes: out of array access
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
The various game engines implement the following blit types, from the decoded
result to the main canvas:
- normal (opaque) blit (c37/c47/c48)
- masked blit (c37/c48)
- interpolated-frame blit (c48)
  Here an artificial frame is generated by looking up the pixels
  from both buffers and picking a color from the interpolation table
  for the artificial frame.
  This is only supported in the decoder of "Making Magic".
Implement and hook up these 3 schemes for each of the 3 compression types,
and switch codec20 to a call to the opaque blit function.
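A minimal sketch of the three blit types listed above, for 8-bit paletted pixels. The function names, the mask value being a parameter, and the 256x256 layout of the interpolation table are assumptions made for illustration only:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque blit: every decoded pixel overwrites the canvas. */
static void blit_opaque(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        int w, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, (size_t)w);
}

/* Masked blit: pixels equal to `skip` leave the canvas untouched.
 * Which value acts as the mask is an assumption of this sketch. */
static void blit_masked(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        int w, int h, uint8_t skip)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (src[y * src_stride + x] != skip)
                dst[y * dst_stride + x] = src[y * src_stride + x];
}

/* Interpolated-frame blit: pair up the pixels of both buffers and pick
 * the output color from the table (assumed to hold 256 * 256 entries). */
static void blit_interp(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *a, const uint8_t *b,
                        ptrdiff_t src_stride,
                        const uint8_t *itbl, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dst_stride + x] =
                itbl[(a[y * src_stride + x] << 8) | b[y * src_stride + x]];
}
```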
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Making Magic makes use of codec48 flag bit 0, which, when set,
means NOT to swap both buffers on even sequence numbers.
This fixes most of the artifacts in the Making Magic videos.
It's not complete though, bits 1 and 4 still need to be handled.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
- align the incoming widths to 4 (c37) / 8 (c47/c48) pixels. LucasArts
  game videos have these aligned.
- since these codecs use their 2/3 buffers for themselves, adjust the
  stride to the aligned width, keeping it even, which gets rid of
  an unaligned store in c48_4to8() found by the fuzzer with an
  odd stride.
- clear the whole diff buffer, not just the area described by w/h.
- adjust the RLE "decoded_size" to the product of the aligned width
  and reported height.
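The alignment arithmetic from the list above can be sketched like this. The helper names are hypothetical, and the mapping of codec number to block size (4 for c37, 8 for the others) is taken from the first bullet:

```c
#include <stddef.h>

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Width alignment per codec: 4 pixels for c37, 8 pixels otherwise. */
static int aligned_width(int width, int codec)
{
    return ALIGN_UP(width, codec == 37 ? 4 : 8);
}

/* RLE output size: aligned width times the reported height. */
static size_t rle_decoded_size(int width, int height, int codec)
{
    return (size_t)aligned_width(width, codec) * (size_t)height;
}
```

With these helpers a 321-pixel-wide c37 frame gets a 324-pixel stride, which is even, so a store of two pixels at an even offset stays aligned.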
These changes are the result of various fuzzer-found issues; all my
test videos still work fine.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Add left/top offsets and clipping to codec20 (raw images), and
use it to copy codec37/47/48 images to the main buffer.
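A clipped copy with left/top offsets might look roughly like the sketch below. The function name and parameter layout are hypothetical, and the offsets are assumed to be small enough that the additions do not overflow an int:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a src_w x src_h image to position (left, top) of the canvas,
 * dropping every row and column that falls outside of it. */
static void blit_clipped(uint8_t *dst, int dst_w, int dst_h,
                         ptrdiff_t dst_stride,
                         const uint8_t *src, int src_w, int src_h,
                         ptrdiff_t src_stride, int left, int top)
{
    int x0 = left < 0 ? -left : 0;   /* first visible source column */
    int y0 = top  < 0 ? -top  : 0;   /* first visible source row */
    int x1 = src_w, y1 = src_h;      /* one past the last visible ones */

    if (left + x1 > dst_w)
        x1 = dst_w - left;
    if (top + y1 > dst_h)
        y1 = dst_h - top;
    if (x0 >= x1 || y0 >= y1)
        return;                      /* nothing visible */

    for (int y = y0; y < y1; y++)
        memcpy(dst + (top + y) * dst_stride + left + x0,
               src + y * src_stride + x0, (size_t)(x1 - x0));
}
```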
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Codec37/47/48 have their own buffers; left/top are applied after
the decoding is done when copying to the main buffer. Don't add left/top
to their width/height when doing checks against the established buffer sizes.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
This implements XPAL the same way the DOS/Windows players do, with an
additional 768-entry table holding the palette left-shifted by 7 bits,
and adding the deltapal values to this.
This results in a perfectly smooth day-to-night transition in the last
30 seconds of the Outlaws RAE.SAN (ending) video, while before there
were visible brightness "pulses" when a new palette was loaded.
It also fixes color banding in The Dig intro (sq1.san), in the
scene showing the shuttle launch pad and the night sky.
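The fixed-point idea behind this can be sketched as follows: keeping the palette scaled by 128 lets small deltas accumulate over several frames instead of being lost to rounding. The struct and function names are hypothetical, and clamping only the output value is an assumption of this sketch:

```c
#include <stdint.h>

#define XPAL_ENTRIES 768            /* 256 colors * 3 components */

typedef struct XPalState {
    int32_t shifted[XPAL_ENTRIES];  /* palette component << 7 */
    int16_t delta[XPAL_ENTRIES];    /* per-frame deltapal values */
    uint8_t pal[XPAL_ENTRIES];      /* palette actually displayed */
} XPalState;

/* Load a new palette and its left-shifted shadow copy. */
static void xpal_load(XPalState *s, const uint8_t *rgb)
{
    for (int i = 0; i < XPAL_ENTRIES; i++) {
        s->pal[i]     = rgb[i];
        s->shifted[i] = rgb[i] * 128;
    }
}

/* Add the deltas to the shadow copy and derive the visible palette. */
static void xpal_apply_delta(XPalState *s)
{
    for (int i = 0; i < XPAL_ENTRIES; i++) {
        int32_t v = s->shifted[i] + s->delta[i];
        s->shifted[i] = v;
        if (v < 0)
            v = 0;
        if (v > 255 * 128 + 127)
            v = 255 * 128 + 127;
        s->pal[i] = (uint8_t)(v / 128);
    }
}
```

With a delta of 64, a component only changes by one step every second frame, which is the kind of gradual change a direct 8-bit update cannot express.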
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
The c37 mvtable has only 255 pairs; change index 255 to zero to
avoid reading outside the table boundaries.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
When checking for oversized frames, check not only for the width
and height being larger, but also the area not outgrowing the
allocated buffer.
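A minimal sketch of such a combined check, with a hypothetical function name and the buffer size counted in pixels:

```c
#include <stddef.h>

/* Returns 1 if a w x h frame fits the established dimensions and the
 * allocated buffer (buf_size in pixels), 0 otherwise. */
static int frame_fits(int w, int h, int max_w, int max_h, size_t buf_size)
{
    if (w <= 0 || h <= 0)
        return 0;
    if (w > max_w || h > max_h)
        return 0;
    /* Multiply in size_t so that the area itself cannot overflow int. */
    return (size_t)w * (size_t)h <= buf_size;
}
```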
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
When decode_init() is called for ANIM content, zero the dimensions
set in avctx width/height. Only SANM files have image dimensions in
their header; ANIM files do not.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Reimplement opcodes 0xFF and 0xFD the same way the c48 decoder
in the "Mysteries of the Sith" game engine does it:
The source pixel(s) and various pixels from the same block and the block
above it in the second-to-last image rendered to the destination buffer
are used together with the interpolation table to generate a 4x4 pattern,
which is then expanded by doubling each pixel horizontally and vertically
to produce the final 8x8 block.
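The final expansion step can be sketched as below. Only the pixel doubling is shown; how the 4x4 pattern itself is derived from the interpolation table is not covered, and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a 4x4 pattern to an 8x8 block by doubling every pixel
 * horizontally and vertically. */
static void expand_4x4_to_8x8(uint8_t *dst, ptrdiff_t stride,
                              const uint8_t pat[16])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = pat[(y >> 1) * 4 + (x >> 1)];
}
```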
This fixes visible artifacts in frames 25-50 of the S1L1OCS.SAN
video of Mysteries of the Sith.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
It was initially implemented as 4 4x1 blocks; reimplement it as 4 2x2 blocks.
Fixes a few The Dig videos, esp. black dots on the asteroid in the
intro scene.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Codec48 opcodes F9 and FC take per-subblock indices into the motion vector
table from the source stream; however, the table has only 255 entries.
Luckily, index 255 is index 0 of the following table, which means no
motion vector, the same as index 0 of the current table.
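One way to make the out-of-range index explicit is to map it to entry 0 before the lookup, as in this sketch. The function name and the table passed in are hypothetical; only the 255-entry size and the meaning of index 0 come from the text above:

```c
#include <stdint.h>

#define MV_TABLE_ENTRIES 255

/* Look up a motion vector by an 8-bit index read from the stream.
 * Index 255 is out of range and is treated like index 0, which the
 * description above says means "no motion". */
static void mv_lookup(const int8_t tab[][2], unsigned idx, int *dx, int *dy)
{
    if (idx >= MV_TABLE_ENTRIES)
        idx = 0;
    *dx = tab[idx][0];
    *dy = tab[idx][1];
}
```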
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Create a separate GetByteContext from the general one, to be able
to limit the size of the FOBJ to the size described in the tag size.
Otherwise each fobj could theoretically use all the remaining data
in the FRME (which also contains audio, subtitles, ...).
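The idea of a size-limited sub-reader can be sketched with a simplified byte reader. ByteReader and its helpers are hypothetical stand-ins written for this example, not the GetByteContext API:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ByteReader {
    const uint8_t *p, *end;
} ByteReader;

static ByteReader reader_init(const uint8_t *buf, size_t size)
{
    ByteReader r = { buf, buf + size };
    return r;
}

static size_t reader_left(const ByteReader *r)
{
    return (size_t)(r->end - r->p);
}

/* Returns the next byte, or 0 once the reader is exhausted. */
static int reader_get_byte(ByteReader *r)
{
    return r->p < r->end ? *r->p++ : 0;
}

/* Carve a child reader of at most tag_size bytes out of the parent and
 * advance the parent past it, so the child cannot touch later chunks. */
static ByteReader reader_sub(ByteReader *parent, size_t tag_size)
{
    size_t left = reader_left(parent);
    size_t size = tag_size < left ? tag_size : left;
    ByteReader child = reader_init(parent->p, size);
    parent->p += size;
    return child;
}
```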
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Some videos have an FTCH at the start of the video, to restore the
last image produced by the previous game file. This leads to
ugly messages like these:
[sanm @ 0x7f18cc001980] [IMGUTILS @ 0x7f18d7ffe8e0] Picture size 0x0 is invalid
[sanm @ 0x7f18cc001980] video_get_buffer: image parameters invalid
[sanm @ 0x7f18cc001980] get_buffer() failed
Fix this by not setting the got_frame_ptr when there is nothing to
restore/fetch. Seen with a lot of RA1 and the RA2 Level 11/12 videos.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Rename the generic motion_vectors to c47_mv, as this vector table
was initially introduced with codec47 which predates bl16 by 1-2 years,
and bl16 is a development of codec47 (with a bit of c48 thrown in).
No functional change.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Both of these encode a quarter-sized keyframe, with missing pixels
interpolated from the immediate neighbours.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Restructure the SANM (or BL16 as LucasArts calls it) decoder to make it
look like the others, as it is basically a development of old_codec47
for rgb565 values.
No functional changes.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
codec47 carries a small 4-byte codebook in its header. Read those
4 bytes into a context member instead of awkwardly redirecting the
bytestream pointer every time it needs to be accessed.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
The mv check introduced with d5bdb0b705 broke MotS videos:
- their height (300 lines) is 37.5 blocks; unfortunately the videos try to
  access up to 1 block more.
  Extend the mv check to the aligned_height, which fixes most artifacts.
- don't return an error when an mv is invalid; rather skip the (sub)block.
  This gets rid of almost all artifacts.
Some artifacts still remain, esp in space scenes where the original
encoder apparently fetched black pixels from outside of the aligned
height. An increase of the buffer size by 8 lines will fix that later.
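The two changes can be sketched as follows: 300 lines round up to 304 (38 blocks of 8), and a block whose source lies outside that range is skipped instead of aborting the frame. The names are hypothetical, and both buffers are assumed to hold the aligned number of rows:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN8(x) (((x) + 7) & ~7)

/* Check a motion-compensated source block against the width and the
 * height aligned up to a multiple of 8. */
static int block_src_ok(int x, int y, int mvx, int mvy, int bsize,
                        int width, int height)
{
    int sx = x + mvx, sy = y + mvy;
    return sx >= 0 && sy >= 0 &&
           sx + bsize <= width && sy + bsize <= ALIGN8(height);
}

/* Returns 1 if the block was copied, 0 if it was skipped. An invalid
 * mv is not treated as an error. */
static int copy_block_or_skip(uint8_t *dst, const uint8_t *prev,
                              ptrdiff_t stride, int x, int y,
                              int mvx, int mvy, int bsize,
                              int width, int height)
{
    if (!block_src_ok(x, y, mvx, mvy, bsize, width, height))
        return 0;
    for (int r = 0; r < bsize; r++)
        memcpy(dst + (y + r) * stride + x,
               prev + (y + mvy + r) * stride + x + mvx, (size_t)bsize);
    return 1;
}
```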
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
- don't draw outside the buffers
- don't wrap around when coordinates go over the edge
  This is especially noticeable in RA1 files such as O1OPEN.ANM and
  C1C3PO.ANM, where planets wrap around.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>