The previous chroma stride formula (width >> log2_chroma_w) is correct
for planar yuv but wrong for semi-planar nv12/nv21, where the UV plane
is interleaved at width bytes per row (width/2 UV pairs of 2 bytes
each). Use av_image_get_linesize() so the test feeds a valid stride to
libswscale regardless of input format; for the existing planar suites
the value is unchanged.
With the stride fixed, add nv12 and nv21 to check_yuv2rgb() so the
upcoming NEON 16bpp paths get bench coverage. ff_get_unscaled_swscale
does not wire a C yuv2rgb fast path for these inputs, so the suites
report bench-only (no correctness reference); they still run clobber
detection and cycle counts.
Signed-off-by: DROOdotFOO <drew@axol.io>
tref types can have more than one value, as is the case of tmcd in
fcp_export8-236.mov, where the single video track references all timecode
tracks.
Handle them in a generic and extensible way.
Signed-off-by: James Almer <jamrial@gmail.com>
And set it also for non-variable frame size encoders.
FATE changes are the result of passing a frame_size to flac and wavenc
encoders, instead of letting them choose one.
Signed-off-by: James Almer <jamrial@gmail.com>
Both worksaround a issue the following commit reveals (encoding with 4096
frame_size fails on aarch64 for unknown reasons), and tests setting
frame_size now that it's allowed (and ensuring the CLI doesn't overwrite it).
Signed-off-by: James Almer <jamrial@gmail.com>
Unfortunately a bit slower than the MMX version due to
the impossibility to use memory operands in paddw.
The situation would reverse if ff_dctB_mmx() would have
to issue emms.
dctB_c: 3.7 ( 1.00x)
dctB_mmx: 3.3 ( 1.13x)
dctB_sse2: 3.6 ( 1.03x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Using the configured scaler from the SwsContext implicitly. This does affect
the output of libswscale/tests/sws_ops.c, which now prints about 4x as much
data (taking roughly 4x as long, but still within a second on my machine).
We can make this process a lot faster by forcing SWS_SCALE_POINT as the
scaler, which skips calculating any actual filter weights in favor of
generating a trivial 1-tap filter.
Signed-off-by: Niklas Haas <git@haasn.dev>
$(FFMPEG) expands to "ffmpeg.exe" on Windows/MSYS2, and the bare
$(FFMPEG) call falls through to PATH lookup, picking up an externally
installed ffmpeg instead of the freshly built binary in $target_path.
That stale binary lacked the rejection added in a45fe72c9d, causing
msys2-clang64/clangarm64/ucrt64 slots to silently produce 250x2
instead of failing at 500x0.
Wrap with fate-run.sh's run() so $target_exec and $target_path are
resolved correctly on all platforms, matching the convention used by
e.g. fate-id3v2-invalid-tags, and avoiding the ffmpeg() helper's
unrelated default flags.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The 16-bit kernel is dispatched for every non-8-bit pixel format
(9/10/12/16-bit content, all stored in uint16_t). It's supposed to
undo the Q16 scaling that set_filter_param() applies to `amount`:
fp->amount = amount * 65536.0;
but the shift written in the kernel is `>> (8+nbits)`, which for the
nbits=16 instantiation of the macro comes out to `>> 24` instead of
`>> 16`. Because of this, on any non-8-bit input, unsharp applies ~1/256
of the user's requested strength and is effectively a no-op. The
8-bit kernel (nbits=8) happens to be correct because 8+8 == 16.
This commit also widens the intermediate product to int64 before the
shift, to avoid a potential overflow. Take a 16-bit pixel at the
edge of a sharp white/black region, with the user-facing `amount`
set to its declared maximum of 5.0.
*srx = 65535
blur = 32768
diff = *srx - blur = 32767
amount_q16 = 5.0 * 65536 = 327680
Then the kernel computes:
product = diff * amount_q16
= 32767 * 327680 = 10,737,090,560 (~1.07e10)
which overflows INT32_MAX. Widening to int64 keeps the
multiplication in range; the subsequent `>> 16` brings it back to
sample range and the final cast to int32 is then safe. The widening
is a semantic no-op for 8/9/10/12-bit content where the product
always fits in int32 (worst case at 12-bit: 4095 * 327680 ~ 1.34e9).
Introduced by ee792ebe08 (2019-11-08, "avfilter/vf_unsharp: add 10bit
support"). The fate-filter-unsharp-yuv420p10 reference added in the
same series was generated from the broken kernel and is regenerated
here. fate-filter-unsharp (8-bit) is unaffected.
Repro:
python3 -c "import numpy as np; y=np.tile(np.where(np.arange(128)//8 & 1, 512, 256).astype('<u2'), (128,1)); c=np.full((64,64), 512, '<u2'); open('in.yuv','wb').write(y.tobytes()+c.tobytes()*2)"
ffmpeg -f rawvideo -pix_fmt yuv420p10le -s 128x128 -i in.yuv \
-lavfi "split=2[a][b];[b]unsharp=la=1[bs];[a][bs]psnr" \
-f null - 2>&1 | grep PSNR
Before: `PSNR y:66.50 ...` -- the filter is effectively a no-op,
so the sharpened output matches the input almost exactly.
After: `PSNR y:28.27 ...` -- the filter actually sharpens, so
output and input differ as expected.
Signed-off-by: Nil Fons Miret <nilf@netflix.com>
Made-with: Cursor
An installation of frei0r-plugins is required to run the tests,
which is usually seperate from the build headers. Some systems
have it packaged (e.g. apt install frei0r-plugins). An upstream
release extracted to FREI0R_PATH also works.
Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
Test av_ts_make_string with NOPTS, zero, positive, negative, and
INT64 boundary values, av_ts2str macro, av_ts_make_time_string2
with various timebases, and av_ts_make_time_string pointer
variant.
Coverage for libavutil/timestamp.c: 0.00% -> 100.00%
Test av_tdrdi_alloc with 1 and 3 displays, and the inline
av_tdrdi_get_display accessor. Verifies that the returned
pointer matches entries_offset + idx * entry_size, tests
write/read-back of display width exponent/mantissa and view ID
fields, and OOM paths via av_max_alloc.
Coverage for libavutil/tdrdi.c: 0.00% -> 100.00%
Test av_dynamic_hdr_vivid_alloc and
av_dynamic_hdr_vivid_create_side_data. Verifies zero defaults,
write/read-back of system_start_code, num_windows, and
color transform params (min/avg/var/max RGB), frame side
data attachment, and OOM paths via av_max_alloc.
Coverage for libavutil/hdr_dynamic_vivid_metadata.c: 0.00% -> 100.00%
Test av_buffer_alloc, av_buffer_allocz, av_buffer_create with
custom free callback, AV_BUFFER_FLAG_READONLY, av_buffer_ref,
av_buffer_is_writable, av_buffer_get_ref_count,
av_buffer_make_writable, av_buffer_realloc (including from NULL),
av_buffer_replace (including with NULL), av_buffer_pool
init/get/uninit cycle, av_buffer_pool_init2 with custom alloc
and pool_free callbacks, av_buffer_pool_buffer_get_opaque, and
OOM paths via av_max_alloc.
Coverage for libavutil/buffer.c: 0.00% -> 90.19%
Remaining uncovered lines are mutex init failures and
secondary allocation failure paths.
Add a regression test covering issue #22817: cascaded scale=...:-2
filters on extreme aspect ratios previously produced zero output
dimensions silently. The test expects ffmpeg to fail fast.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This allows JPEG XL images to be recognized as valid attachments.
Since JPEG is already widely used for cover art, JXL's support for
lossless JPEG transcodes can decrease the total size of music collections.
This fixes JXL cover art rendering in applications like mpv which rely
on FFmpeg for demuxing.
Signed-off-by: jade <heartstopp1ng@proton.me>
Test 3-tap for 8x8/16x16/32x32 (both filtered_left and
filtered_top outputs). Test strong smoothing for filtered_top
and in-place left modification.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This change should improve performance on Skylake and later
Intel CPUs (which have only half the ports for saturated adds/subs
for mmx register compared to xmm register): llvm-mca predicts
a 25% performance improvement on Skylake.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Due to a discrepancy between SSE2 and the C version coefficients
for idct_put and idct_add are restricted to a range not causing
overflows.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to use pavgb to reduce the amount of instructions used
to calculate the average; processing two rows via movhps allows
to reduce the amount of pxor and pavgb even further and turned
out to be beneficial.
This patch also avoids a load as the constant used here can be easily
generated at runtime.
Old benchmarks:
put_no_rnd_pixels_l2_c: 13.3 ( 1.00x)
put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x)
New benchmarks:
put_no_rnd_pixels_l2_c: 13.4 ( 1.00x)
put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
These can randomly trigger the alpha/zero fast paths, resulting in spurious
tests or randomly diverging performance if the backend happens to implement
that particular fast path.
Signed-off-by: Niklas Haas <git@haasn.dev>
This was not actually testing integer path. Additionally, for integer
scales, there is a special fast path for expansion from bits to full range,
which we should separate from the random value test.
Most of these filters don't test anything meaningfully different relative to
each other; the only filters that really have special significant are POINT
(for now) and maybe BILINEAR down the line.
Apart from that, SINC, combined with the src size loop, already tests both
extreme cases (large and small filters), with large, oscillating unwindonwed
weights.
The other filters are not adding anything of substance to this, while massively
slowing down the runtime of this test. We can, of course, change this if the
backends ever get more nuanced handling.
checkasm: all 855 tests passed (down from 1575)
Signed-off-by: Niklas Haas <git@haasn.dev>
The current code was a bit clumsy in that it always picked the first
available backend when choosing the new function. This meant that some x86
paths were not being tested at all, whenever the memcpy backend (which has
higher priority) could serve the request.
This change makes it so that each backend is explicitly tested against only
implementations provided by that same backend.
checkasm: all 1575 tests passed (up from 1305)
As an aside, it also lets us benchmark the memcpy backend directly against
the C reference backend.
Signed-off-by: Niklas Haas <git@haasn.dev>
These don't actually exist at runtime, and will soon be removed from the
backends as well.
This commit is intentionally a bit incomplete; as I will rewrite this
based on the auto-generated macros in the upcoming ops_micro series.
Signed-off-by: Niklas Haas <git@haasn.dev>
They have been superseded by SSSE3; the SSE2 version was even disabled
(and segfaults if enabled).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>