This allows catching whether the functions write outside of
the designated rectangle, and if run with "checkasm -v", it also
prints out on which side of the rectangle the overwrite was.
Signed-off-by: Martin Storsjö <martin@martin.st>
This backports similar functionality from dav1d, from commits
35d1d011fda4a92bcaf42d30ed137583b27d7f6d and
d130da9c315d5a1d3968d278bbee2238ad9051e7.
This allows detecting writes out of bounds, on all 4 sides of
the intended destination rectangle.
The bounds checking also can optionally allow small overwrites
(up to a specified alignment), while still checking for larger
overwrites past the intended allowed region.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes it easier to implement custom error printouts in tests.
This is a port of dav1d's commit
13a7d78655f8747c2cd01e8a48d44dcc7f60a8e5 into ffmpeg's checkasm.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was changed 8 years ago with the introduction of the linux-perf path,
with seemingly no justification at the time. Likely a developer oversight
from testing.
This bug not only made --runs completely ineffective, but also meant that we
didn't actually correctly filter out outliers.
Fixes: e0d56f097f
Many of the fields of MpegEncContext (which is also used by decoders)
are actually only used by encoders. Therefore this commit adds
a new encoder-only structure and moves all of the encoder-only
fields to it except for those which require more explicit
synchronisation between the main slice context and the other
slice contexts. This synchronisation is currently mainly provided
by ff_update_thread_context() which simply copies most of
the main slice context over the other slice contexts. Fields
which are moved to the new MPVEncContext no longer participate
in this (which is desired, because it is horrible and for the
fields b) below wasteful) which means that some fields can only
be moved when explicit synchronisation code is added in later commits.
More explicitly, this commit moves the following fields:
a) Fields not copied by ff_update_duplicate_context():
dct_error_sum and dct_count; the former does not need synchronisation,
the latter is synchronised in merge_context_after_encode().
b) Fields which do not change after initialisation (these fields
could also be put into MPVMainEncContext at the cost of
an indirection to access them): lambda_table, adaptive_quant,
{luma,chroma}_elim_threshold, new_pic, fdsp, mpvencdsp, pdsp,
{p,b_forw,b_back,b_bidir_forw,b_bidir_back,b_direct,b_field}_mv_table,
[pb]_field_select_table, mb_{type,var,mean}, mc_mb_var, {min,max}_qcoeff,
{inter,intra}_quant_bias, ac_esc_length, the *_vlc_length fields,
the q_{intra,inter,chroma_intra}_matrix{,16}, dct_offset, mb_info,
mjpeg_ctx, rtp_mode, rtp_payload_size, encode_mb, all function
pointers, mpv_flags, quantizer_noise_shaping,
frame_reconstruction_bitfield, error_rate and intra_penalty.
c) Fields which are already (re)set explicitly: The PutBitContexts
pb, tex_pb, pb2; dquant, skipdct, encoding_error, the statistics
fields {mv,i_tex,p_tex,misc,last}_bits and i_count; last_mv_dir,
esc_pos (reset when writing the header).
d) Fields which are only used by encoders not supporting slice
threading for which synchronisation doesn't matter: esc3_level_length
and the remaining mb_info fields.
e) coded_score: This field is only really used when FF_MPV_FLAG_CBP_RD
is set (which implies trellis) and even then it is only used for
non-intra blocks. For these blocks dct_quantize_trellis_c() either
sets coded_score[n] or returns a last_non_zero value of -1
in which case coded_score will be reset in encode_mb_internal().
Therefore no old values are ever used.
The MotionEstContext has not been moved yet.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows the callee to clobber the MMX state,
yet since 1e3dc705df this is no longer
done. So use the stricter declare_func instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Previously, we read elements from ff_aac_pow34sf_tab; however
that table is initialized to zero; one needs to call
ff_aac_float_common_init() to make sure that the table is
initialized.
However, given the range of the input values, a large number of
entries in ff_aac_pow34sf_tab would give results outside of the
range for signed 32 bit integers. As the largest aac_cb_maxval
entry is 16, it seems more reasonable to produce values within
an order of mangitude of that value.
(When hitting INT_MIN, implementations may end up with different
results depending on whether the value is negated as a float or
as an int. This corner case is irrelevant in practice as this
is way outside of the expected value range here.)
Coincidentally, this fixes linking checkasm with Apple's older
linker. (In Xcode 15, Apple switched to a new linker. The one in
older toolchains seems to have a bug where it won't figure out to
load object files from a static library, if the only symbol
referenced in the object file is a "common" symbol, i.e. one for
a zero-initialized variable. This issue can also be reproduced with
newer Apple toolchains by passing -Wl,-ld_classic to the linker.)
Signed-off-by: Martin Storsjö <martin@martin.st>
This code only checks hcScale. In practice this is not an issue because
the function pointers should always be identical to hyScale for the same
filter size.
Add an assertion just to make sure this assumption never regresses.
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
This corresponds to commit 9278a14cf406f8edb5052c42b83750112bf5b515
in dav1d.
Omitting the C-only functions doesn't speed up benchmarking
anyway (as those has to be benchmarked before we know if we have
any corresponding assembly functions), and being able to benchmark
those functions without corresponding assembly can be valuable in
a number of cases.
Signed-off-by: Martin Storsjö <martin@martin.st>
Share the checkasm_check_pixel macro from hevc_pel in checkasm.h,
to allow other tests to use the same. (To use it in other tests,
those tests need to have a similar setup for high bitdepth pixels,
with a local variable named "bit_depth".)
This simplifies the code for checking the output, and can print
the failing output (including a map of matching/mismatching
elements) if checkasm is run with the -v/--verbose option.
Signed-off-by: Martin Storsjö <martin@martin.st>
There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.
With this commit, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.
For bit depths <= 14, the number of fixed point bits has been set to 14
for all conversions, to simplify the code.
For bit depths > 14, the number of fixed points bits has been raised and
set to 18, to allow for the conversion to be accurate enough for the mpeg
range to be respected.
The convert functions now take the conversion constants (coeff and offset)
as function arguments.
For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.
x86_64:
chrRangeFromJpeg8_1920_c: 2127.4 2125.0 (1.00x)
chrRangeFromJpeg16_1920_c: 2325.2 2127.2 (1.09x)
chrRangeToJpeg8_1920_c: 3166.9 3168.7 (1.00x)
chrRangeToJpeg16_1920_c: 2152.4 3164.8 (0.68x)
lumRangeFromJpeg8_1920_c: 1263.0 1302.5 (0.97x)
lumRangeFromJpeg16_1920_c: 1080.5 1299.2 (0.83x)
lumRangeToJpeg8_1920_c: 1886.8 2112.2 (0.89x)
lumRangeToJpeg16_1920_c: 1077.0 1906.5 (0.56x)
aarch64 A55:
chrRangeFromJpeg8_1920_c: 28835.2 28835.6 (1.00x)
chrRangeFromJpeg16_1920_c: 28839.8 32680.8 (0.88x)
chrRangeToJpeg8_1920_c: 23074.7 23075.4 (1.00x)
chrRangeToJpeg16_1920_c: 17318.9 24996.0 (0.69x)
lumRangeFromJpeg8_1920_c: 15389.7 15384.5 (1.00x)
lumRangeFromJpeg16_1920_c: 15388.2 17306.7 (0.89x)
lumRangeToJpeg8_1920_c: 19227.8 19226.6 (1.00x)
lumRangeToJpeg16_1920_c: 15387.0 21146.3 (0.73x)
aarch64 A76:
chrRangeFromJpeg8_1920_c: 6324.4 6268.1 (1.01x)
chrRangeFromJpeg16_1920_c: 6339.9 11521.5 (0.55x)
chrRangeToJpeg8_1920_c: 9656.0 9612.8 (1.00x)
chrRangeToJpeg16_1920_c: 6340.4 11651.8 (0.54x)
lumRangeFromJpeg8_1920_c: 4422.0 4420.8 (1.00x)
lumRangeFromJpeg16_1920_c: 4420.9 5762.0 (0.77x)
lumRangeToJpeg8_1920_c: 5949.1 5977.5 (1.00x)
lumRangeToJpeg16_1920_c: 4446.8 5946.2 (0.75x)
NOTE: all simd optimizations for range_convert have been disabled.
they will be re-enabled when they are fixed for each architecture.
NOTE2: the same issue still exists in rgb2yuv conversions, which is not
addressed in this commit.
WASI mssing signal and siglongjmp support. This patch workaround
build error and add simd128 flag. Please note that many tests use
large array on stack, so you need to increase the stack size when
build checkasm, e.g., --extra-ldflags='-Wl,-z,stack-size=10485760'
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This is a purely cosmetic commit aimed at replacing accesses to
SwsInternal.opts by direct access to SwsContext wherever convenient.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit also reduces the number of times ff_sws_init_scale() gets
called (only once per bit depth), and the number of times randomize_buffers()
gets called (only if the function must be checked).
Benchmarks are only performed on bit depths 8 and 16 (since they are
different functions, and not only different constants).
Reduce input sizes to 8 (to test that the function works with widths
smaller than the vector length) and 1920 (raising the largest input
size to improve benchmark results).
And preserve the public SwsContext as separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.
This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.
For consistency, the affected functions that need to maintain a distionction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.
In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Depending on the magnitude of the output values, the potential
errors can be larger.
This fixes errors in the lls tests on x86_32 for some seeds,
observed with GCC 11 (on Ubuntu 22.04, with the distro compiler,
with -m32).
Signed-off-by: Martin Storsjö <martin@martin.st>
The unaligned width test cases fail on i386; we have an assembly
function of rgb24toyv12 which is enabled only within
"#if ARCH_X86_32 && HAVE_7REGS", which seems to fail these new
test cases for unaligned widths.
As that assembly function has existed for a long time in that form,
the issue probably isn't very recent, thus skip testing these cases
for now.
Once the assembly function has been fixed, these test cases can
be readded.
Signed-off-by: Martin Storsjö <martin@martin.st>
Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.
Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>