The checkasm tool originated in x264. It was later rewritten and
modernized for FFmpeg (and relicensed to LGPL). For the dav1d
project, it was relicensed again to 2-clause BSD (with permission
from the relevant authors).
The FFmpeg and dav1d implementations of checkasm have since evolved
independently (with some amount of ported code between the two,
with relicensing permission where relevant).
To synchronize the development, and to make it possible to easily
adopt checkasm in other projects, it has been split out into a
standalone project/library on its own, developed at
https://code.videolan.org/videolan/checkasm/.
That version has all the features of checkasm in both FFmpeg and
dav1d, and adds a number of improvements on top:
- More/fixed tests (e.g. properly clobbering high bits of 32-bit registers
on most platforms).
- Vastly improved overall performance / runtime for benchmarking, due
primarily to the ability to scale the runtime of each test to that test's
complexity.
- Much more robust statistical analysis of benchmarking results, including
robust outlier rejection, an estimation of the histogram, and the ability
to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for
failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer
implementations on ARM platforms, a better RISC-V and AArch64 harness,
and more.
On AArch64, it tests which timer out of pmccntr_el0, Linux perf,
macOS kperf and cntvct_el0 is available, without the user needing to
configure things, and falls back on clock_gettime if none of
them can be used. This means one automatically gets the best
available timer, if userspace access to pmccntr_el0 has been
unlocked with a kernel module, or if one has permission to use
the perf API, or if cntvct_el0 is exact enough to be useful.
On AArch64 macOS, there is now a test harness that catches clobbered
registers and stack clobbering, like on other platforms.
- An option for setting affinity, for benchmarking on heterogeneous
core systems. (On Linux, this is already easily done through
taskset, but on Windows, the built-in checkasm option makes it
possible there as well, in a portable way.)
- Printing of the tested CPU core name, where possible.
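As a rough illustration of the trimmed mean and stddev reporting mentioned in the list above, here is a minimal Python sketch. It is not checkasm's actual algorithm: the function name `trimmed_stats`, the 10% trim fraction and the sample values are all assumptions made for this example.

```python
import statistics

def trimmed_stats(samples, trim_fraction=0.1):
    """Return (trimmed mean, stddev) after dropping the extremes.

    Hypothetical helper for illustration only; not part of checkasm.
    """
    if not samples:
        raise ValueError("no samples")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    # Number of samples to drop from each end of the sorted list.
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    mean = statistics.fmean(kept)
    # Population stddev of the retained samples (0.0 for a single sample).
    stddev = statistics.pstdev(kept)
    return mean, stddev

# Made-up cycle counts; the last timing was disturbed, e.g. by an interrupt.
cycles = [100, 101, 99, 100, 102, 98, 100, 101, 99, 5000]
mean, stddev = trimmed_stats(cycles, trim_fraction=0.1)
print(f"trimmed mean: {mean:.2f}, stddev: {stddev:.2f}")
```

Dropping a fixed fraction from both ends is the simplest form of outlier rejection; a real benchmark harness may use a more elaborate estimator.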
To integrate this external implementation of checkasm into FFmpeg,
without having to build libcheckasm as an external library, the upstream
sources are added as a git subtree, and integrated into the FFmpeg
build system as a foreign source.
For the long and storied history of how we arrived at this solution,
see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546
The relevant config headers for checkasm are generated by configure,
and the sources are built as part of the main ffmpeg build. The
upstream sources, while they use meson as primary build system,
are structured to make it easy to build as part of a foreign build
system.
The existing testcases are mostly kept untouched (only three minor
changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the
majority of the logic from checkasm.c, checkasm.h and the
arch-specific assembly files is removed and replaced with the external
implementation.
Co-Authored-By: Martin Storsjö <martin@martin.st>
Signed-off-by: Niklas Haas <git@haasn.dev>
To reproduce this commit, run:
$ git subtree add --squash --prefix=tests/checkasm/ext \
https://code.ffmpeg.org/FFmpeg/checkasm.git master
To update at a later point in time, replace `add` with `pull`.
Pre-emptively exclude the external checkasm sources. Split off from the
following merge commit to make the history easier to follow.
Signed-off-by: Niklas Haas <git@haasn.dev>
Not only is this duplicating code, but it also hard-codes a reference to
`checkasm_lfg`, which I want to eliminate in the interest of being able to
switch out the checkasm implementation.
The test data size is quite large, so re-setting up unused data is eating up
quite a significant amount of CPU time.
This commit cuts execution time of sw_ops in half.
Signed-off-by: Niklas Haas <git@haasn.dev>
Outputting an UNSPEC layout will make most callers guess the speaker layout, and
more likely than not get it wrong.
Now that we can freely export custom order layouts, let's use them.
Signed-off-by: James Almer <jamrial@gmail.com>
The heuristics run to detect PES streams are much laxer than mp3/ac3 ones,
which check for valid headers, so it should not have a higher score than the
latter.
Fixes misdetection of some mp3 files with big id3v2 tags at the beginning.
Signed-off-by: James Almer <jamrial@gmail.com>
When AV_PKT_DATA_HEVC_CONF is present on an HEVC track, write
an hvcE BlockAdditionMapping alongside the existing dvcC/dvvC one,
carrying the raw HEVCDecoderConfigurationRecord for the enhancement layer.
Handle MATROSKA_BLOCK_ADD_ID_TYPE_HVCE in mkv_parse_block_addition_mappings
and store the raw HEVCDecoderConfigurationRecord as
AV_PKT_DATA_HEVC_CONF on the stream's coded side data, mirroring
the existing dvcC/dvvC handling.
When AV_PKT_DATA_HEVC_CONF is present on a MODE_MP4 HEVC
track, write it as an hvcE box alongside hvcC and dvcC. Like dvcC,
writing requires -strict unofficial.
The hvcE box carries the HEVCDecoderConfigurationRecord for the Dolby
Vision enhancement layer in ISOM-based containers. Store its raw
contents as AV_PKT_DATA_HEVC_CONF on the stream's coded side data,
mirroring the existing dvcC/dvvC handling.
Should fix buffer overflows as reported by clang-asan and use of uninitialized
values as reported by valgrind.
Signed-off-by: James Almer <jamrial@gmail.com>
Add a CNG (comfortnoise) round-trip FATE test using the existing enc_dec_pcm + framemd5 pattern and include its generated reference output.
Also add a second test that compares the MD5 of the encoded stream.
Tested on x86-32 & 64, arm, mips qemu
Co-Authored-with: AI
These codecs cannot self-report layout in the bitstream, and
are known or expected to use a libavutil-compatible channel
order, and as such can use mov_ch_layouts_wav for tag lookup.
Test the five public functions not already covered by
tests/color_utils: av_csp_luma_coeffs_from_avcsp,
av_csp_primaries_desc_from_id, av_csp_primaries_id_from_desc,
av_csp_approximate_trc_gamma, and av_csp_approximate_eotf_gamma.
Iterates every AVCOL_SPC, AVCOL_PRI, and AVCOL_TRC value including
the extended ranges, round-trips primaries via desc_eq so the
canonical first-match (e.g. smpte170m for smpte240m) is accepted,
checks that a garbage desc returns AVCOL_PRI_UNSPECIFIED, and that
out-of-range enum values return NULL or 0.0 as documented. The
trc/eotf gamma values come from static lookup tables so the
floating point output is bitexact across platforms.
Coverage for libavutil/csp.c: 88.50% -> 94.46%
Test av_ambient_viewing_environment_alloc with and without the size
out-parameter, and av_ambient_viewing_environment_create_side_data.
Verifies the {0, 1} rational defaults set by get_defaults(),
write/read-back of the three AVRational fields, frame side data
attachment, and OOM paths via av_max_alloc.
Coverage for libavutil/ambient_viewing_environment.c: 60.00% -> 100.00%
The previous chroma stride formula (width >> log2_chroma_w) is correct
for planar yuv but wrong for semi-planar nv12/nv21, where the UV plane
is interleaved at width bytes per row (width/2 UV pairs of 2 bytes
each). Use av_image_get_linesize() so the test feeds a valid stride to
libswscale regardless of input format; for the existing planar suites
the value is unchanged.
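The difference between the two layouts can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the code of av_image_get_linesize(); the helper name `chroma_linesize_bytes`, its parameters and the rounding-up of odd widths are assumptions made for this example.

```python
def chroma_linesize_bytes(width, log2_chroma_w, semi_planar, bytes_per_sample=1):
    """Bytes per row of the first chroma plane (hypothetical helper)."""
    # Chroma samples per row, rounded up so odd widths are covered.
    chroma_samples = (width + (1 << log2_chroma_w) - 1) >> log2_chroma_w
    # Semi-planar formats interleave U and V in a single plane,
    # so each chroma position takes two components.
    components = 2 if semi_planar else 1
    return chroma_samples * components * bytes_per_sample

width = 64
planar = chroma_linesize_bytes(width, 1, semi_planar=False)  # yuv420p-like
nv12 = chroma_linesize_bytes(width, 1, semi_planar=True)     # nv12/nv21-like
print(planar, nv12)
```

For the planar case the result matches width >> log2_chroma_w for even widths, while the semi-planar case yields width bytes per row, as described above.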
With the stride fixed, add nv12 and nv21 to check_yuv2rgb() so the
upcoming NEON 16bpp paths get bench coverage. ff_get_unscaled_swscale
does not wire a C yuv2rgb fast path for these inputs, so the suites
report bench-only (no correctness reference); they still run clobber
detection and cycle counts.
Signed-off-by: DROOdotFOO <drew@axol.io>
tref types can have more than one value, as is the case of tmcd in
fcp_export8-236.mov, where the single video track references all timecode
tracks.
Handle them in a generic and extensible way.
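To illustrate what "more than one value" means here, the following Python sketch parses the payload of a tref box, where each child box holds a reference type followed by a list of 32-bit big-endian track IDs. It is not the demuxer's implementation; the function name `parse_tref_payload` and the sample payload are hypothetical.

```python
import struct

def parse_tref_payload(payload):
    """Map each reference type to the list of track IDs it names.

    Hypothetical helper for illustration only.
    """
    refs = {}
    pos = 0
    while pos + 8 <= len(payload):
        # Each child box starts with a 32-bit size and a 4-byte type.
        size, ref_type = struct.unpack_from(">I4s", payload, pos)
        if size < 8 or pos + size > len(payload):
            raise ValueError("malformed reference box")
        # The rest of the child box is an array of 32-bit track IDs.
        count = (size - 8) // 4
        ids = struct.unpack_from(f">{count}I", payload, pos + 8)
        refs.setdefault(ref_type.decode("latin-1"), []).extend(ids)
        pos += size
    return refs

# Made-up payload: one 'tmcd' entry referencing tracks 2, 3 and 4.
payload = struct.pack(">I4s3I", 20, b"tmcd", 2, 3, 4)
refs = parse_tref_payload(payload)
print(refs)
```

Storing a list per reference type, rather than a single ID, is what lets one video track point at several timecode tracks.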
Signed-off-by: James Almer <jamrial@gmail.com>
And set it also for non-variable frame size encoders.
FATE changes are the result of passing a frame_size to flac and wavenc
encoders, instead of letting them choose one.
Signed-off-by: James Almer <jamrial@gmail.com>
This both works around an issue that the following commit reveals (encoding
with a frame_size of 4096 fails on aarch64 for unknown reasons) and tests setting
frame_size now that it's allowed (and ensures the CLI doesn't overwrite it).
Signed-off-by: James Almer <jamrial@gmail.com>
Unfortunately, this is a bit slower than the MMX version because
memory operands cannot be used with paddw.
The situation would reverse if ff_dctB_mmx() had
to issue emms.
dctB_c: 3.7 ( 1.00x)
dctB_mmx: 3.3 ( 1.13x)
dctB_sse2: 3.6 ( 1.03x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>