Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t
variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is
typedef'd as unsigned long rather than unsigned int, which triggers
-Wformat warnings despite both types being 4 bytes. Using PRI macros
is the portable way to match the actual underlying type of uint32_t.
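A minimal sketch of the portable pattern. It uses snprintf() so that it compiles without libavutil, but the same format string can be passed unchanged to av_log(). The helper name format_u32() is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a uint32_t in decimal and hex without assuming its underlying type.
 * PRIu32/PRIX32 expand to the right length modifier ("u", "lu", ...) for
 * whatever uint32_t is typedef'd to on the target platform, so -Wformat
 * stays quiet on both glibc and NuttX-style toolchains. */
static int format_u32(char *buf, size_t size, uint32_t v)
{
    return snprintf(buf, size, "%" PRIu32 " (0x%08" PRIX32 ")", v, v);
}
```

For example, format_u32(buf, sizeof(buf), 48879) produces "48879 (0x0000BEEF)" whether uint32_t is unsigned int or unsigned long.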
Signed-off-by: zengshuang <zengshuang@xiaomi.com>
Ensures samples where a missing Frame Header is handled by a subsequent
Redundant one are parsed correctly.
Signed-off-by: James Almer <jamrial@gmail.com>
(This also stops zero-allocating er_temp_buffer for H.264,
reverting back to the behavior from before commit
0a1dc81723.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that we rewrite these NALUs to remove the encoded data blocks when exporting
them as extradata, we need to do the inverse when exporting filtered data in packets:
remove the SC, GC and AI blocks.
Signed-off-by: James Almer <jamrial@gmail.com>
write_lcevc_nalu() is meant only for IDR and NON_IDR NALUs. For everything else, just
copy it unchanged.
Signed-off-by: James Almer <jamrial@gmail.com>
Rewrite ff_hevc_put_hevc_qpel_h16_8_neon and h32 to use byte-domain
widening multiply (umull/umlal/umlsl via calc_qpelb/calc_qpelb2 macros)
instead of the previous int16-domain approach (uxtl + mul/mla).
The byte-domain approach eliminates the uxtl expansion step and halves
the ext stride (1 byte vs 2 bytes per tap), reducing per-row instruction
count from ~32 to ~23. The functions are also inlined, removing bl/ret
call overhead.
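To illustrate why the byte-domain form gives exactly the same result, here is a scalar C model of one output sample, assuming the standard HEVC 8-tap luma filters. It is not the NEON code and the function names are hypothetical: qpel_h_int16() mirrors the old widen-then-multiply approach, while qpel_h_bytes() mirrors umull/umlal/umlsl and relies on taps 0, 2, 5 and 7 never being positive.

```c
#include <stdint.h>

/* HEVC 8-tap luma quarter-sample filters (each row sums to 64). */
static const int8_t qpel_filters[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* int16-domain model: widen each pixel, then multiply-accumulate with
 * signed coefficients (like uxtl + mul/mla). */
static int16_t qpel_h_int16(const uint8_t *src, const int8_t *f)
{
    int32_t sum = 0;
    for (int k = 0; k < 8; k++)
        sum += (int16_t)src[k] * f[k];
    return (int16_t)sum; /* result always fits in int16 for 8-bit input */
}

/* Byte-domain model: multiply unsigned bytes by |coef| and add or subtract
 * according to a fixed sign pattern (like umull/umlal/umlsl). The 16-bit
 * accumulator wraps modulo 2^16, but the true result lies within int16
 * range, so reinterpreting the wrapped value is exact. */
static int16_t qpel_h_bytes(const uint8_t *src, const int8_t *f)
{
    static const int negative[8] = { 1, 0, 1, 0, 0, 1, 0, 1 };
    uint16_t acc = 0;
    for (int k = 0; k < 8; k++) {
        uint16_t mag  = (uint16_t)(negative[k] ? -f[k] : f[k]);
        uint16_t prod = (uint16_t)(src[k] * mag);
        acc = negative[k] ? (uint16_t)(acc - prod) : (uint16_t)(acc + prod);
    }
    /* Map the wrapped unsigned value back to signed without relying on
     * implementation-defined conversions. */
    return (int16_t)(acc >= 32768 ? (int32_t)acc - 65536 : (int32_t)acc);
}
```

The fixed negative[] table is what makes this work; with filters whose taps can take either sign, as the VVC note below describes, it no longer applies.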
This benefits all HV-path callers (hv/uni_hv/bi_hv/uni_w_hv/bi_w_hv)
at widths 16/32/48/64.
checkasm benchmarks on Apple M4 (5-run average):
H-pass standalone (NEON):
h16: 34.0 -> 24.4 cycles (1.39x speedup)
h32: 132.0 -> 95.0 cycles (1.39x speedup)
h64: 521.8 -> 373.9 cycles (1.40x speedup)
HV compound paths geometric mean speedup (NEON, width >= 16):
qpel_hv: 1.144x (4 functions)
qpel_bi_hv: 1.158x (4 functions)
qpel_uni_hv: 1.188x (4 functions)
qpel_uni_w_hv: 1.158x (3 functions)
Overall: 1.162x (15 functions)
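The overall figure can be cross-checked from the per-group means: the geometric mean over all 15 functions equals the 15th root of the product of each group mean raised to its function count. A minimal sketch, using bisection instead of pow() so that it needs no libm; because the group means are already rounded, it can only be expected to agree to about three decimals. The helper names are hypothetical.

```c
/* Per-group geometric-mean speedups and function counts listed above. */
static const double hv_means[] = { 1.144, 1.158, 1.188, 1.158 };
static const int hv_counts[]   = { 4, 4, 4, 3 };

/* x raised to a non-negative integer power by repeated multiplication. */
static double ipow(double x, int n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= x;
    return r;
}

/* Combine group geometric means into one overall geometric mean.
 * Assumes every speedup lies in [1, 2], which bounds the bisection. */
static double geomean_groups(const double *means, const int *counts, int groups)
{
    double prod = 1.0;
    int n = 0;
    for (int i = 0; i < groups; i++) {
        prod *= ipow(means[i], counts[i]);
        n    += counts[i];
    }
    double lo = 1.0, hi = 2.0;
    for (int it = 0; it < 60; it++) {
        double mid = 0.5 * (lo + hi);
        if (ipow(mid, n) < prod)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With the four groups above, geomean_groups(hv_means, hv_counts, 4) comes out at about 1.162, consistent with the reported overall speedup.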
VVC qpel h16/h32 are separated into self-contained functions retaining
the int16-domain approach, as VVC filters have arbitrary coefficients
incompatible with the hardcoded sign pattern in calc_qpelb.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The VVC qpel h16 and h32 functions had a redundant 'mov mx, x30'
instruction. The first one was placed before vvc_load_filter had
finished using mx (the filter pointer argument), making it a dead
store immediately overwritten by the second 'mov mx, x30'.
Remove the first instance and reorder so that 'sub src, src, #3'
comes before 'mov mx, x30', ensuring the filter pointer in mx is
fully consumed by vvc_load_filter before being overwritten with the
link register.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Fixes: out of array access
Fixes: 471509958/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_IMA_MAGIX_DEC_fuzzer-4847227777646592
We ask for a mono sample because the implementation for mono is incomplete
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: division by zero
Fixes: 473579863/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-5105281257504768
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2147483640 + 32 cannot be represented in type 'int'
Fixes: 473569764/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5377306970619904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes non-repeatable checksums.
This also avoids allocating the MC-only buffer when it's not used.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If there is no escape case then reaching that branch is an error
Fixes: shift exponent 32 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
Fixes: 472335543/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DST_fuzzer-6682453243920384
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: runtime error: shift exponent -1 is negative
Fixes: runtime error: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 471846062/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5835290976780288
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036854775807 + 3546086691638400 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 471723681/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-4841032488648704
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1077919680 + 1077936128 cannot be represented in type 'int'
Fixes: 471686763/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_N64_DEC_fuzzer-6493712281829376
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It is no longer necessary: most encoders don't use any MMX code
at all any more, and the ones that do issue emms on their own.
Instead, add an ff_assert1_fpu() to check that it stays that way.
To add some more information, only the halfpel (and fullpel),
qpel and motion estimation APIs use MMX in addition to the
SBC and Snow specific dsp code. halfpel is used by the
mpegvideo encoders, SVQ1 and Snow encoders. The same
encoders in addition to the AC-3 ones and dvvideo use me_cmp.
qpel is only used by the MPEG4 encoder which is part of
mpegvideo. None of these codecs need the generic emms_c (even on
errors):
a) The AC-3 encoders only use a width 16 me_cmp function which
can no longer use MMX since d91b1559e0.
b) dvvideo and SBC emit emms on their own and have no error paths
after the start of the part that can use MMX.
c) SVQ1 calls emms_c() on its own, even on all error paths
that need it.
d) Snow calls emms_c() on its ordinary (success) return path;
it has only one error path in the part of the code that uses MMX,
but even that one is fine, as ratecontrol_1pass() always calls emms_c()
itself.
e) For mpegvideo, the MMX code is almost entirely confined to the part
of the code reachable from the worker threads (if slice threading
is in use). The exception to this is in skip_check(), which always
calls emms_c() itself. Because encode_picture() always calls
emms_c() itself after executing the worker threads and before any
error condition, the floating point state is clean upon exit from
encode_picture().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Needed for the allocations in ff_snow_common_init_after_header()
(as well as for calculate_visual_weight() if
spatial_decomposition_count could change).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary since d91b1559e0
(before that the sad_cmp[0] call in get_intra_count() may
have clobbered the floating point state without cleaning it up
itself).
Also remove some commented out emms_c from places where
the floating point state is guaranteed not to have been clobbered
by us.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Unlike the MMXEXT version, this version does not overread at all
(the MMXEXT version processes the input of 2*w bytes in eight byte
chunks and overreads by a further six bytes, because it loads
the next left and left top values at the end of the loop,
i.e. it reads FFALIGN(2*w,8)+6 bytes instead of 2*w).
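The overread described above, worked through for a few widths. FFALIGN is reproduced from libavutil so the snippet stands alone; the two helper names are hypothetical.

```c
#include <stddef.h>

/* Same definition as libavutil's FFALIGN: round x up to a multiple of a
 * (a must be a power of two). */
#define FFALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes touched by the MMXEXT loop: 2*w bytes of int16 input processed in
 * 8-byte chunks, plus 6 more bytes for the next left/left-top values. */
static size_t mmxext_bytes_read(size_t w)
{
    return FFALIGN(2 * w, 8) + 6;
}

/* Bytes the input actually contains (w int16 samples). */
static size_t exact_bytes(size_t w)
{
    return 2 * w;
}
```

For w = 5 (10 bytes of input), the MMXEXT loop touches FFALIGN(10, 8) + 6 = 22 bytes, an overread of 12 bytes, whereas the SSE2 version reads no more than the 10 bytes that exist.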
Benchmarks:
sub_hfyu_median_pred_int16_9bpp_c: 12673.6 ( 1.00x)
sub_hfyu_median_pred_int16_9bpp_mmxext: 1947.7 ( 6.51x)
sub_hfyu_median_pred_int16_9bpp_sse2: 993.9 (12.75x)
sub_hfyu_median_pred_int16_9bpp_aligned_c: 12596.1 ( 1.00x)
sub_hfyu_median_pred_int16_9bpp_aligned_mmxext: 1956.1 ( 6.44x)
sub_hfyu_median_pred_int16_9bpp_aligned_sse2: 989.4 (12.73x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows certain functions that use wide registers to be used only
if there is enough work to do and if a whole register width can be read
without overreading.
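One way such a check can look, as a hypothetical sketch (select_row_fn() and the two kernels are illustrative names, not FFmpeg's API): the wide kernel is picked only when the row covers at least one full register, so a whole-register load cannot run past the end of the input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row-kernel signature. */
typedef void (*row_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Placeholder kernels: both copy bytes; in real code they would differ in
 * the register width they use. */
static void copy_narrow(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

static void copy_wide(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Use the wide kernel only if it exists and the row is at least one
 * register wide; otherwise fall back to the narrow kernel. */
static row_fn select_row_fn(size_t len_bytes, size_t reg_bytes,
                            row_fn narrow, row_fn wide)
{
    return (wide && len_bytes >= reg_bytes) ? wide : narrow;
}
```

Making the decision once at init time, with knowledge of the width, keeps the per-call kernels free of size checks.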
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For modern CPUs (like those supporting AVX2), loads and stores
using the unaligned versions of instructions are as fast
as aligned ones if the address is aligned, so remove
the aligned AVX2 version (and the alignment check) and just
use the unaligned one.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids a check and a shift if >=8 elements are processed;
it adds a check if < 8 elements are processed (which should
be rare).
No change in benchmarks here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVOption with AV_OPT_TYPE_INT assumes the field is int (4 bytes),
but enum size is implementation-defined and may be smaller.
This can cause memory corruption when AVOption writes 4 bytes
to a field that is only 1-2 bytes, potentially overwriting
adjacent struct members.
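A minimal sketch of the fix pattern, assuming an AVOption-style setter that writes through a byte offset into the context (AVOption does store such an offset, but SketchContext and set_int_option() are hypothetical): the field is declared int, and a compile-time assertion guards its size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_mode { SKETCH_MODE_A, SKETCH_MODE_B };

/* Hypothetical private context. The option-backed field is int, not
 * enum sketch_mode, because an AV_OPT_TYPE_INT-style writer stores exactly
 * sizeof(int) bytes at the field's offset. */
typedef struct SketchContext {
    int mode;          /* holds an enum sketch_mode value */
    uint8_t neighbour; /* adjacent member that must not be clobbered */
} SketchContext;

/* Fails to compile if the field is changed to a differently sized type. */
_Static_assert(sizeof(((SketchContext *)0)->mode) == sizeof(int),
               "option field must be int-sized");

/* Mimics a generic option setter writing an int through an offset. */
static void set_int_option(void *obj, size_t offset, int value)
{
    memcpy((uint8_t *)obj + offset, &value, sizeof(value));
}
```

If mode were declared as a 1-byte enum (for example under -fshort-enums), the same 4-byte write would spill into neighbour and beyond.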
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add a `coder` option that lets users select the entropy coding mode for D3D12 H.264
encoding.
Named constants `cabac` (1) and `cavlc` (0) are supported.
Default is CABAC (1). If the driver does not support CABAC, a warning is
logged and encoding falls back to CAVLC.
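The fallback logic can be sketched as follows. resolve_coder() and the driver_supports_cabac flag are hypothetical stand-ins for the encoder's capability query, and a real encoder would log through av_log() rather than stderr.

```c
#include <stdio.h>

/* Values of the `coder` option as described above. */
enum { CODER_CAVLC = 0, CODER_CABAC = 1 };

/* Resolve the requested entropy coder against the driver's capabilities:
 * CABAC falls back to CAVLC with a warning if it is not supported. */
static int resolve_coder(int requested, int driver_supports_cabac)
{
    if (requested == CODER_CABAC && !driver_supports_cabac) {
        fprintf(stderr, "warning: CABAC not supported by driver, using CAVLC\n");
        return CODER_CAVLC;
    }
    return requested;
}
```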
Usage:
CABAC (default): `-coder cabac` or `-coder 1`
CAVLC: `-coder cavlc` or `-coder 0`
Sample command line:
```
ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -c:v h264_d3d12va -coder cavlc -y output.mp4
```
This patch enables two-pass encoding for libsvtav1 by implementing
support for AV_CODEC_FLAG_PASS1 and AV_CODEC_FLAG_PASS2.
Previously, users requiring two-pass encoding with SVT-AV1 had to use
the standalone SvtAv1EncApp tool. This patch allows 2-pass encoding
directly through FFmpeg.
Based on patch by Fredrik Lundkvist, with review feedback from James
Almer and Andreas Rheinhardt.
See: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/327452.html
Changes:
- Use AV_BASE64_DECODE_SIZE macro for buffer size calculation
- Allocate own buffer for rc_stats_buffer (non-ownership pointer)
- Error handling with buffer cleanup
Signed-off-by: Werner Robitza <werner.robitza@gmail.com>
For bi-predicted weighted averages, only the sum
of the two offsets is ever used, so add the two early.
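A sketch of the idea, assuming an H.264-style explicit weighted bi-prediction formula in which the offsets only appear as (o0 + o1 + 1) >> 1; the function names are hypothetical. Passing the precomputed sum gives identical results while dropping one argument and one addition per call.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Before: both offsets are passed, but only their sum is used.
 * Assumes arithmetic right shift of negative values, as FFmpeg does. */
static uint8_t biweight_two_offsets(int p0, int p1, int w0, int w1,
                                    int o0, int o1, int log_wd)
{
    return clip_u8(((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) +
                   ((o0 + o1 + 1) >> 1));
}

/* After: the caller adds the offsets once and passes the sum. */
static uint8_t biweight_offset_sum(int p0, int p1, int w0, int w1,
                                   int osum, int log_wd)
{
    return clip_u8(((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) +
                   ((osum + 1) >> 1));
}
```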
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Improve error diagnostics in the D3D12 video encoders' feature support check by reporting
the detailed ValidationFlags when driver validation fails.
This makes it easy for users to identify which specific feature is
unsupported without manually decoding the flags.
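A sketch of decoding a validation bitmask into readable names. The flag names and bit values below are placeholders, not the real D3D12_VIDEO_ENCODER_VALIDATION_FLAGS definitions, and describe_flags() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder flag table for illustration only. */
static const struct {
    unsigned bit;
    const char *name;
} validation_flags[] = {
    { 1u << 0, "CODEC_NOT_SUPPORTED" },
    { 1u << 1, "RESOLUTION_NOT_SUPPORTED" },
    { 1u << 2, "RATE_CONTROL_NOT_SUPPORTED" },
};

/* Write the names of all set bits into buf (size >= 1), separated by '|'.
 * Returns how many recognised flags were written. */
static int describe_flags(unsigned flags, char *buf, size_t size)
{
    int found = 0;
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(validation_flags) / sizeof(validation_flags[0]); i++) {
        if (!(flags & validation_flags[i].bit))
            continue;
        int n = snprintf(buf + used, size - used, "%s%s",
                         found ? "|" : "", validation_flags[i].name);
        if (n < 0 || (size_t)n >= size - used)
            break; /* out of space; snprintf kept buf NUL-terminated */
        used += (size_t)n;
        found++;
    }
    return found;
}
```

For a mask of 0x5 this yields "CODEC_NOT_SUPPORTED|RATE_CONTROL_NOT_SUPPORTED", which can then be included in a single error message.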
Signed-off-by: younengxiao <steven.xiao@amd.com>
This file is becoming too bloated and hard to read, so split it into separate
files, each having codec specific methods.
This will also speed up compilation when using several concurrent jobs.
Signed-off-by: James Almer <jamrial@gmail.com>