Before this patch, the last packet in the affected fate test would be written
without a BlockDuration element despite the packet's duration being shorter
than the Opus frame size.
Signed-off-by: James Almer <jamrial@gmail.com>
Following up on b613eebe78, if a demuxer that exports complete frames sets a
duration, don't overwrite it from the output of the parser.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes a stack-buffer overflow (detected by ASAN) when benching
vc1dsp.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The variable 'size' is used as a loop index for the 'sizes' array.
This naming similarity is error-prone and recently led to a typo where
'size[sizes]' was written instead of 'sizes[size]'.
Rename the loop index variable from 'size' to 'idx' across all 10 test
functions to make the code more readable and prevent similar typos.
Additionally, replace the hardcoded loop upper bound '10' with
'FF_ARRAY_ELEMS(sizes)' for better maintainability.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Commit 4d4b301e4a introduced a typo where `size[sizes]` was used
instead of `sizes[size]` in 10 places within checkasm_check_pixel_padded
calls.
Since `sizes` is an array and `size` is the loop index, `size[sizes]`
interprets the array pointer as an index, resulting in undefined behavior
and causing AddressSanitizer to detect buffer overflows during testing.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
If a codec has fixed block_align and frame_size but a given sample has either
priming or remainder frames, a pakt chunk can be written declaring zero packets
and no table, reporting only the samples to be discarded.
Signed-off-by: James Almer <jamrial@gmail.com>
st->duration is not guaranteed to be set, so store the sum of packet durations instead.
Also, set mPrimingFrames and mRemainderFrames to correct values.
Based on a patch by Jun Zhao.
Signed-off-by: James Almer <jamrial@gmail.com>
Take into account priming frames, exported as start time, and remainder frames,
substracted from the stream duration as well as exported as discard padding
side data in the last packet.
Signed-off-by: James Almer <jamrial@gmail.com>
These names are a remnant of dsputil when all the DSP functions
from all codecs were part of DSPcontext.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These functions are not used with these parameters;
see the filter_mb_* functions in h264_loopfilter.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Implement NEON optimization for HEVC dequant at 8-bit depth.
The NEON implementation uses srshr (Signed Rounding Shift Right) which
does both the add with offset and right shift in a single instruction.
Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers
Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_8_c: 11.3 ( 1.00x)
hevc_dequant_4x4_8_neon: 6.3 ( 1.78x)
hevc_dequant_8x8_8_c: 33.9 ( 1.00x)
hevc_dequant_8x8_8_neon: 6.6 ( 5.11x)
hevc_dequant_16x16_8_c: 153.8 ( 1.00x)
hevc_dequant_16x16_8_neon: 9.0 (17.02x)
hevc_dequant_32x32_8_c: 78.1 ( 1.00x)
hevc_dequant_32x32_8_neon: 31.9 ( 2.45x)
Note on Performance Anomaly:
The observation that hevc_dequant_32x32_8_c is faster than 16x16 (78.1 vs 153.8)
is due to Clang auto-vectorizing only for sizes >= 32x32.
Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract the Sample Aspect Ratio (SAR) from render_width_minus_1 and
render_height_minus_1 in the sequence header.
The AV1 specification defines the render dimensions, which can be used
in conjunction with the coded dimensions to determine the pixel aspect
ratio. This ensures consistent aspect ratio handling for AV1 streams
encapsulated in containers like MP4 or MKV, as observed in the updated
FATE tests where SAR changes from 0/1 to 1/1.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The faiure handling code is C and requires correct stack, global and
thread pointers. This restores them before returning to C. At the same
time, we no longer need to abort() afterwards.
Implement NEON optimization for compute_weights_line.
Also update the function signature to use ptrdiff_t for stack arguments
(max_meaningful_diff, startx, endx). This is done to unify the stack
layout between Apple platforms (which pack 32-bit stack arguments tightly)
and the generic AAPCS64 ABI (which requires 8-byte stack slots for 32-bit
arguments). Using ptrdiff_t ensures 8-byte slots are used on all AArch64
platforms, avoiding ABI mismatches with the assembly implementation.
The x86 AVX2 prototype is updated to match the new signature.
Performance benchmark (AArch64) in MacOS M4:
./tests/checkasm/checkasm --test=vf_nlmeans --bench
compute_weights_line_c: 151.1 ( 1.00x)
compute_weights_line_neon: 62.6 ( 2.42x)
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
If an input packet results in several output packets, the side data will be
exported only by the first output packet, and be missing from the rest.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Signed-off-by: James Almer <jamrial@gmail.com>
Many parsers will request data until they find what will be the start of the
next assembled packet in order to decide where to cut the current one. If this
happens, the loop in demux.c will, in case the demuxer exports already fully
assembled packets as is sometimes the case for MPEG-TS, discard the already
handled first input packet before it tries to move its side data to the output.
The affected FATE tests reflect this change by no longer dropping the side data
from the first input packet, nor exporting every other side data in the wrong
output packet.
Signed-off-by: James Almer <jamrial@gmail.com>
This is in preparation for adding checkasm support for av_crc(),
which will always call the same function, but uses different CRC
tables to distinguish different implementations.
This reuses checkasm_check_func() for this; one could also add
a new function or use unions. This would allow to avoid casting
const away in the crc test to be added. It would also allow
to avoid converting function pointers to void* (which ISO C
does not allow).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To be able to reuse colors from the original frame, the last value returned by
`p()` is tracked in the eval state, and if it is assigned to a variable, the
original color components are copied to `color_vars`. Thus, commands like
`setcolor` and `colorstop` can use those variables:
setvar pixel (p(0, 0))
...
setcolor pixel
`fate-filter-drawvg-video` now also verifies the `p()` function.
Signed-off-by: Ayose <ayosec@gmail.com>
The arguments for `setvar` and `call` commands can be colors (like `#rrggbb`).
This replaces the previous trick of using `0xRRGGBBAA` values to use colors as
procedure arguments.
The parser stores colors as the value expected by Cairo (a `double[4]`). This
array is allocated on the heap so the size of the union in `VGSArgument` is
not increased (i.e. it is still 8 bytes, instead of 32).
Signed-off-by: Ayose <ayosec@gmail.com>
If AVOptionArrayDef.def is NULL, av_opt_is_set_to_default() should return true
when the field in the object is NULL.
Signed-off-by: James Almer <jamrial@gmail.com>
Some tools like v4l2-ctl dump data without skip padding. If the
padding size is not an integer multiple of the number of pixel
bytes, we cannot handle it by using a larger width, e.g.,, for
RGB24 image with 32 bytes padding in each line.
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned in both practice (and in the actual backend).
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
Most EXIF metadata is in IFD0 and most EXIF payloads only contain
one IFD, but it is possible for there to be more IFDs after the
existing trailing one. exiftool and similar software report these IFDs
as IFD1, IFD2, etc. This commit reads those additional IFDs and attaches
them as dummy entries in the top-level IFD ranging from 0xFFFC down to
0xFFED, which are unused by the EXIF spec. The EXIF API is only able to
return and work with a single IFD, so by attaching it as a subdirectory
this metadata can be preserved.
This is done transparently through the read/write process. Upon parsing
an additional IFD1, it will be attached, but it will be written with
av_exif_write after IFD0 rather than as a subdirectory, as intended.
Existing files without more than one IFD, i.e. most files, will be unaffected
by this change, as well as API clients looking to parse specific fields, but
now more metadata is parsed and written, rather than simply being discarded
as trailing data.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This avoids needing to use the extra_conf variable. That variable
is problematic for setting a value that contains spaces.
This adds options for another tool in the same fashion as other
tools were added in 523d688c2b.