fix issue: issues/23238
Several time-related fields in DASHContext were declared as uint64_t,
causing the arithmetic in calc_cur_seg_no(), calc_min_seg_no(), and
calc_max_seg_no() to be performed with unsigned semantics.
The expression:
(get_current_time_in_sec() - availability_start_time) * fragment_timescale
is uint64_t throughout. When presentationTimeOffset is large (e.g. an
absolute epoch-based timestamp common in DVB-DASH live streams), the
subsequent subtraction:
uint64_t_result - presentation_timeoffset
wraps around to a value near 2^64, because the elapsed wall-clock time
in timescale ticks is far smaller than the absolute presentation time
offset. The enormous quotient ends up truncated to int32_t when passed
to ff_dash_fill_tmpl_params(), producing a negative $Number$ value in
the segment URL and causing repeated HTTP 403 errors.
Fix this by changing the affected fields and the two helper functions
(get_current_time_in_sec, get_utc_date_time_insec) from uint64_t to
int64_t. All values involved are well within the int64_t range (Unix
timestamps in seconds and ISO 8601 durations), and the arithmetic
naturally needs signed semantics because intermediate sub-expressions
like (elapsed - time_shift_buffer_depth) can be negative at stream
start.
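The wraparound can be reproduced with a minimal C sketch. The two helpers below are hypothetical (they are not the DASHContext code). They only mirror the shape of the expression above, and the only difference between them is the operand type. The sample inputs in the usage note treat presentationTimeOffset as epoch milliseconds with a timescale of 1000, which is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed wall-clock seconds scaled to timescale
 * ticks, minus the presentation time offset, using unsigned arithmetic.
 * If pto exceeds the scaled elapsed time, the result wraps modulo 2^64. */
static uint64_t offset_ticks_u64(uint64_t now, uint64_t ast,
                                 uint64_t timescale, uint64_t pto)
{
    return (now - ast) * timescale - pto;
}

/* Same expression with signed arithmetic: the result is simply negative
 * when pto exceeds the scaled elapsed time. */
static int64_t offset_ticks_i64(int64_t now, int64_t ast,
                                int64_t timescale, int64_t pto)
{
    return (now - ast) * timescale - pto;
}
```

For example, with now = 1700000100, availability_start_time = 1700000000, timescale 1000 and pto = 1700000000000, the elapsed time is 100000 ticks. The int64_t version returns -1699999900000, while the uint64_t version returns 2^64 - 1699999900000, a value above 2^63.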
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
The only noticeable changes in benchmarks are for
the x2 horizontal no_rnd case where SSE2 and movhps
are beneficial:
Old benchmarks:
avg_pixels_tab[1][1]_c: 42.2 ( 1.00x)
avg_pixels_tab[1][1]_mmxext: 10.8 ( 3.89x)
avg_pixels_tab[1][2]_c: 18.0 ( 1.00x)
avg_pixels_tab[1][2]_mmxext: 6.1 ( 2.96x)
put_no_rnd_pixels_tab[1][1]_c: 29.7 ( 1.00x)
put_no_rnd_pixels_tab[1][1]_mmxext: 12.3 ( 2.41x)
put_no_rnd_pixels_tab[1][2]_c: 20.4 ( 1.00x)
put_no_rnd_pixels_tab[1][2]_mmxext: 12.2 ( 1.67x)
put_pixels_tab[1][1]_c: 29.9 ( 1.00x)
put_pixels_tab[1][1]_mmxext: 7.6 ( 3.92x)
put_pixels_tab[1][2]_c: 16.8 ( 1.00x)
put_pixels_tab[1][2]_mmxext: 6.4 ( 2.63x)
New benchmarks:
avg_pixels_tab[1][1]_c: 42.3 ( 1.00x)
avg_pixels_tab[1][1]_sse2: 10.7 ( 3.95x)
avg_pixels_tab[1][2]_c: 17.8 ( 1.00x)
avg_pixels_tab[1][2]_sse2: 6.3 ( 2.83x)
put_no_rnd_pixels_tab[1][1]_c: 29.6 ( 1.00x)
put_no_rnd_pixels_tab[1][1]_sse2: 10.5 ( 2.81x)
put_no_rnd_pixels_tab[1][2]_c: 20.4 ( 1.00x)
put_no_rnd_pixels_tab[1][2]_sse2: 12.3 ( 1.67x)
put_pixels_tab[1][1]_c: 30.1 ( 1.00x)
put_pixels_tab[1][1]_sse2: 7.6 ( 3.93x)
put_pixels_tab[1][2]_c: 16.8 ( 1.00x)
put_pixels_tab[1][2]_sse2: 6.4 ( 2.64x)
Switching to SSE2 unfortunately increased codesize of the relevant
functions by 160B.
This makes these functions ABI compatible, i.e. they no longer
rely on others calling emms_c to fix the fpu state. It also
implies that many mpegvideo decoders (the exceptions are MPEG-4,
RV30, RV40 and the VC-1 family) now no longer use any mmx registers
at all. So one can remove the emms_c from the MPEG-1/2 decoder.
The same is true for VP3.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
No change in benchmarks here; this already allows
removing an emms_c from cavsdec.c.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now, their exact counterparts have a "_exact" in their names;
switch this around.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
At check_count == CHECK_COUNT the existing branch caps the score at
SCORE_MAX/2 even when every analyzed packet is sync-aligned and when
analyze() already has full confidence. This loses the probe to
signature-only image demuxers (e.g. png_pipe at SCORE_MAX - 1) for
streams with a small leading non-TS prefix. Some CDNs prepend a 1x1 PNG
to MPEG-TS payloads to bypass image-only Content-Type filtering, and the
PNG signature otherwise wins the first probe iteration.
Fixes: https://github.com/mpv-player/mpv/issues/11365
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The resample asm code, as it currently is, handles 1 sample at a time.
The asm code should be redesigned to handle more than 1 sample at a
time; that is the whole purpose of SIMD. There are also multiple samples
available that need identical handling (e.g. samples from several
channels) or similar handling (samples from other points in time).
Such a redesign would make the resampler faster and would change the
padding requirements and maybe the memory layout. So it seems simpler
to just avoid overwriting in the asm as it is today than to have
the allocation handle specific overallocation for asm code that
ideally should be redesigned.
Fixes writing 16 bits past the end of the array.
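The failure mode can be sketched in C, assuming (as the ASAN report quoted in the regression-test commit, "Invalid write of size 4, 2 bytes past the end", suggests) that the SIMD path emits each int16 output sample with a 4-byte store. write_samples() is a hypothetical illustration, not the swresample code: the wide store is harmless for every sample except the last, whose upper half lands just past the destination.

```c
#include <stdint.h>
#include <string.h>

/* Write n int16 samples from src into dst.
 * wide_store != 0: each sample is written with a 4-byte store whose upper
 * half is 0, so the store for dst[i] also touches dst[i + 1]. Every such
 * spill is overwritten by the next sample except the last one, which lands
 * 2 bytes past dst[n - 1].
 * wide_store == 0: each sample is written with a plain 2-byte store. */
static void write_samples(int16_t *dst, const int16_t *src, int n, int wide_store)
{
    for (int i = 0; i < n; i++) {
        if (wide_store) {
            int16_t pair[2] = { src[i], 0 };
            memcpy(&dst[i], pair, sizeof(pair)); /* 4 bytes: dst[i], dst[i + 1] */
        } else {
            dst[i] = src[i];                     /* 2 bytes: dst[i] only */
        }
    }
}
```

Placing a sentinel element right after the n-sample region shows the difference: the wide variant clobbers it, the narrow one leaves it alone.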
This is an alternative fix for https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23053
Found-by: Ivan Grigorev <ivangrigoriev@meta.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add a regression test exercising the swr_convert(N) -> swr_convert(2N)
edge case: the second call reuses the internal preout buffer at full
capacity, with no trailing slack from swri_realloc_audio()'s amortized
doubling. internal_sample_fmt is forced to S16P to reach the int16 SIMD
resample path, where ff_resample_common_int16_sse2 overruns its
destination by 2 bytes on the last iteration.
Without a resampler fix this test fails under valgrind/ASAN with a
heap-buffer-overflow (Invalid write of size 4, 2 bytes past the end).
Signed-off-by: Ivan Grigorev <ivangrigoriev@meta.com>
With glibc, the syscall number definitions appear to be pulled in
transitively, but musl does not do this. Thus, on musl
`__NR_riscv_hwprobe` is undeclared and an error is emitted.
Fix this by including `<asm/unistd.h>`, which makes the macro
visible on musl.
Part of the yuv2planeX ASAN fix - replace vec_vsx_ld with vec_splats
to avoid reading past the filter array.
Signed-off-by: Scott Boudreaux <scott@elyanlabs.com>
Fix two buffer overreads in the PowerPC yuv2planeX SIMD paths
that cause daily FATE checkasm-sw_scale ASAN failures on both
ppc64 (G5, altivec) and ppc64le (POWER9, VSX):
1. VSX LOAD_FILTER: vec_vsx_ld(joffset, filter) reads 16 bytes
at the given byte offset. When joffset >= filterSize*2 - 14
(e.g. joffset=30 for filterSize=16), this reads up to 14 bytes
past the 32-byte filter array. Fix by replacing the vector
load with vec_splats(f[j]) which only reads the single int16_t
element needed (the result is splatted to all lanes anyway).
2. GET_LS look-ahead overread: yuv2planeX_8_16 calls
yuv2planeX_8 twice per filter tap. Each call's GET_LS macro
speculatively loads the next 16-byte vector for pipelining.
On the second call, this look-ahead reads 16 bytes past the
last valid source element. Fix by tightening the SIMD loop
bound from (dstW - 15) to (dstW - 23), ensuring the farthest
speculative load stays within src[j][0..dstW-1]. The scalar
fallback handles the remaining 16-23 trailing pixels.
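The byte arithmetic behind both overreads can be checked with a small C sketch. It models the VSX filter load as a single 16-byte read at byte offset joffset from an int16_t filter of filterSize taps. For item 2 it assumes, as an illustration inferred from the description above rather than taken from swscale_ppc_template.c, that each 16-pixel iteration starting at pixel i ends with a look-ahead load of int16_t elements i+16..i+23. Both helper names are hypothetical.

```c
/* Bytes read past the end of a filterSize-tap int16_t filter when a
 * 16-byte vector load starts at byte offset joffset (0 if in bounds). */
static int filter_overread(int joffset, int filterSize)
{
    int end  = joffset + 16;  /* one past the last byte loaded */
    int size = filterSize * 2; /* filter size in bytes */
    return end > size ? end - size : 0;
}

/* Highest source element touched by the modeled look-ahead when the SIMD
 * loop runs while i < dstW - bound_sub in steps of 16 pixels.
 * Returns -1 if the loop body never executes. */
static int max_lookahead_index(int dstW, int bound_sub)
{
    int max_idx = -1;
    for (int i = 0; i < dstW - bound_sub; i += 16) {
        int last = i + 23; /* last element of the speculative load */
        if (last > max_idx)
            max_idx = last;
    }
    return max_idx;
}
```

With filterSize=16 and joffset=30, filter_overread() gives 14 bytes, matching item 1. For dstW=64, the old bound (dstW - 15) lets the look-ahead reach element 71, which is 8 int16_t elements (16 bytes) past element 63. The new bound (dstW - 23) stops at element 55.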
The ASAN reports from FATE:
ppc64 (altivec): stack-buffer-overflow in yuv2planeX_8_16_altivec
at swscale_ppc_template.c:56
ppc64le (VSX): unknown-crash in yuv2planeX_8_16_vsx
at swscale_ppc_template.c:52
Signed-off-by: Scott Boudreaux <scott@elyanlabs.com>
If the filesize is known as a result of AVERROR_EOF on a block that ends
before the current seek position, this might end up negative. Error
out cleanly instead of aborting.
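Erroring out cleanly rather than asserting amounts to a check like the one below. This is a minimal sketch: bytes_remaining() is a hypothetical helper, and returning -EINVAL is an assumption standing in for whatever error code the real code uses.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: bytes left between the current position and a
 * filesize learned from hitting EOF. If the known size ends before the
 * current position, return a negative errno-style code instead of
 * producing a negative length (or aborting). */
static int64_t bytes_remaining(int64_t filesize, int64_t pos)
{
    if (pos > filesize)
        return -EINVAL;
    return filesize - pos;
}
```

Since a valid result is never negative, callers can treat any negative return as an error.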
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
flock() also locks against accesses by other threads of the same
process, unlike fcntl().
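The difference can be observed within a single process. flock() locks belong to the open file description, so two descriptors obtained by separate open() calls (for example, one per thread) conflict with each other. POSIX fcntl() record locks belong to the process, so a process never blocks on its own lock. The sketch below is a hypothetical demonstration, not the cache code itself.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Opens path twice in the same process and takes an exclusive flock() on
 * the first descriptor. Returns 1 if a non-blocking exclusive flock() on
 * the second descriptor is refused with EWOULDBLOCK, 0 if it succeeds,
 * and -1 on setup failure. */
static int flock_conflicts_within_process(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    int conflict = -1;

    if (fd1 < 0 || fd2 < 0)
        goto out;
    if (flock(fd1, LOCK_EX) != 0)
        goto out;
    conflict = flock(fd2, LOCK_EX | LOCK_NB) != 0 && errno == EWOULDBLOCK;
    flock(fd1, LOCK_UN);
out:
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2); /* closing also drops any lock fd2 may hold */
    return conflict;
}
```

Doing the same with fcntl(F_SETLK) on both descriptors would succeed twice, because the second request comes from the process that already owns the lock.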
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead, read into the output/temporary buffer (write_back path). This is to
lessen the impact of racing the write against other clients trying to race
for the same pending block.
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
Needed for the upcoming commit, but also more robust in general. The memory
waste is negligible.
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
This value is matched to the typical seek latency in a reasonably capable
7200 rpm disk device, as well as the typical latency of an on-premise HTTP
request.
Note that this change should rarely have a significant effect, because
it only matters when using multiple concurrent processes, and one process
is somehow stuck in I/O (or died). Since we sleep in a loop for 1/16th of
the requested timeout value, this should only increase the effective read
latency by up to ~500 us on top of the actual underlying latency.
The alternative is hammering the same underlying resource with the exact
same requests at the exact same time (e.g. during init).
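The latency bound follows from the loop structure: a waiter polls, sleeps one step of timeout/16, and polls again, so it notices a finished block at most roughly one step late. The sketch below illustrates that pattern under stated assumptions: wait_for_flag() and its volatile flag are hypothetical stand-ins for whatever the cache actually polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wait loop: poll *flag until it becomes nonzero or the
 * nominal waiting time reaches timeout_us, sleeping timeout_us / 16
 * between polls. A flag set during a sleep is seen at the next poll,
 * i.e. roughly one step (timeout_us / 16) late at worst. */
static bool wait_for_flag(const volatile int *flag, int64_t timeout_us)
{
    int64_t step_us = timeout_us / 16;
    if (step_us < 1)
        step_us = 1; /* avoid a zero-length step for tiny timeouts */

    for (int64_t waited = 0; waited < timeout_us; waited += step_us) {
        if (*flag)
            return true;
        struct timespec ts = {
            .tv_sec  = (time_t)(step_us / 1000000),
            .tv_nsec = (long)(step_us % 1000000) * 1000,
        };
        nanosleep(&ts, NULL);
    }
    return *flag != 0;
}
```

Sleeping in fractions of the timeout, rather than spinning, is what keeps concurrent waiters from hammering the same resource with identical requests.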
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
If this happens, something is almost surely wrong with the cache file
(e.g. mismatched source file), so it's much better to error out rather than
hit silent data corruption.
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
This is already checked in libavformat, at least in the only demuxer that
creates them, but best not risk an out-of-bounds access in case a new demuxer
doesn't take the proper measures.
Signed-off-by: James Almer <jamrial@gmail.com>
Slice-based filter workers compute their per-thread row/sample/channel
boundaries as total * jobnr / nb_jobs. The total * jobnr product is
evaluated in int and overflows signed int for large dimensions and many
slice threads, before the division by nb_jobs brings it back in range.
deinterlace_slice() computed per-thread row boundaries with int
multiplication height * (jobnr + 1). With a tall frame and many filter
threads the product overflows signed int before the division by nb_jobs.
Use int64_t for the intermediate product before converting back to int
row indices.
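The fix pattern can be sketched as a helper that forms the product in int64_t before dividing. slice_start() is a hypothetical name used for illustration; job jobnr then covers the half-open range [slice_start(total, jobnr, nb_jobs), slice_start(total, jobnr + 1, nb_jobs)).

```c
#include <stdint.h>

/* First row (or sample/channel) owned by slice job `jobnr` out of
 * `nb_jobs` under the total * jobnr / nb_jobs split. The product is
 * computed in int64_t, so it cannot overflow int; the quotient is at most
 * `total` and therefore fits back into int. */
static int slice_start(int total, int jobnr, int nb_jobs)
{
    return (int)((int64_t)total * jobnr / nb_jobs);
}
```

For instance, with total = 200000000 samples split over 16 jobs, 200000000 * 15 = 3000000000 exceeds INT_MAX (2147483647) in int arithmetic, while the int64_t version correctly yields 187500000.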
Found-by: Kery (Qi Kery <qikeyu2001@outlook.com>)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Previously scale_cascaded() assumed the whole source frame arrived in a
single sws_scale() call, and the dispatcher only routed full-frame calls
to it. A partial input slice fell through to ff_swscale() on the parent
dispatcher context, whose scaler state (c->desc) is never initialized in
cascade mode, causing a NULL dereference / crash.
Top-down sliced output is bit-exact with full-frame scaling; bottom-up
matches swscale's pre-existing (non-cascade) slice behaviour for
subsampled intermediate formats.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>