Commit graph

50783 commits

Author SHA1 Message Date
Lynne
ca591e6b50
vulkan_decode: force layered_dpb to 0 when dedicated_dpb is 0
layered_dpb only makes sense when dedicated_dpb is set to 1.
For some mysterious reason, some Nvidia drivers stopped indicating
SEPARATE_REFRENCES, but kept the COINCIDE flag, which broke
the code.
2024-08-11 05:13:14 +02:00
Lynne
6757cdb535
vulkan_video: remove NIH pooled buffer implementation
The code predates ff_vk_get_pooled_buffer().
2024-08-11 05:13:10 +02:00
Osamu Watanabe
d88a988d3d
avcodec/jpeg2000dec: Fix HT decoding
Fixes incorrect handling of MAGB_P value in Ccap15.
Fixes bugs in HT block decoding.

Signed-off-by: Pierre-Anthony Lemieux <pal@palemieux.com>
2024-08-10 09:22:51 -07:00
Osamu Watanabe
48b14732d8
avcodec/jpeg2000dec: Add support for placeholder passes
See Rec. ITU-T T.814 | ISO/IEC 15444-15, Annex B.

Signed-off-by: Pierre-Anthony Lemieux <pal@palemieux.com>
2024-08-10 09:22:44 -07:00
Osamu Watanabe
fe1b196499
avcodec/jpeg2000dec: Add support for CAP and CPF markers
Signed-off-by: Pierre-Anthony Lemieux <pal@palemieux.com>
2024-08-10 09:20:15 -07:00
Fei Wang
cda5f5c5ed lavc/qsv: Use vendor id to create device
New kernel driver "xe" will be supported from Lunar Lake instead of
"i915".

"xe" kernel driver:
https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/xe

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-08-09 13:40:26 +08:00
Kacper Michajłow
9876158ee2
avcodec/wmavoice: use av_clipd for double values
Fixes Clang warning.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-07 00:59:18 +02:00
Kacper Michajłow
1165c14444
avcodec/vp9mvs: fix misaligned access when clearing VP9mv
Fixes runtime error: member access within misaligned address
<addr> for type 'av_alias64', which requires 8 byte alignment.

VP9mv is aligned to 4 bytes, so instead doing 8 bytes clear, let's do
2 times 4 bytes.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-07 00:59:18 +02:00
Andreas Rheinhardt
bfcee368e2 avcodec/cbs_sei: Always zero-initialize SEI payload
Fixes: Use-of-uninitialized value
Fixes: clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_METADATA_fuzzer-5458626041413632

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-08-06 20:25:23 +02:00
Kacper Michajłow
5dfc0cc841
avcodec/parser: ensure input padding is zeroed
Fixes use of uninitialized value, reported by MSAN.

Found by OSS-Fuzz.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>

Fixes: 70852/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5179190066872320
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-05 23:17:46 +02:00
Rémi Denis-Courmont
616fdeaea3 lavc/riscv: depend on RVB and simplify accordingly
There is no known (real) hardware with V and without the complete B
extension. B was indeed required in the RISC-V application profile from
2022, earlier than V. There should not be any relevant hardware in the
future either.

In practice, different R-V Vector optimisations in FFmpeg already depend on
every constituent of the B extension anyhow, so it would not work well.
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
4edfc11a28 lavc/h264dsp: R-V V idct4_add8 (all depths)
These are really just wrappers for idct4_add16intra functions, which are in
turn mostly wrappers for idct4_add and idct4_dc_add functions.

For benchmarks refer to the later two sets.
2024-08-05 21:16:26 +03:00
Timo Rothenpieler
9a2171318d avcodec/nvenc: fix signedness of timing fields 2024-08-03 20:04:31 +02:00
James Almer
4a56b5f3d8 avcodec/cbs_h265: don't attempt to read 0 length elements in sei_3d_reference_displays_info
Fixes: 70458/clusterfuzz-testcase-minimized-ffmpeg_BSF_TRACE_HEADERS_fuzzer-5259339779080192
Fixes: Assertion width > 0 && width <= 32 failed at libavcodec/cbs.c:608

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-03 11:59:14 -03:00
Rémi Denis-Courmont
de7f999481 lavc/videodsp: work-around LLVM-as
For some reason, it can't handle the normal syntax for an address operand
without an offset, so add a dummy zero offset.
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
677f28b310 lavc/h264dsp: stick R-V V weight to 16-bit precision
T-Head C908 (ns):
h264_weight2_8_c:        1607.8
h264_weight2_8_rvv_i32:   515.0 (before)
h264_weight2_8_rvv_i32:   348.5 (after)
h264_weight4_8_c:        2255.8
h264_weight4_8_rvv_i32:  1015.0 (before)
h264_weight4_8_rvv_i32:   691.0 (after)
h264_weight8_8_c:        3857.5
h264_weight8_8_rvv_i32:  2218.8 (before)
h264_weight8_8_rvv_i32:  1561.3 (after)
h264_weight16_8_c:       7431.5
h264_weight16_8_rvv_i32: 2737.3 (before)
h264_weight16_8_rvv_i32: 1848.3 (after)

SpacemiT X60 (ns):
h264_weight2_8_c:        1624.1
h264_weight2_8_rvv_i32:   352.6 (before)
h264_weight2_8_rvv_i32:   259.3 (after)
h264_weight4_8_c:        2259.3
h264_weight4_8_rvv_i32:   685.8 (before)
h264_weight4_8_rvv_i32:   530.3 (after)
h264_weight8_8_c:        4103.3
h264_weight8_8_rvv_i32:  1581.8 (before)
h264_weight8_8_rvv_i32:  1238.6 (after)
h264_weight16_8_c:       7624.3
h264_weight16_8_rvv_i32: 2738.1 (before)
h264_weight16_8_rvv_i32: 1853.3 (after)
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
afd45c7ff7 lavc/h264dsp: stick R-V V biweight to 16-bit
T-Head C908 (ns):
h264_biweight2_8_c:        2414.5
h264_biweight2_8_rvv_i32:   701.8 (before)
h264_biweight2_8_rvv_i32:   468.5 (after)
h264_biweight4_8_c:        4655.3
h264_biweight4_8_rvv_i32:  1377.5 (before)
h264_biweight4_8_rvv_i32:   931.8 (after)
h264_biweight8_8_c:        9701.5
h264_biweight8_8_rvv_i32:  2896.0 (before)
h264_biweight8_8_rvv_i32:  2070.5 (after)
h264_biweight16_8_c:      18025.0
h264_biweight16_8_rvv_i32: 3460.8 (before)
h264_biweight16_8_rvv_i32: 1978.0 (after)

SpacemiT X60 (ns):
h264_biweight2_8_c:        2415.5
h264_biweight2_8_rvv_i32:   478.2 (before)
h264_biweight2_8_rvv_i32:   362.8 (after)
h264_biweight4_8_c:        4655.3
h264_biweight4_8_rvv_i32:   946.7 (before)
h264_biweight4_8_rvv_i32:   727.3 (after)
h264_biweight8_8_c:        9061.8
h264_biweight8_8_rvv_i32:  2071.7 (before)
h264_biweight8_8_rvv_i32:  1685.8 (after)
h264_biweight16_8_c:      18020.5
h264_biweight16_8_rvv_i32: 3457.2 (before)
h264_biweight16_8_rvv_i32: 1935.8 (after)
2024-08-02 21:24:01 +03:00
Zhao Zhili
670ff6c7ce avcodec/nvenc: rework on DTS generation
Before the patch, the method to generate DTS only works with
timebase equal to 1/fps. With timebase like 1/1000

./ffmpeg -i foo.mp4 -an -c:v h264_nvenc -enc_time_base 1/1000 bar.mp4

pts 0    dts -3
pts 160  dts 37
pts 80   dts 77
pts 40   dts 117 <-- invalid
pts 120  dts 157
pts 320  dts 197
pts 240  dts 237
pts 200  dts 277 <-- invalid
pts 280  dts 317 <-- invalid

The generated DTS can be larger than PTS, since it only reorder the
input PTS and minus the number of frame delay, which doesn't take
timebase into account. It should minus the "time" of frame delay.

9a245bd trying to fix the issue, but the implementation is incomplete,
which only use time_base.num. Then it got reverted by ac7c265b33.

After this patch:

pts 0    dts -120
pts 160  dts -80
pts 80   dts -40
pts 40   dts 0
pts 120  dts 40
pts 320  dts 80
pts 240  dts 120
pts 200  dts 160
pts 280  dts 200

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-08-02 17:57:19 +02:00
Roman Arzumanyan
bcea693f75 avcodec/cuviddec: more accurately guess probed sw pixel format
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-08-02 17:38:46 +02:00
Rémi Denis-Courmont
2f083fd581 lavc/audiodsp: drop R-V F vector_clipf
This is now firmly slower than C.

SiFive-U74 (cycles):
audiodsp.vector_clipf_c:   31.2
audiodsp.vector_clipf_rvf: 39.5
2024-08-01 19:29:40 +03:00
Rémi Denis-Courmont
c48213b2dc lavc/audiodsp: drop opposite sign optimisation
This was added along side the original SSE(one) DSP function in
0a68cd876e without rationale. This was
presumably faster on x87, which is no longer relevant since we pretty
much assume SSE2 or later on x86.

Meanwhile this function is ~2.5x slower than the normal floating point
one on SiFive-U74.
2024-08-01 19:29:40 +03:00
Rémi Denis-Courmont
d86b6767ce lavc/audiodsp: properly unroll vector_clipf
Given that source and destination can alias, the compiler was forced to
perform each read-modify-write sequentially. We cannot use the `restrict`
qualifier to avoid this here because the AC-3 encoder uses the function
in-place. Instead this commit provides an explicit guarantee to the
compiler that batches of 8 elements will not overlap, so that it can
interleave calculations.

In practice contemporary optimising compilers are able to unroll and keep
the temporary array in FPU registers (without spilling).

On SiFive-U74, this speeds the same signs branch by 4x, and the
opposite signs branch 1.5x.
2024-08-01 19:29:40 +03:00
Rémi Denis-Courmont
d527d23872 lavc/pixblockdsp: specialise aligned 16-bit get_pixels
The current code assumes that we have unaligned rows, which hurts on
platforms with slower unaligned accesses. (Also, this lets the compiler
unroll manually, which it seems to do in practice.)
2024-08-01 18:44:01 +03:00
Rémi Denis-Courmont
54ae270213 lavc/rv34dsp: use saturating add/sub for R-V V DC add
T-Head C908 (cycles):
rv34_idct_dc_add_c:      113.2
rv34_idct_dc_add_rvv_i32: 48.5 (before)
rv34_idct_dc_add_rvv_i32: 39.5 (after)
2024-08-01 18:43:04 +03:00
Rémi Denis-Courmont
952b426f3b lavc/bswapdsp: add RV Zvbb bswap16 and bswap32 2024-08-01 18:43:04 +03:00
James Almer
f4daf633b2 avcodec/aacps_tablegen_template: don't redefine CONFIG_HARDCODED_TABLES
Fixes relevant warnings when compiling with --enable-hardcoded-tables

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-01 12:13:53 -03:00
Anton Khirnov
bcf08c1171 lavc/ffv1: change FFV1SliceContext.plane into a RefStruct object
Frame threading in the FFV1 decoder works in a very unusual way - the
state that needs to be propagated from the previous frame is not decoded
pixels(¹), but each slice's entropy coder state after decoding the slice.

For that purpose, the decoder's update_thread_context() callback stores
a pointer to the previous frame thread's private data. Then, when
decoding each slice, the frame thread uses the standard progress
mechanism to wait for the corresponding slice in the previous frame to
be completed, then copies the entropy coder state from the
previously-stored pointer.

This approach is highly dubious, as update_thread_context() should be
the only point where frame-thread contexts come into direct contact.
There are no guarantees that the stored pointer will be valid at all, or
will contain any particular data after update_thread_context() finishes.

More specifically, this code can break due to the fact that keyframes
reset entropy coder state and thus do not need to wait for the previous
frame. As an example, consider a decoder process with 2 frame threads -
thread 0 with its context 0, and thread 1 with context 1 - decoding a
previous frame P, current frame F, followed by a keyframe K. Then
consider concurrent execution consistent with the following sequence of
events:
* thread 0 starts decoding P
* thread 0 reads P's slice header, then calls
  ff_thread_finish_setup() allowing next frame thread to start
* main thread calls update_thread_context() to transfer state from
  context 0 to context 1; context 1 stores a pointer to context 0's private
  data
* thread 1 starts decoding F
* thread 1 reads F's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
  decoding
* thread 0 finishes decoding P
* thread 0 starts decoding K; since K is a keyframe, it does not
  wait for F and reallocates the arrays holding entropy coder state
* thread 0 finishes decoding K
* thread 1 reads entropy coder state from its stored pointer to context
  0, however it finds state from K rather than from P

This execution is currently prevented by special-casing FFV1 in the
generic frame threading code, however that is supremely ugly. It also
involves unnecessary copies of the state arrays, when in fact they can
only be used by one thread at a time.

This commit addresses these deficiencies by changing the array of
PlaneContext (each of which contains the allocated state arrays)
embedded in FFV1SliceContext into a RefStruct object. This object can
then be propagated across frame threads in standard manner. Since the
code structure guarantees only one thread accesses it at a time, no
copies are necessary. It is also re-created for keyframes, solving the
above issue cleanly.

Special-casing of FFV1 in the generic frame threading code will be
removed in a later commit.

(¹) except in the case of a damaged slice, when previous frame's pixels
    are used directly
2024-08-01 10:09:26 +02:00
Anton Khirnov
c335218a81 lavc/ffv1dec: inline copy_fields() into update_thread_context()
It is now only called from a single place, so there is no point in it
being a separate function.
2024-08-01 10:09:26 +02:00
Anton Khirnov
d44812f7cf lavc/ffv1dec: stop using per-slice FFV1Context
All remaining accesses to them are for fields that have the same value
in the main encoder context.

Drop now-unused FFV1Context.slice_contexts.
2024-08-01 10:09:26 +02:00
Anton Khirnov
2b21cdff6e lavc/ffv1dec: move slice_damaged to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
f2aeba56c4 lavc/ffv1dec: move slice_reset_contexts to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
84dda32202 lavc/ffv1enc: stop using per-slice FFV1Context
All remaining accesses to them are for fields that have the same value
in the main encoder context.
2024-08-01 10:09:26 +02:00
Anton Khirnov
96e8af6c4d lavc/ffv1: move ac_byte_count to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
e7d0f44138 lavc/ffv1enc: store per-slice rc_stat(2?) in FFV1SliceContext
Instead of the per-slice FFV1Context, which will be removed in future
commits.
2024-08-01 10:09:26 +02:00
Anton Khirnov
7b2bfba55d lavc/ffv1: move RangeCoder to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
28769f6bc1 lavc/ffv1: move FFV1Context.plane to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
9b86ba5a92 lavc/ffv1: always use the main context values of ac
It cannot change between slices.
2024-08-01 10:09:26 +02:00
Anton Khirnov
a57c88d67b lavc/ffv1: move FFV1Context.slice_{coding_mode,rct_.y_coef} to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
39486a2b29 lavc/ffv1: always use the main context values of plane_count/transparency
They cannot change between slices.
2024-08-01 10:09:26 +02:00
Anton Khirnov
492df65201 lavc/ffv1: drop write-only PlaneContext.interlace_bit_state 2024-08-01 10:09:26 +02:00
Anton Khirnov
a411fc5a84 lavc/ffv1: drop redundant PlaneContext.quant_table
It is a copy of FFV1Context.quant_tables[quant_table_index].
2024-08-01 10:09:26 +02:00
Anton Khirnov
4b9f7c7e3a lavc/ffv1: drop redundant FFV1Context.quant_table
In all cases except decoding version 1 it's either not used, or contains
a copy of a table from quant_tables, which we can just as well use
directly.

When decoding version 1, we can just as well decode into
quant_tables[0], which would otherwise be unused.
2024-08-01 10:09:26 +02:00
Anton Khirnov
d2f507233a lavc/ffv1enc: move bit writer to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
889faedd26 lavc/ffv1dec: move the bitreader to stack
There is no reason to place it in persistent state.
2024-08-01 10:09:25 +02:00
Anton Khirnov
19e9f3d5f2 lavc/ffv1: move run_index to the per-slice context 2024-08-01 10:09:25 +02:00
Anton Khirnov
91d3c1ac47 lavc/ffv1: move sample_buffer to the per-slice context 2024-08-01 10:09:25 +02:00
Anton Khirnov
54aa33f116 lavc/ffv1: add a per-slice context
FFV1 decoder and encoder currently use the same struct - FFV1Context -
both as codec private data and per-slice context. For this purpose
FFV1Context contains an array of pointers to per-slice FFV1Context
instances.

This pattern is highly confusing, as it is not clear which fields are
per-slice and which per-codec.

Address this by adding a new struct storing only per-slice data. Start
by moving slice_{x,y,width,height} to it.
2024-08-01 10:09:25 +02:00
Anton Khirnov
d845ea49c5 lavc/ffv1dec: move copy_fields() under HAVE_THREADS
It is unused otherwise
2024-08-01 10:09:25 +02:00
Anton Khirnov
3a5c814b19 lavc/ffv1dec: drop a pointless variable in decode_slice()
fsdst is by construction always equal to fs, there is even an
av_assert1() checking that. Just use fs directly.
2024-08-01 10:09:25 +02:00
Anton Khirnov
4da146ba83 lavc/ffv1dec: drop FFV1Context.cur
It is merely a pointer to FFV1Context.picture.f, which can just as well
be used directly.
2024-08-01 10:09:25 +02:00