frames_ctx->width/height were unconditionally rounded to even, causing
odd-dimension monochrome/444 clips to be reported with incorrect
surface pool dimensions. Round only for 4:2:0 and 4:2:2; for
monochrome/444 use avctx->coded_width/coded_height unchanged,
matching the dimensions set by the software codec layer.
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Leave the existing one for non decoder-specific, post processing usage.
With this, scenarios like nvdec decoding can work algonside lcevc enhancement application.
Signed-off-by: James Almer <jamrial@gmail.com>
Cap ulNumDecodeSurfaces to 32 and ulNumOutputSurfaces to 64 to prevent
cuvidCreateDecoder from failing with CUDA_ERROR_INVALID_VALUE when
initial_pool_size exceeds the hardware limits.
Also cap the decoder index pool (dpb_size) to 32 so that indices
handed out via av_refstruct_pool_get stay within the valid range
for cuvidDecodePicture's CurrPicIdx.
When unsafe_output is enabled, stop holding idx_ref in the unmap
callback. Since cuvidMapVideoFrame copies decoded data into an
independent output mapping slot, the decode surface index can safely
be reused as soon as the DPB releases it, without waiting for the
downstream consumer to release the mapped frame. This decouples the
decode surface index lifetime (max 32) from the output mapping slot
lifetime (max 64), eliminating the "No decoder surfaces left" error
that occurred when downstream components like nvenc held too many
frames.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
This is possible without deprecation period, because said field
is documented as only for our libav* libraries and not the general
public.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This commit adds support for 4:2:2 decoding for HEVC and H.264 on
NVIDIA Blackwell GPUs. Additionally, it supports 10-bit decoding
for H.264 on Blackwell GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Ensure all hwaccels that allocate a buffer use NVDECContext->bitstream_internal
instead. Otherwise, if FFHWAccel->end_frame() isn't called before
FFHWAccel->uninit(), an attempt to free a stale pointer to memory not owned by
the hwaccel could take place.
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: James Almer <jamrial@gmail.com>
It involves less allocations, in particular no allocations
after the entry has been created. Therefore creating a new
reference from an existing one can't fail and therefore
need not be checked. It also avoids indirections and casts.
Also note that nvdec_decoder_frame_init() (the callback
to initialize new entries from the pool) does not use
atomics to read and replace the number of entries
currently used by the pool. This relies on nvdec (like
most other hwaccels) not being run in a truely frame-threaded
way.
Tested-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids allocations and error checks as well as the boilerplate
code for creating an AVBuffer with a custom free callback.
Also increases type safety.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Tested-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It (unfortunately) involves an allocation and can therefore fail.
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
Signed-off-by: Martin Storsjö <martin@martin.st>
The nvidia hardware explicitly supports decoding monochrome content,
presumably for the AVIF alpha channel. Supporting this requires an
adjustment in av1dec and explicit monochrome detection in nvdec.
I'm not sure why the monochrome path in av1dec did what it did - it
seems non-functional - YUV440P doesn't seem a logical pix_fmt for
monochrome and conditioning on chroma sub-sampling doesn't make sense.
So I changed it.
I've tested 8bit content, but I haven't found a way to create a 10bit
sample, so that path is untested for now.
With the introduction of HEVC 444 support, we technically have two
codecs that can handle 444 - HEVC and MJPEG. In the case of MJPEG,
it can decode, but can only output one of the semi-planar formats.
That means we need additional logic to decide whether to use a
444 output format or not.
The latest generation video decoder on the Turing chips supports
decoding HEVC 4:4:4. Supporting this is relatively straight-forward;
we need to account for the different chroma format and pick the
right output and sw formats at the right times.
There was one bug which was the hard-coded assumption that the
first chroma plane would be half-height; I fixed this to use the
actual shift value on the plane.
We also need to pass the SPS and PPS range extension flags.
We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere.
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
Replaces the data pointers with the mapped cuvid ones.
Adds buffer_refs to the frame to ensure the needed contexts stay alive
and the cuvid idx stays allocated.
Adds another buffer_ref to unmap the frame when it's unreferenced itself.
Copied the check from cuviddec.c (*_cuvid decoders) to allow the
capability check to be optional for older drivers.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
nvdec will not produce odd width/height output, and while this is
basically never an issue with most codecs, due to internal alignment
requirements, you can get odd sized jpegs.
If an odd-sized jpeg is encountered, nvdec will actually round down
internally and produce output that is slightly smaller. This isn't
the end of the world, as long as you know the output size doesn't
match the original image resolution.
However, with an hwaccel, we don't know. The decoder controls
the reported output size and the hwaccel cannot change it. I was
able to trigger an error in mpv where it tries to copy the output
surface as part of rendering and triggers a cuda error because
cuda knows the output frame is smaller than expected.
To fix this, we can round up the configured width/height passed
to nvdec so that the frames are always at least as large as the
decoder's reported size, and data can be copied out safely.
In this particular jpeg case, you end up with a blank (green) line
at the bottom due to nvdec refusing to decode the last line, but
the behaviour matches cuviddec, so it's as good as you're going to
get.
This was predictably nightmarish, given how ridiculous mpeg4 is.
I had to stare at the cuvid parser output for a long time to work
out what each field was supposed to be, and even then, I still don't
fully understand some of them. Particularly:
vop_coded: If I'm reading the decoder correctly, this flag will always
be 1 as the decoder will not pass the hwaccel any frame
where it is not 1.
divx_flags: There's obviously no documentation on what the possible
flags are. I simply observed that this is '0' for a
normal bitstream and '5' for packed b-frames.
gmc_enabled: I had a number of guesses as to what this mapped to.
I picked the condition I did based on when the cuvid
parser was setting flag.
Also note that as with the vdpau hwaccel, the decoder needs to
consume the entire frame and not the slice.
The 'simple' hwaccels (not h.264 and hevc) all use the same bitstream
management and reference lookup logic so let's refactor all that into
common functions.
I verified that casting a signed int -1 to unsigned char produces 255
according to the C language specification.
This is mostly straight-forward. The weird part is that it should
just work for mpeg1, but I see corruption in my test cases, so I'm
going to try and fix that separately.