Move the slice offsets buffer to the thread decode context.
It isn't part of the resources for frame decoding, the driver
has to process and finish with it at submission time.
That way, it doesn't need to be alloc'd + freed on every frame.
This requires using the new AVHWFramesContext.opaque field, as
otherwise, the profile attached to the decoder will be freed
before the frames context, rendering the frames context useless.
Determined experimentally, on various videos and hardware.
On Intel, using less resources in-flight is around 15% faster,
with similar results on Nvidia hardware.
According to Dave Airlie:
> <airlied> but I think ignoring it should be fine, I can't see any
> other way to get the imaeg extents correct for other usage
> <Lynne> what width/height should be used for the images?
> the final presentable dimensions, or the coded dimensions?
> <airlied> if you don't want noise I think the presentable dims
> <airlied> the driver should round up the allocations internally,
> but if you are going to sample from the images then w/h have to be
> the bounds of the image you want
> <airlied> since otherwise there's no way to stop the sampler from
> going outside the edges
Apparently, the alignment values are informative, rather than mandatory,
but the spec's wording makes it sound as if they're mandatory.