While this function can easily be written with vectors, it just fails to
get any performance improvement.
For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
more or less the same metrics as the C code:
func ff_sbr_neg_odd_64_rvv, zve64x
li a1, 32
addi a0, a0, 7
li t0, 8
vsetvli zero, a1, e8, m2, ta, ma
li t1, 0x80
vlse8.v v8, (a0), t0
vxor.vx v8, v8, t1
vsse8.v v8, (a0), t0
ret
endfunc
This reverts commit d06fd18f8f.
Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
restarted after the escape byte. Vector compression (vcompress.vm) could
discard all escape bytes but that is slower if escape bytes are rare.
Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of a 2 slides,
- taking a short cut if the input vector contains less than 2 zeroes.
But this is a good starting point:
T-Head C908:
vc1dsp.vc1_unescape_buffer_c: 12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0
SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c: 11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
For RPR, the current frame may reference a frame with a different resolution.
Therefore, we need to consider frame scaling when we wait for reference pixels.
Most users of ff_adts_header_parse() don't already have
an opened GetBitContext for the header, so add a convenience
function for them.
Also use a forward declaration of GetBitContext in adts_header.h
as this avoids (implicit) inclusion of get_bits.h in some of
the users that now no longer use a GetBitContext of their own.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove the (unused) AAC_AC3_PARSE_ERROR_CHANNEL_CFG while at it;
furthermore, fix the documentation of ff_ac3_parse_header()
and (ff|avpriv)_adts_header_parse().
Reviewed-by: Andrew Sayers <ffmpeg-devel@pileofstuff.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The documentation of av_adts_header_parse() does not require
the buffer to be padded at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When the external allocator is used for dynamic frame allocation, only
video memory is supported, the SDK doesn't lock/unlock the memory block
via Lock()/Unlock() calls.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This could cause timeouts
Fixes: CID1439568 Untrusted loop bound
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I dont think this can actually overflow but 64bit seems reasonable to use
Fixes: CID1521983 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1516090 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1560042 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1461482 Improper use of negative value
Sponsored-by: Sovereign Tech Fund
Reviewed-.by: "Xiang, Haihao" <haihao.xiang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1452425 Logically dead code
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1507483 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is pretty much the same as for lpc16, though it only improves half
as large prediction orders. With 128-bit vectors, this gives:
C V old V new
1 69.2 181.5 95.5
2 107.7 180.7 95.2
3 145.5 180.0 103.5
4 183.0 179.2 102.7
5 220.7 178.5 128.0
6 257.7 194.0 127.5
7 294.5 193.7 126.7
8 331.0 193.0 126.5
Larger prediction orders see no significant changes at that size.
This calculates the optimal vector type value at run-time based on the
hardware vector length and the FLAC LPC prediction order. In this
particular case, the additional computation is easily amortised over
the loop iterations:
T-Head C908:
C V before V after
1 48.0 214.7 95.2
2 64.7 214.2 94.7
3 79.7 213.5 94.5
4 96.2 196.5 94.2 #
5 111.0 195.7 118.5
6 127.0 211.2 102.0
7 143.7 194.2 101.5
8 175.7 193.2 101.2 #
9 176.2 224.2 126.0
10 191.5 192.0 125.5
11 224.5 191.2 124.7
12 223.0 190.2 124.2
13 239.2 189.5 123.7
14 253.7 188.7 139.5
15 286.2 188.0 122.7
16 284.0 187.0 122.5 #
17 300.2 186.5 186.5
18 314.0 185.5 185.7
19 329.7 184.7 185.0
20 343.0 184.2 184.2
21 358.7 199.2 183.7
22 371.7 182.7 182.7
23 387.5 181.7 182.0
24 400.7 181.0 181.2
25 431.5 180.2 196.5
26 443.7 195.5 196.0
27 459.0 178.7 196.2
28 470.7 177.7 194.2
29 470.0 177.0 193.5
30 481.2 176.2 176.5
31 496.2 175.5 175.7
32 507.2 174.7 191.0 #
# Power of two boundary.
With 128-bit vectors, improvements are expected for the first two
test cases only. For the other two, there is overhead but below noise.
Improvements should be better observable with prediction order of 8
and less, or on hardware with larger vector sizes.
This decoder does not do anything fancy any more since
c6303f8d70 (before that,
it overwrote the frame's linesize) so that it supports
direct rendering. This effectively reverts
d3de3a16d1.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit is the analog of 3f11eac757
for decoding: It sets the AV_FRAME_FLAG_KEY and (for video decoders)
also pict_type to AV_PICTURE_TYPE_I. It furthermore stops setting
audio frames as always being key frames -- it is wrong for e.g.
TrueHD/MLP. The latter also affects TAK and DFPWM.
The change already improves output for several decoders where
it has been forgotten to set e.g. pict_type like speedhq, wnv1
or tiff. The latter is the reason for the change to the exif-image-tiff
FATE test reference file.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>