Fixes out of array read
Fixes: 0a7ff0c1d93da9cef28a315ec91b692a/asan_heap-oob_4a52e5_3604_9c56dbb20e308f4faeef7b35f688521a.ape
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch does 4 things, all of which interact and thus it
woudln't be possible to commit them separately without causing
either quality regressions or assertion failures.
Fate comparison targets don't all reflect improvements in
quality, yet listening tests show substantially improved quality
and stability.
1. Increase SF range utilization.
The spec requires SF delta values to be constrained within the
range -60..60. The previous code was applying that range to
the whole SF array and not only the deltas of consecutive values,
because doing so requires smarter code: zeroing or otherwise
skipping a band may invalidate lots of SF choices.
This patch implements that logic to allow the coders to utilize
the full dynamic range of scalefactors, increasing quality quite
considerably, and fixing delta-SF-related assertion failures,
since now the limitation is enforced rather than asserted.
2. PNS tweaks
The previous modification makes big improvements in twoloop's
efficiency, and every time that happens PNS logic needs to be
tweaked accordingly to avoid it from stepping all over twoloop's
decisions. This patch includes modifications of the sort.
3. Account for lowpass cutoff during PSY analysis
The closer PSY's allocation is to final allocation the better
the quality is, and given these modifications, twoloop is now
very efficient at avoiding holes. Thus, to compute accurate
thresholds, PSY needs to account for the lowpass applied
implicitly during twoloop (by zeroing high bands).
This patch makes twoloop set the cutoff in psymodel's context
the first time it runs, and makes PSY account for it during
threshold computation, making PE and threshold computations
closer to the final allocation and thus achieving better
subjective quality.
4. Tweaks to RC lambda tracking loop in relation to PNS
Without this tweak some corner cases cause quality regressions.
Basically, lambda needs to react faster to overall bitrate
efficiency changes since now PNS can be quite successful in
enforcing maximum bitrates, when PSY allocates too many bits
to the lower bands, suppressing the signals RC logic uses to
lower lambda in those cases and causing aggressive PNS.
This tweak makes PNS much less aggressive, though it can still
use some further tweaks.
Also update MIPS specializations and adjust fuzz
Also in lavc/mips/aacpsy_mips.h: remove trailing whitespace
On systems having cbrt, there is no reason to use the slow pow function.
Sample benchmark (x86-64, Haswell, GNU/Linux):
new:
5124920 decicycles in cbrt_tableinit, 1 runs, 0 skips
old:
12321680 decicycles in cbrt_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This further speeds up runtime initialization, with identical generated tables.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
34441423 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
new:
10776291 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
Most low hanging fruit is taken care of here. For some idea, note that
83,064 array elements totalling 233,722 bytes need to be initialized.
Thus, with this patch, we average ~ 12.9 cycles per element or ~ 4.6
cycles per byte.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This does some miscellaneous stuff mainly avoiding the usage of pow to
achieve significant speedups. This is not speed critical, but is
unnecessary latency and cycles wasted for a user.
All tables tested and are identical to the old ones
(bit-exact even in floating point case).
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
102329530 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
new:
34111900 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Whoever wrote this stuff had a pretty bad libm - digits differ pretty
quickly.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
The table in question is a 253 byte one. In fact, it turns out that
dynamic generation of the table results in an increased binary size.
Code compiled with GCC 5.2.0, x86-64 (size in bytes), before and after
patch:
old: 62321064 libavcodec/libavcodec.so.57
new: 62320536 libavcodec/libavcodec.so.57
Thus, it always make sense to statically allocate this.
Tested with FATE with/without --enable-hardcoded-tables.
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Will Kelleher <wkelleher@gogoair.com>
Previous version reviewed-by: Ivan Uskov <ivan.uskov@nablet.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 76c515fc3779d1b838667c61ea13ce92/asan_heap-oob_1fc0d07_8913_794a4629a264ebdb25b58d3a94ed1785.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The DC VLC table used is too small, fixing this requires a sample,
thus request a sample.
Some samples are said to work even though the table has the wrong size, thus
this is left enabled if the user enables experimental features.
Fixes: 2abd25478c62a675f335fac00b467023/asan_static-oob_10aff98_1227_8811480c6ef1e970a7977ceb7e5a9958.mxf
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Approved-by: kurosu
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
As noted in a comment, pe.min in the reference encoder
is centered around current pe. The bit reservoir algo
needs pe.min to be a local minimum, because it can only
account for local PE variations. If it's set to a global
minimum as was being done, bit reservoir logic doesn't
work as efficiently.
This patch tries to forget old minimums and converge to
a local minimum without losing the stability of the
previous solution. Listening tests until now suggest this
solves numerous RC issues.
Fixes out of array read
Fixes: 59bb925e90201fa0f87f0a31945d43b5/asan_heap-oob_4a52e5_3388_66027f11e3d072f1e02401ecc6193361.jvt
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 482d8f2fd17c9f532b586458a33f267c/asan_heap-oob_4a52b6_7417_1d08d477736d66cdadd833d146bb8bae.mov
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 2f95ddd996db8a6281d2e18c184595a7/asan_heap-oob_192fe91_3330_58e4441181e30a66c19f743dcb392347.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 08664a2a7921ef48172f26495c7455be/asan_heap-oob_23036c6_3301_523388ef84285a0270caf67a43247b59.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
ff_aac_tableinit is a macro in the case of hardcoded tables, so wrap
that up in a function (similar to how the decoder template does it) and
use that as the argument for ff_thread_once().
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Fixes out of array access
Fixes: 01859c9a9ac6cd60a008274123275574/asan_heap-oob_1dff571_8250_50d3d1611e294c3519fd1fa82198b69b.avi
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 007c4a36608ebdf27ee260ad60a81184/asan_heap-oob_32076b4_2243_116b1cb29d91cc4974d6680e3d10bd91.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AAC-Fixed decoder segfaulted. This commit makes the aac encoder
and decoder init the table twice in case of transcoding again.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Since the ff_aac_tableinit() can be called by both the encoder and
the decoder (in case of transcoding) this commit shares the AVOnce
variable to prevent this.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This speeds up aac_tablegen to a ludicruous degree (~97%), i.e to the point
where it can be argued that runtime initialization can always be done instead of
hard-coded tables. The only cost is essentially a trivial increase in
the stack size.
Even if one does not care about this, the patch also improves accuracy
as detailed below.
Performance:
Benchmark obtained by looping 10^4 times over ff_aac_tableinit.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
1295292 decicycles in ff_aac_tableinit, 512 runs, 0 skips
1275981 decicycles in ff_aac_tableinit, 1024 runs, 0 skips
1272932 decicycles in ff_aac_tableinit, 2048 runs, 0 skips
1262164 decicycles in ff_aac_tableinit, 4096 runs, 0 skips
1256720 decicycles in ff_aac_tableinit, 8192 runs, 0 skips
new:
21112 decicycles in ff_aac_tableinit, 511 runs, 1 skips
21269 decicycles in ff_aac_tableinit, 1023 runs, 1 skips
21352 decicycles in ff_aac_tableinit, 2043 runs, 5 skips
21386 decicycles in ff_aac_tableinit, 4080 runs, 16 skips
21299 decicycles in ff_aac_tableinit, 8173 runs, 19 skips
Accuracy:
The previous code was resulting in needless loss of
accuracy due to the pow being called in succession. As an illustration
of this:
ff_aac_pow34sf_tab[3]
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091778545
truncated to float
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091426864
showing that the old value was not correctly rounded. This affects a
large number of elements of the array.
Patch tested with FATE.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This hugely reduces the echo which was introduced with the previous
commit (though likely because previously everything was broken).
Makes LTP actually worthwhile now.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>