Commit graph

49295 commits

Author SHA1 Message Date
James Almer
b181868aba x86/h26x/h2656dsp: add missing preprocessor wrappers
Fixes compilation on x86_32 targets.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 16:04:09 -03:00
James Almer
6b6eb7d74e x86/Makefile: fix hevc and vvc dependency of h2656dsp.o
And remove tabs while at it.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 16:02:50 -03:00
James Almer
2dc8221e66 x86/hevcdsp_init.c: fix preprocessor check
HAVE_AVX2_EXTERNAL has a value, so check for it.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 10:47:53 -03:00
James Almer
78a7927df7 x86/vvc/vvc_mc: wrap the entire file in x86_64 and AVX2 checks
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 10:23:42 -03:00
James Almer
bf62ddc7bf x86/vvc/vvc_mc: set the correct number of used registers in vvc_w_avg functions
Fixes crashes when running fate-vvc-conformance-WP_A_3 on Win64 targets

Signed-off-by: James Almer <jamrial@gmail.com>
2024-02-01 10:04:14 -03:00
Wu Jianhua
326cc01de9 avcodec/x86/vvc: add avg and avg_w AVX2 optimizations
The avg/avg_w is based on dav1d.
See https://code.videolan.org/videolan/dav1d/-/blob/master/src/x86/mc_avx2.asm

vvc_avg_8_2x2_c: 71.6
vvc_avg_8_2x2_avx2: 26.8
vvc_avg_8_2x4_c: 140.8
vvc_avg_8_2x4_avx2: 34.6
vvc_avg_8_2x8_c: 410.3
vvc_avg_8_2x8_avx2: 41.3
vvc_avg_8_2x16_c: 769.3
vvc_avg_8_2x16_avx2: 60.3
vvc_avg_8_2x32_c: 1669.6
vvc_avg_8_2x32_avx2: 105.1
vvc_avg_8_2x64_c: 1978.3
vvc_avg_8_2x64_avx2: 425.8
vvc_avg_8_2x128_c: 6536.8
vvc_avg_8_2x128_avx2: 1315.1
vvc_avg_8_4x2_c: 155.6
vvc_avg_8_4x2_avx2: 26.1
vvc_avg_8_4x4_c: 250.3
vvc_avg_8_4x4_avx2: 31.3
vvc_avg_8_4x8_c: 831.8
vvc_avg_8_4x8_avx2: 41.3
vvc_avg_8_4x16_c: 1461.1
vvc_avg_8_4x16_avx2: 57.1
vvc_avg_8_4x32_c: 2821.6
vvc_avg_8_4x32_avx2: 105.1
vvc_avg_8_4x64_c: 3615.8
vvc_avg_8_4x64_avx2: 412.6
vvc_avg_8_4x128_c: 11962.6
vvc_avg_8_4x128_avx2: 1274.3
vvc_avg_8_8x2_c: 215.8
vvc_avg_8_8x2_avx2: 29.1
vvc_avg_8_8x4_c: 430.6
vvc_avg_8_8x4_avx2: 37.6
vvc_avg_8_8x8_c: 1463.3
vvc_avg_8_8x8_avx2: 51.8
vvc_avg_8_8x16_c: 2630.1
vvc_avg_8_8x16_avx2: 97.6
vvc_avg_8_8x32_c: 5813.8
vvc_avg_8_8x32_avx2: 196.6
vvc_avg_8_8x64_c: 6687.3
vvc_avg_8_8x64_avx2: 487.8
vvc_avg_8_8x128_c: 13178.6
vvc_avg_8_8x128_avx2: 1290.6
vvc_avg_8_16x2_c: 443.8
vvc_avg_8_16x2_avx2: 28.3
vvc_avg_8_16x4_c: 1253.3
vvc_avg_8_16x4_avx2: 32.1
vvc_avg_8_16x8_c: 2236.3
vvc_avg_8_16x8_avx2: 44.3
vvc_avg_8_16x16_c: 5127.8
vvc_avg_8_16x16_avx2: 63.3
vvc_avg_8_16x32_c: 6573.3
vvc_avg_8_16x32_avx2: 223.6
vvc_avg_8_16x64_c: 30311.8
vvc_avg_8_16x64_avx2: 437.8
vvc_avg_8_16x128_c: 25693.3
vvc_avg_8_16x128_avx2: 1266.8
vvc_avg_8_32x2_c: 954.6
vvc_avg_8_32x2_avx2: 32.1
vvc_avg_8_32x4_c: 2359.6
vvc_avg_8_32x4_avx2: 39.6
vvc_avg_8_32x8_c: 5703.6
vvc_avg_8_32x8_avx2: 57.1
vvc_avg_8_32x16_c: 9967.6
vvc_avg_8_32x16_avx2: 107.1
vvc_avg_8_32x32_c: 21327.6
vvc_avg_8_32x32_avx2: 272.6
vvc_avg_8_32x64_c: 39240.8
vvc_avg_8_32x64_avx2: 529.6
vvc_avg_8_32x128_c: 52580.8
vvc_avg_8_32x128_avx2: 1338.8
vvc_avg_8_64x2_c: 1647.3
vvc_avg_8_64x2_avx2: 38.8
vvc_avg_8_64x4_c: 5130.1
vvc_avg_8_64x4_avx2: 58.8
vvc_avg_8_64x8_c: 6529.3
vvc_avg_8_64x8_avx2: 88.3
vvc_avg_8_64x16_c: 19913.6
vvc_avg_8_64x16_avx2: 162.3
vvc_avg_8_64x32_c: 39360.8
vvc_avg_8_64x32_avx2: 295.8
vvc_avg_8_64x64_c: 49658.3
vvc_avg_8_64x64_avx2: 784.1
vvc_avg_8_64x128_c: 108513.1
vvc_avg_8_64x128_avx2: 1977.1
vvc_avg_8_128x2_c: 3226.1
vvc_avg_8_128x2_avx2: 61.1
vvc_avg_8_128x4_c: 10280.3
vvc_avg_8_128x4_avx2: 94.6
vvc_avg_8_128x8_c: 18079.3
vvc_avg_8_128x8_avx2: 155.3
vvc_avg_8_128x16_c: 45121.8
vvc_avg_8_128x16_avx2: 285.3
vvc_avg_8_128x32_c: 48651.8
vvc_avg_8_128x32_avx2: 581.6
vvc_avg_8_128x64_c: 165078.6
vvc_avg_8_128x64_avx2: 1942.8
vvc_avg_8_128x128_c: 339103.1
vvc_avg_8_128x128_avx2: 4332.6
vvc_avg_10_2x2_c: 144.3
vvc_avg_10_2x2_avx2: 26.8
vvc_avg_10_2x4_c: 142.6
vvc_avg_10_2x4_avx2: 45.3
vvc_avg_10_2x8_c: 478.1
vvc_avg_10_2x8_avx2: 38.1
vvc_avg_10_2x16_c: 518.3
vvc_avg_10_2x16_avx2: 58.1
vvc_avg_10_2x32_c: 2059.8
vvc_avg_10_2x32_avx2: 93.1
vvc_avg_10_2x64_c: 2383.8
vvc_avg_10_2x64_avx2: 714.8
vvc_avg_10_2x128_c: 4498.3
vvc_avg_10_2x128_avx2: 1466.3
vvc_avg_10_4x2_c: 228.6
vvc_avg_10_4x2_avx2: 26.8
vvc_avg_10_4x4_c: 378.3
vvc_avg_10_4x4_avx2: 30.6
vvc_avg_10_4x8_c: 866.8
vvc_avg_10_4x8_avx2: 44.6
vvc_avg_10_4x16_c: 1018.1
vvc_avg_10_4x16_avx2: 58.1
vvc_avg_10_4x32_c: 3590.8
vvc_avg_10_4x32_avx2: 128.8
vvc_avg_10_4x64_c: 4200.8
vvc_avg_10_4x64_avx2: 663.6
vvc_avg_10_4x128_c: 8450.8
vvc_avg_10_4x128_avx2: 1531.8
vvc_avg_10_8x2_c: 369.3
vvc_avg_10_8x2_avx2: 28.3
vvc_avg_10_8x4_c: 513.8
vvc_avg_10_8x4_avx2: 32.1
vvc_avg_10_8x8_c: 1720.3
vvc_avg_10_8x8_avx2: 49.1
vvc_avg_10_8x16_c: 1894.8
vvc_avg_10_8x16_avx2: 71.6
vvc_avg_10_8x32_c: 3931.3
vvc_avg_10_8x32_avx2: 148.1
vvc_avg_10_8x64_c: 7964.3
vvc_avg_10_8x64_avx2: 613.1
vvc_avg_10_8x128_c: 15540.1
vvc_avg_10_8x128_avx2: 1585.1
vvc_avg_10_16x2_c: 877.3
vvc_avg_10_16x2_avx2: 27.6
vvc_avg_10_16x4_c: 955.8
vvc_avg_10_16x4_avx2: 29.8
vvc_avg_10_16x8_c: 3419.6
vvc_avg_10_16x8_avx2: 62.6
vvc_avg_10_16x16_c: 3826.8
vvc_avg_10_16x16_avx2: 54.3
vvc_avg_10_16x32_c: 7655.3
vvc_avg_10_16x32_avx2: 86.3
vvc_avg_10_16x64_c: 30011.1
vvc_avg_10_16x64_avx2: 692.6
vvc_avg_10_16x128_c: 47894.8
vvc_avg_10_16x128_avx2: 1580.3
vvc_avg_10_32x2_c: 944.3
vvc_avg_10_32x2_avx2: 29.8
vvc_avg_10_32x4_c: 2022.6
vvc_avg_10_32x4_avx2: 35.1
vvc_avg_10_32x8_c: 6148.8
vvc_avg_10_32x8_avx2: 51.3
vvc_avg_10_32x16_c: 12601.6
vvc_avg_10_32x16_avx2: 70.8
vvc_avg_10_32x32_c: 15958.6
vvc_avg_10_32x32_avx2: 124.3
vvc_avg_10_32x64_c: 31784.6
vvc_avg_10_32x64_avx2: 757.3
vvc_avg_10_32x128_c: 63892.8
vvc_avg_10_32x128_avx2: 1711.3
vvc_avg_10_64x2_c: 1890.8
vvc_avg_10_64x2_avx2: 34.3
vvc_avg_10_64x4_c: 6267.3
vvc_avg_10_64x4_avx2: 42.6
vvc_avg_10_64x8_c: 12778.1
vvc_avg_10_64x8_avx2: 67.8
vvc_avg_10_64x16_c: 22304.3
vvc_avg_10_64x16_avx2: 116.8
vvc_avg_10_64x32_c: 30777.1
vvc_avg_10_64x32_avx2: 201.1
vvc_avg_10_64x64_c: 60169.1
vvc_avg_10_64x64_avx2: 1454.3
vvc_avg_10_64x128_c: 124392.8
vvc_avg_10_64x128_avx2: 3648.6
vvc_avg_10_128x2_c: 3650.1
vvc_avg_10_128x2_avx2: 41.1
vvc_avg_10_128x4_c: 22887.8
vvc_avg_10_128x4_avx2: 64.1
vvc_avg_10_128x8_c: 14622.6
vvc_avg_10_128x8_avx2: 111.6
vvc_avg_10_128x16_c: 62207.6
vvc_avg_10_128x16_avx2: 186.3
vvc_avg_10_128x32_c: 59761.3
vvc_avg_10_128x32_avx2: 374.6
vvc_avg_10_128x64_c: 117504.3
vvc_avg_10_128x64_avx2: 2684.6
vvc_avg_10_128x128_c: 236767.6
vvc_avg_10_128x128_avx2: 15278.1
vvc_avg_12_2x2_c: 78.6
vvc_avg_12_2x2_avx2: 26.1
vvc_avg_12_2x4_c: 254.1
vvc_avg_12_2x4_avx2: 30.6
vvc_avg_12_2x8_c: 261.8
vvc_avg_12_2x8_avx2: 39.1
vvc_avg_12_2x16_c: 527.6
vvc_avg_12_2x16_avx2: 57.3
vvc_avg_12_2x32_c: 1089.1
vvc_avg_12_2x32_avx2: 93.8
vvc_avg_12_2x64_c: 2337.6
vvc_avg_12_2x64_avx2: 707.1
vvc_avg_12_2x128_c: 4582.1
vvc_avg_12_2x128_avx2: 1414.6
vvc_avg_12_4x2_c: 129.6
vvc_avg_12_4x2_avx2: 26.8
vvc_avg_12_4x4_c: 427.3
vvc_avg_12_4x4_avx2: 30.6
vvc_avg_12_4x8_c: 529.6
vvc_avg_12_4x8_avx2: 36.6
vvc_avg_12_4x16_c: 1022.1
vvc_avg_12_4x16_avx2: 57.3
vvc_avg_12_4x32_c: 1987.6
vvc_avg_12_4x32_avx2: 84.3
vvc_avg_12_4x64_c: 4147.6
vvc_avg_12_4x64_avx2: 706.3
vvc_avg_12_4x128_c: 8469.3
vvc_avg_12_4x128_avx2: 1448.3
vvc_avg_12_8x2_c: 253.6
vvc_avg_12_8x2_avx2: 27.6
vvc_avg_12_8x4_c: 836.3
vvc_avg_12_8x4_avx2: 32.1
vvc_avg_12_8x8_c: 1074.6
vvc_avg_12_8x8_avx2: 45.1
vvc_avg_12_8x16_c: 3616.8
vvc_avg_12_8x16_avx2: 71.6
vvc_avg_12_8x32_c: 3823.6
vvc_avg_12_8x32_avx2: 140.1
vvc_avg_12_8x64_c: 7764.8
vvc_avg_12_8x64_avx2: 656.1
vvc_avg_12_8x128_c: 15896.1
vvc_avg_12_8x128_avx2: 1232.8
vvc_avg_12_16x2_c: 462.1
vvc_avg_12_16x2_avx2: 26.8
vvc_avg_12_16x4_c: 1732.1
vvc_avg_12_16x4_avx2: 29.1
vvc_avg_12_16x8_c: 2097.6
vvc_avg_12_16x8_avx2: 62.6
vvc_avg_12_16x16_c: 6753.1
vvc_avg_12_16x16_avx2: 47.8
vvc_avg_12_16x32_c: 7373.1
vvc_avg_12_16x32_avx2: 80.8
vvc_avg_12_16x64_c: 15046.3
vvc_avg_12_16x64_avx2: 621.1
vvc_avg_12_16x128_c: 52574.6
vvc_avg_12_16x128_avx2: 1417.1
vvc_avg_12_32x2_c: 1712.1
vvc_avg_12_32x2_avx2: 29.8
vvc_avg_12_32x4_c: 2036.8
vvc_avg_12_32x4_avx2: 37.6
vvc_avg_12_32x8_c: 4017.6
vvc_avg_12_32x8_avx2: 44.1
vvc_avg_12_32x16_c: 8018.6
vvc_avg_12_32x16_avx2: 70.8
vvc_avg_12_32x32_c: 15637.6
vvc_avg_12_32x32_avx2: 124.3
vvc_avg_12_32x64_c: 31143.3
vvc_avg_12_32x64_avx2: 830.3
vvc_avg_12_32x128_c: 75706.8
vvc_avg_12_32x128_avx2: 1604.8
vvc_avg_12_64x2_c: 3230.3
vvc_avg_12_64x2_avx2: 33.6
vvc_avg_12_64x4_c: 4139.6
vvc_avg_12_64x4_avx2: 45.1
vvc_avg_12_64x8_c: 8201.6
vvc_avg_12_64x8_avx2: 67.1
vvc_avg_12_64x16_c: 25632.3
vvc_avg_12_64x16_avx2: 110.3
vvc_avg_12_64x32_c: 30744.3
vvc_avg_12_64x32_avx2: 200.3
vvc_avg_12_64x64_c: 105554.8
vvc_avg_12_64x64_avx2: 1325.6
vvc_avg_12_64x128_c: 235254.3
vvc_avg_12_64x128_avx2: 3132.6
vvc_avg_12_128x2_c: 6194.3
vvc_avg_12_128x2_avx2: 55.1
vvc_avg_12_128x4_c: 7583.8
vvc_avg_12_128x4_avx2: 79.3
vvc_avg_12_128x8_c: 14635.6
vvc_avg_12_128x8_avx2: 104.3
vvc_avg_12_128x16_c: 29270.8
vvc_avg_12_128x16_avx2: 194.3
vvc_avg_12_128x32_c: 60113.6
vvc_avg_12_128x32_avx2: 346.3
vvc_avg_12_128x64_c: 197030.3
vvc_avg_12_128x64_avx2: 2779.6
vvc_avg_12_128x128_c: 432809.6
vvc_avg_12_128x128_avx2: 5513.3
vvc_w_avg_8_2x2_c: 84.3
vvc_w_avg_8_2x2_avx2: 42.6
vvc_w_avg_8_2x4_c: 156.3
vvc_w_avg_8_2x4_avx2: 58.8
vvc_w_avg_8_2x8_c: 310.6
vvc_w_avg_8_2x8_avx2: 73.1
vvc_w_avg_8_2x16_c: 942.1
vvc_w_avg_8_2x16_avx2: 113.3
vvc_w_avg_8_2x32_c: 1098.8
vvc_w_avg_8_2x32_avx2: 202.6
vvc_w_avg_8_2x64_c: 2414.3
vvc_w_avg_8_2x64_avx2: 467.6
vvc_w_avg_8_2x128_c: 4763.8
vvc_w_avg_8_2x128_avx2: 1333.1
vvc_w_avg_8_4x2_c: 140.1
vvc_w_avg_8_4x2_avx2: 49.8
vvc_w_avg_8_4x4_c: 276.3
vvc_w_avg_8_4x4_avx2: 58.1
vvc_w_avg_8_4x8_c: 524.3
vvc_w_avg_8_4x8_avx2: 72.3
vvc_w_avg_8_4x16_c: 1108.1
vvc_w_avg_8_4x16_avx2: 111.8
vvc_w_avg_8_4x32_c: 2149.8
vvc_w_avg_8_4x32_avx2: 199.6
vvc_w_avg_8_4x64_c: 12288.1
vvc_w_avg_8_4x64_avx2: 509.3
vvc_w_avg_8_4x128_c: 8398.6
vvc_w_avg_8_4x128_avx2: 1319.6
vvc_w_avg_8_8x2_c: 271.1
vvc_w_avg_8_8x2_avx2: 44.1
vvc_w_avg_8_8x4_c: 503.3
vvc_w_avg_8_8x4_avx2: 61.8
vvc_w_avg_8_8x8_c: 1031.1
vvc_w_avg_8_8x8_avx2: 93.8
vvc_w_avg_8_8x16_c: 2009.8
vvc_w_avg_8_8x16_avx2: 163.1
vvc_w_avg_8_8x32_c: 4161.3
vvc_w_avg_8_8x32_avx2: 292.1
vvc_w_avg_8_8x64_c: 7940.6
vvc_w_avg_8_8x64_avx2: 592.1
vvc_w_avg_8_8x128_c: 16802.3
vvc_w_avg_8_8x128_avx2: 1287.6
vvc_w_avg_8_16x2_c: 762.6
vvc_w_avg_8_16x2_avx2: 53.6
vvc_w_avg_8_16x4_c: 1486.3
vvc_w_avg_8_16x4_avx2: 67.1
vvc_w_avg_8_16x8_c: 1907.8
vvc_w_avg_8_16x8_avx2: 96.8
vvc_w_avg_8_16x16_c: 3883.6
vvc_w_avg_8_16x16_avx2: 151.3
vvc_w_avg_8_16x32_c: 7974.8
vvc_w_avg_8_16x32_avx2: 285.8
vvc_w_avg_8_16x64_c: 25160.6
vvc_w_avg_8_16x64_avx2: 589.8
vvc_w_avg_8_16x128_c: 58328.1
vvc_w_avg_8_16x128_avx2: 1169.8
vvc_w_avg_8_32x2_c: 1009.1
vvc_w_avg_8_32x2_avx2: 65.6
vvc_w_avg_8_32x4_c: 2091.1
vvc_w_avg_8_32x4_avx2: 96.8
vvc_w_avg_8_32x8_c: 3997.8
vvc_w_avg_8_32x8_avx2: 156.3
vvc_w_avg_8_32x16_c: 8216.8
vvc_w_avg_8_32x16_avx2: 269.6
vvc_w_avg_8_32x32_c: 21746.1
vvc_w_avg_8_32x32_avx2: 635.3
vvc_w_avg_8_32x64_c: 31564.8
vvc_w_avg_8_32x64_avx2: 1010.6
vvc_w_avg_8_32x128_c: 114373.3
vvc_w_avg_8_32x128_avx2: 2013.6
vvc_w_avg_8_64x2_c: 2067.3
vvc_w_avg_8_64x2_avx2: 97.6
vvc_w_avg_8_64x4_c: 3901.1
vvc_w_avg_8_64x4_avx2: 154.8
vvc_w_avg_8_64x8_c: 7911.6
vvc_w_avg_8_64x8_avx2: 268.8
vvc_w_avg_8_64x16_c: 16508.8
vvc_w_avg_8_64x16_avx2: 501.8
vvc_w_avg_8_64x32_c: 38770.3
vvc_w_avg_8_64x32_avx2: 1287.6
vvc_w_avg_8_64x64_c: 110350.6
vvc_w_avg_8_64x64_avx2: 1890.8
vvc_w_avg_8_64x128_c: 141354.6
vvc_w_avg_8_64x128_avx2: 3839.6
vvc_w_avg_8_128x2_c: 7012.1
vvc_w_avg_8_128x2_avx2: 159.3
vvc_w_avg_8_128x4_c: 8146.8
vvc_w_avg_8_128x4_avx2: 272.6
vvc_w_avg_8_128x8_c: 24596.8
vvc_w_avg_8_128x8_avx2: 501.1
vvc_w_avg_8_128x16_c: 35918.1
vvc_w_avg_8_128x16_avx2: 948.8
vvc_w_avg_8_128x32_c: 68799.6
vvc_w_avg_8_128x32_avx2: 1963.1
vvc_w_avg_8_128x64_c: 133862.1
vvc_w_avg_8_128x64_avx2: 3833.6
vvc_w_avg_8_128x128_c: 348427.8
vvc_w_avg_8_128x128_avx2: 7682.8
vvc_w_avg_10_2x2_c: 118.6
vvc_w_avg_10_2x2_avx2: 73.1
vvc_w_avg_10_2x4_c: 189.1
vvc_w_avg_10_2x4_avx2: 89.3
vvc_w_avg_10_2x8_c: 382.8
vvc_w_avg_10_2x8_avx2: 179.8
vvc_w_avg_10_2x16_c: 658.3
vvc_w_avg_10_2x16_avx2: 185.1
vvc_w_avg_10_2x32_c: 1409.3
vvc_w_avg_10_2x32_avx2: 290.8
vvc_w_avg_10_2x64_c: 2906.8
vvc_w_avg_10_2x64_avx2: 793.1
vvc_w_avg_10_2x128_c: 6292.6
vvc_w_avg_10_2x128_avx2: 1696.8
vvc_w_avg_10_4x2_c: 178.8
vvc_w_avg_10_4x2_avx2: 80.1
vvc_w_avg_10_4x4_c: 581.6
vvc_w_avg_10_4x4_avx2: 97.6
vvc_w_avg_10_4x8_c: 693.3
vvc_w_avg_10_4x8_avx2: 128.1
vvc_w_avg_10_4x16_c: 1436.6
vvc_w_avg_10_4x16_avx2: 179.8
vvc_w_avg_10_4x32_c: 2409.1
vvc_w_avg_10_4x32_avx2: 292.3
vvc_w_avg_10_4x64_c: 4925.3
vvc_w_avg_10_4x64_avx2: 746.1
vvc_w_avg_10_4x128_c: 10664.6
vvc_w_avg_10_4x128_avx2: 1647.6
vvc_w_avg_10_8x2_c: 359.3
vvc_w_avg_10_8x2_avx2: 80.1
vvc_w_avg_10_8x4_c: 925.6
vvc_w_avg_10_8x4_avx2: 97.6
vvc_w_avg_10_8x8_c: 1360.6
vvc_w_avg_10_8x8_avx2: 121.8
vvc_w_avg_10_8x16_c: 3490.3
vvc_w_avg_10_8x16_avx2: 203.3
vvc_w_avg_10_8x32_c: 5266.1
vvc_w_avg_10_8x32_avx2: 325.8
vvc_w_avg_10_8x64_c: 11127.1
vvc_w_avg_10_8x64_avx2: 747.8
vvc_w_avg_10_8x128_c: 31058.3
vvc_w_avg_10_8x128_avx2: 1424.6
vvc_w_avg_10_16x2_c: 624.8
vvc_w_avg_10_16x2_avx2: 84.6
vvc_w_avg_10_16x4_c: 1389.6
vvc_w_avg_10_16x4_avx2: 109.1
vvc_w_avg_10_16x8_c: 2688.3
vvc_w_avg_10_16x8_avx2: 137.1
vvc_w_avg_10_16x16_c: 5387.1
vvc_w_avg_10_16x16_avx2: 224.6
vvc_w_avg_10_16x32_c: 10776.3
vvc_w_avg_10_16x32_avx2: 312.1
vvc_w_avg_10_16x64_c: 18069.1
vvc_w_avg_10_16x64_avx2: 858.6
vvc_w_avg_10_16x128_c: 43460.3
vvc_w_avg_10_16x128_avx2: 1411.6
vvc_w_avg_10_32x2_c: 1232.8
vvc_w_avg_10_32x2_avx2: 99.1
vvc_w_avg_10_32x4_c: 4017.6
vvc_w_avg_10_32x4_avx2: 134.1
vvc_w_avg_10_32x8_c: 9306.3
vvc_w_avg_10_32x8_avx2: 208.1
vvc_w_avg_10_32x16_c: 8424.6
vvc_w_avg_10_32x16_avx2: 349.3
vvc_w_avg_10_32x32_c: 20787.8
vvc_w_avg_10_32x32_avx2: 655.3
vvc_w_avg_10_32x64_c: 40972.1
vvc_w_avg_10_32x64_avx2: 904.8
vvc_w_avg_10_32x128_c: 85670.3
vvc_w_avg_10_32x128_avx2: 1751.6
vvc_w_avg_10_64x2_c: 2454.1
vvc_w_avg_10_64x2_avx2: 132.6
vvc_w_avg_10_64x4_c: 5012.6
vvc_w_avg_10_64x4_avx2: 215.6
vvc_w_avg_10_64x8_c: 10811.3
vvc_w_avg_10_64x8_avx2: 361.1
vvc_w_avg_10_64x16_c: 33349.1
vvc_w_avg_10_64x16_avx2: 904.1
vvc_w_avg_10_64x32_c: 41892.3
vvc_w_avg_10_64x32_avx2: 1220.6
vvc_w_avg_10_64x64_c: 66983.3
vvc_w_avg_10_64x64_avx2: 2622.1
vvc_w_avg_10_64x128_c: 246508.8
vvc_w_avg_10_64x128_avx2: 3316.8
vvc_w_avg_10_128x2_c: 7791.6
vvc_w_avg_10_128x2_avx2: 198.8
vvc_w_avg_10_128x4_c: 10534.3
vvc_w_avg_10_128x4_avx2: 337.3
vvc_w_avg_10_128x8_c: 21142.3
vvc_w_avg_10_128x8_avx2: 614.8
vvc_w_avg_10_128x16_c: 40968.6
vvc_w_avg_10_128x16_avx2: 1160.6
vvc_w_avg_10_128x32_c: 113043.3
vvc_w_avg_10_128x32_avx2: 1644.6
vvc_w_avg_10_128x64_c: 230658.3
vvc_w_avg_10_128x64_avx2: 5065.3
vvc_w_avg_10_128x128_c: 335236.3
vvc_w_avg_10_128x128_avx2: 6450.3
vvc_w_avg_12_2x2_c: 185.3
vvc_w_avg_12_2x2_avx2: 43.6
vvc_w_avg_12_2x4_c: 340.3
vvc_w_avg_12_2x4_avx2: 55.8
vvc_w_avg_12_2x8_c: 632.3
vvc_w_avg_12_2x8_avx2: 70.1
vvc_w_avg_12_2x16_c: 728.3
vvc_w_avg_12_2x16_avx2: 108.1
vvc_w_avg_12_2x32_c: 1392.6
vvc_w_avg_12_2x32_avx2: 176.8
vvc_w_avg_12_2x64_c: 2618.3
vvc_w_avg_12_2x64_avx2: 757.3
vvc_w_avg_12_2x128_c: 6408.8
vvc_w_avg_12_2x128_avx2: 1435.1
vvc_w_avg_12_4x2_c: 349.3
vvc_w_avg_12_4x2_avx2: 44.3
vvc_w_avg_12_4x4_c: 607.1
vvc_w_avg_12_4x4_avx2: 52.6
vvc_w_avg_12_4x8_c: 1134.8
vvc_w_avg_12_4x8_avx2: 70.1
vvc_w_avg_12_4x16_c: 1378.1
vvc_w_avg_12_4x16_avx2: 115.3
vvc_w_avg_12_4x32_c: 2599.3
vvc_w_avg_12_4x32_avx2: 174.3
vvc_w_avg_12_4x64_c: 4474.8
vvc_w_avg_12_4x64_avx2: 656.1
vvc_w_avg_12_4x128_c: 11319.6
vvc_w_avg_12_4x128_avx2: 1373.1
vvc_w_avg_12_8x2_c: 595.8
vvc_w_avg_12_8x2_avx2: 44.3
vvc_w_avg_12_8x4_c: 1164.3
vvc_w_avg_12_8x4_avx2: 56.6
vvc_w_avg_12_8x8_c: 2019.6
vvc_w_avg_12_8x8_avx2: 80.1
vvc_w_avg_12_8x16_c: 4071.6
vvc_w_avg_12_8x16_avx2: 139.3
vvc_w_avg_12_8x32_c: 4485.1
vvc_w_avg_12_8x32_avx2: 250.6
vvc_w_avg_12_8x64_c: 8404.8
vvc_w_avg_12_8x64_avx2: 735.8
vvc_w_avg_12_8x128_c: 35679.8
vvc_w_avg_12_8x128_avx2: 1252.6
vvc_w_avg_12_16x2_c: 1114.8
vvc_w_avg_12_16x2_avx2: 46.6
vvc_w_avg_12_16x4_c: 2240.1
vvc_w_avg_12_16x4_avx2: 62.6
vvc_w_avg_12_16x8_c: 13174.6
vvc_w_avg_12_16x8_avx2: 88.6
vvc_w_avg_12_16x16_c: 5334.6
vvc_w_avg_12_16x16_avx2: 144.3
vvc_w_avg_12_16x32_c: 8378.1
vvc_w_avg_12_16x32_avx2: 234.6
vvc_w_avg_12_16x64_c: 21300.8
vvc_w_avg_12_16x64_avx2: 761.8
vvc_w_avg_12_16x128_c: 32786.8
vvc_w_avg_12_16x128_avx2: 1432.8
vvc_w_avg_12_32x2_c: 2154.3
vvc_w_avg_12_32x2_avx2: 61.1
vvc_w_avg_12_32x4_c: 4299.8
vvc_w_avg_12_32x4_avx2: 83.1
vvc_w_avg_12_32x8_c: 7964.8
vvc_w_avg_12_32x8_avx2: 132.6
vvc_w_avg_12_32x16_c: 13321.6
vvc_w_avg_12_32x16_avx2: 234.6
vvc_w_avg_12_32x32_c: 21149.3
vvc_w_avg_12_32x32_avx2: 433.3
vvc_w_avg_12_32x64_c: 43666.6
vvc_w_avg_12_32x64_avx2: 876.6
vvc_w_avg_12_32x128_c: 83189.8
vvc_w_avg_12_32x128_avx2: 1756.6
vvc_w_avg_12_64x2_c: 3829.8
vvc_w_avg_12_64x2_avx2: 83.1
vvc_w_avg_12_64x4_c: 8588.1
vvc_w_avg_12_64x4_avx2: 127.1
vvc_w_avg_12_64x8_c: 17027.6
vvc_w_avg_12_64x8_avx2: 310.6
vvc_w_avg_12_64x16_c: 29797.8
vvc_w_avg_12_64x16_avx2: 415.6
vvc_w_avg_12_64x32_c: 43854.3
vvc_w_avg_12_64x32_avx2: 773.3
vvc_w_avg_12_64x64_c: 137767.3
vvc_w_avg_12_64x64_avx2: 1608.6
vvc_w_avg_12_64x128_c: 316428.3
vvc_w_avg_12_64x128_avx2: 3249.8
vvc_w_avg_12_128x2_c: 8824.6
vvc_w_avg_12_128x2_avx2: 130.3
vvc_w_avg_12_128x4_c: 17173.6
vvc_w_avg_12_128x4_avx2: 219.3
vvc_w_avg_12_128x8_c: 21997.8
vvc_w_avg_12_128x8_avx2: 397.3
vvc_w_avg_12_128x16_c: 43553.8
vvc_w_avg_12_128x16_avx2: 790.1
vvc_w_avg_12_128x32_c: 89792.1
vvc_w_avg_12_128x32_avx2: 1497.6
vvc_w_avg_12_128x64_c: 226573.3
vvc_w_avg_12_128x64_avx2: 3153.1
vvc_w_avg_12_128x128_c: 332090.1
vvc_w_avg_12_128x128_avx2: 6499.6

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:29 +08:00
Wu Jianhua
70889620f2 avcodec/vvcdec: reuse h26x/2656_inter.asm to enable x86 optimizations
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
Wu Jianhua
fc5ff6b0b8 avcodec/x86/h26x/h2656_inter: add dststride to put
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
Wu Jianhua
7d9f1f5485 avcodec/x86/hevc_mc: move put/put_uni to h26x/h2656_inter.asm
This enable that the asm optimization can be reused by VVC

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
Wu Jianhua
04c2e246a3 avcodec/hevcdsp_template: reuse put/put_luma/put_chroma from h2656_inter_template
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
Wu Jianhua
639b1820ce avcodec/vvc/vvc_inter_template: move put/put_luma/put_chroma template to h2656_inter_template.c
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:28 +08:00
James Almer
fa469545ba avcodec: move leb reading functions to its own header
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-31 11:19:16 -03:00
Andreas Rheinhardt
7252e4f8ee avcodec/aac_defines: Remove unused AAC_RENAME_32
Unused since fbe6a51b11.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-30 20:43:48 +01:00
Frank Plowman
36a986d9a1 lavc/vvc: Add check to num_multi_layer_olss
Check that vps_each_layer_is_an_ols_flag, which indicates that "at
least one OLS specified by the VPS contains more than one layer," is
set if num_multi_layer_olss is non-zero.

Fixes: 65160/clusterfuzz-testcase-minimized-ffmpeg_BSF_VVC_METADATA_fuzzer-4665241535119360

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-30 09:24:03 -03:00
Zhao Zhili
bc944168db avcodec/mpeg4videodec: Remove write-only variable
Fix warning: variable 'time_incr' set but not used.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-30 18:25:52 +08:00
James Almer
66f028accb avcodec/cbs_h266: fix logic setting num_layers_in_ols when vps_ols_mode_idc is 2
The old code did not follow the syntax from the spec.

Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-29 20:45:31 -03:00
Frank Plowman
85e031d5bf lavc/vvc: Increase IntraEdgeParams buffer size
The reference line buffers are used with indices in the range
-MAX_TB_SIZE - 3 to refw + FFMAX(1, w/h) * ref_idx + 1, which is
at most 5*MAX_TB_SIZE + 1.

Fixes buffer overflows.
http://fate.ffmpeg.org/report.cgi?slot=armv7-linux-gcc-9&time=20240124051736

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-29 20:29:57 -03:00
Anton Khirnov
887a7817b6 lavc: move bitstream filters into bsf/ subdir 2024-01-29 18:40:14 +01:00
Andreas Rheinhardt
341d0419e1 avcodec/texturedsp: Factor common code out
Namely calling avctx->execute2(avctx,...).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 11:00:01 +01:00
Andreas Rheinhardt
4f58372c10 avcodec/hap: Avoid unnecessary opt.h inclusion
It presumably exists because HapContext contains an AVClass*.
Yet AVClass is actually defined in log.h and even this inclusion
can be avoided by struct AVClass*. This avoids opt.h inclusions
in hap.c and hapdec.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:59:36 +01:00
Andreas Rheinhardt
a3fc9fb9fb avcodec/texturedsp: Add separate TextureDSPEncContext
ff_texturedspenc_init() doesn't support most of the function types
supported for decoding; add a separate context containing only pointers
for the actually supported types.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:52:48 +01:00
Andreas Rheinhardt
8f791304f2 avcodec/dxvenc, hap(dec|enc): Move TextureDSPContext to stack
Only used during init.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:52:41 +01:00
Andreas Rheinhardt
e1d1304b4b avcodec/texturedspenc: Remove unused rgtc1_u_alpha encoding func
Effectively reverts 50a20de6b9.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:52:38 +01:00
Andreas Rheinhardt
916f016741 avcodec/dxvenc: Fix data races with slice threading
The old code set a common struct from each thread;
this only "worked" (but is still UB) because the values
written are the same for each thread.
Fix this by moving the assignments to the main thread.

(This also avoids casting const away from a const AVFrame*.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:52:24 +01:00
Andreas Rheinhardt
555879ca7c avcodec/dxvenc: Don't cast const away
Reviewed-by: Connor Worley <connorbworley@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-28 10:52:11 +01:00
Frank Plowman
0c517fcbe8 lavc/vvc: Fix emulation prevention byte handling
nal->skipped_bytes_pos contains the positions of errors relative to the
start of the slice header, whereas the position they were tested against
is relative to the start of the slice data, i.e. one byte after the end
of the slice header.

Patch fixes this by storing the size of the slice header in H266RawSlice
and adding it to the position given by the GetBitContext before
comparing to skipped_bytes_pos.  This fixes AVERROR_INVALIDDATAs for
various valid bitstreams, such as the LMCS_B_Dolby_2 conformance
bitstream.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-01-27 11:29:40 -03:00
James Almer
eb4584f994 avcodec/vvc_ps: remove duplicated enum
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-26 19:20:48 -03:00
Frank Plowman
763e31a8d3 lavc/vvc: Clamp shift RHS
Resolves the following undefined behavior sanitiser error:
runtime error: shift exponent 32 is too large for 32-bit type 'int'

Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-26 15:47:41 -03:00
Frank Plowman
cb7b4ee024 lavc/vvc: Use av_log2 when destination is integer
Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-26 15:47:41 -03:00
Tong Wu
8c99a1429a avcodec/d3d12va_mpeg2|vc1: remove the unused macros
These macros are no longer used. Remove them.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-26 09:22:12 +08:00
Tong Wu
8b41e9cfbe avcodec/d3d12va_decode: check existance before assigning a new index
Fixes #10759.

It can happen in H.264, MPEG2, VC1 that the current frame resource
memory is already in ref_resource. For example, for a interlaced frame,
the same curr memory is passed twice. For the second time it could possibly
reference itself. When this happens the curr is already given an index and
in ref_resources. When the reference frame index is required, we should check
the existance in the ref_resources first before assigning a new index for it.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-26 09:22:12 +08:00
Leo Izen
ac06190a5a
avcodec/libjxl.h: include version.h
This file has been exported since our minimum required version (0.7.0),
but it wasn't documented. Instead it was transitively included by
<jxl/decode.h> (but not jxl/encode.h), which ffmpeg relied on.

libjxl broke its API in libjxl/libjxl@66b9592393 by removing
the transitive include of version.h, and they do not plan on adding
it back. Instead they are choosing to leave the API backwards-
incompatible with downstream callers written for some fairly recent
versions of their API.

As a result, we include <jxl/version.h> to continue to build against
more recent versions of libjxl. The version macros removed are also
present in that file, so we no longer need to redefine them.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2024-01-25 11:07:28 -05:00
James Almer
45a2f2635d avcodec/d3d12va: remove unused variables
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:34:28 -03:00
James Almer
49a7fe86fe avcodec/d3d12va_vc1: cast mapped_data to void* before passing it to mapped_data()
Fixes -Wincompatible-pointer-types warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:28:05 -03:00
James Almer
9aa388e758 avcodec/d3d12va_hevc: cast mapped_data to void* before passing it to mapped_data()
Fixes -Wincompatible-pointer-types warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:28:05 -03:00
James Almer
342cc1792f avcodec/d3d12va_h264: cast mapped_data to void* before passing it to mapped_data()
Fixes -Wincompatible-pointer-types warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:28:04 -03:00
James Almer
04332ca35e avcodec/d3d12va_vp9.c: change the type for the ID3D12Resource_Map mapped_data argument
Fixes -Wincompatible-pointer-types warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:27:58 -03:00
James Almer
b4d871fdc8 avcodec/d3d12va_av1.c: change the type for the ID3D12Resource_Map mapped_data argument
Fixes -Wincompatible-pointer-types warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 17:27:40 -03:00
James Almer
cb6a488fba avcodec/vvc_mvs: remove an unnecessary AV_ZERO64() call
Should fix "member access within misaligned address 0xf00 for type 'const union
av_alias64', which requires 8 byte alignment" errors as reported by GCC ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 12:41:01 -03:00
James Almer
bc1d8a9b76 avcodec/vvc_mvs: align local motion vector fields
Should fix "member access within misaligned address 0xf00 for type 'const union
av_alias64', which requires 8 byte alignment" errors as reported by GCC ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-24 12:41:01 -03:00
Andreas Rheinhardt
3435565e26 avcodec/d3d12va_(av1|hevc|vp9): Don't use deprecated FF_PROFILE_*
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-01-24 15:49:12 +01:00
Connor Worley
dfbbd11a4b lavc/dxvenc: add DXV encoder with support for DXT1 texture format
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2024-01-23 21:31:22 +01:00
James Almer
1496ce8f6b avcodec/vvc_ctu: align motion vector fields
Should fix "member access within misaligned address 0xf00 for type 'const union
av_alias64', which requires 8 byte alignment" errors as reported by GCC ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-23 17:24:15 -03:00
Frank Plowman
8157b5d405 lavc/vvc: Remove left shifts of negative values
VVC specifies << as arithmetic left shift, i.e. x << y is equivalent to
x * pow2(y).  C's << on the other hand has UB if x is negative.  This
patch removes all UB resulting from this, mostly by replacing x << y
with x * (1 << y), but there are also a couple places where the OOP was
changed instead.

Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-23 11:17:05 -03:00
James Almer
ab39cc36c7 avcodec/speexdec: fix setting frame_size from extradata
Finishes fixing vp5/potter512-400-partial.avi

The fate-matroska-ms-mode test ref is updated to reflect that the Speex decoder
can now read the stream.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-22 10:58:12 -03:00
James Almer
cad35f0a77 avcodec/speexdec: relax the extradata check for the speex string
There could be bogus bytes at the start, as is the case of
vp5/potter512-400-partial.avi from the FATE suite, which could be a case of bad
remuxing from an OGG source.

Partially fixes decoding of vp5/potter512-400-partial.avi

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-22 10:58:12 -03:00
Anton Khirnov
08bebeb1be Revert "all: Don't set AVClass.item_name to its default value"
Some callers assume that item_name is always set, so this may be
considered an API break.

This reverts commit 0c6203c97a.
2024-01-20 10:34:48 +01:00
James Almer
0a5813fc68 avcodec/vvcdec: allocate and store structs on their own within the table list
Fixes "runtime error: member access within misaligned address 0xf00 for type
'struct bar', which requires # byte alignment" errors under GCC ubsan.

Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-19 08:53:32 -03:00
sunyuechi
8e23ebe6f9 lavc/svq1enc: R-V V ssd_int8_vs_int16
C908
ssd_int8_vs_int16_c: 207.7
ssd_int8_vs_int16_rvv_i32: 14.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-01-17 17:49:54 +02:00
Nuo Mi
d595e0a0b6 avcodec/vvcdec: misc, constify hor_ctu_edge
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-17 10:14:50 -03:00