Commit graph

23 commits

Author SHA1 Message Date
Zhao Zhili
952508ae05 aarch64/vvc: Add apply_bdof
Test on rpi 5 with gcc 12:

apply_bdof_8_8x16_c:                                  7315.2 ( 1.00x)
apply_bdof_8_8x16_neon:                               1876.8 ( 3.90x)
apply_bdof_8_16x8_c:                                  7170.5 ( 1.00x)
apply_bdof_8_16x8_neon:                               1752.8 ( 4.09x)
apply_bdof_8_16x16_c:                                14695.2 ( 1.00x)
apply_bdof_8_16x16_neon:                              3490.5 ( 4.21x)
apply_bdof_10_8x16_c:                                 7371.5 ( 1.00x)
apply_bdof_10_8x16_neon:                              1863.8 ( 3.96x)
apply_bdof_10_16x8_c:                                 7172.0 ( 1.00x)
apply_bdof_10_16x8_neon:                              1766.0 ( 4.06x)
apply_bdof_10_16x16_c:                               14551.5 ( 1.00x)
apply_bdof_10_16x16_neon:                             3576.0 ( 4.07x)
apply_bdof_12_8x16_c:                                 7236.5 ( 1.00x)
apply_bdof_12_8x16_neon:                              1863.8 ( 3.88x)
apply_bdof_12_16x8_c:                                 7316.5 ( 1.00x)
apply_bdof_12_16x8_neon:                              1758.8 ( 4.16x)
apply_bdof_12_16x16_c:                               14691.2 ( 1.00x)
apply_bdof_12_16x16_neon:                             3480.5 ( 4.22x)
2024-12-21 11:54:44 +08:00
Martin Storsjö
2bb00ef59c aarch64: vvc: Fix building the dmvr_hv assembly with older MSVC versions
Explicitly use ldur for unaligned offsets; newer versions of
armasm64 implicitly convert ldr to ldur as necessary, but older
versions require it explicitly written out.

This fixes these build errors:

    ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2039) :
     error A2518: operand 2: Memory offset must be aligned
            ldr             s5, [x1, #1]
    ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2250) :
     error A2518: operand 2: Memory offset must be aligned
            ldr             d7, [x1, #2]

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-18 13:45:09 +02:00
Zhao Zhili
40feba5f77 aarch64/vvc: Fix clip in alf
Fix test failure:
./tests/checkasm/checkasm --test=vvc_alf 3607569773
2024-12-10 21:00:47 +08:00
Zhao Zhili
91436638de aarch64/vvc: Use faster clip operation
Replace sqxtn+smin+smax by sqxtun+umin.
2024-12-10 21:00:47 +08:00
Zhao Zhili
bfed5f6b7d aarch64/vvc: Reuse ff_vvc_put_pel_pixels for chroma 2024-12-10 21:00:47 +08:00
Zhao Zhili
5988a2729b aarch64/vvc: Add dmvr
dmvr_8_12x20_c:                                          1.5 ( 1.00x)
dmvr_8_12x20_neon:                                       0.2 ( 6.56x)
dmvr_8_20x12_c:                                          1.0 ( 1.00x)
dmvr_8_20x12_neon:                                       0.2 ( 4.33x)
dmvr_8_20x20_c:                                          1.7 ( 1.00x)
dmvr_8_20x20_neon:                                       0.5 ( 3.63x)
dmvr_12_12x20_c:                                         2.2 ( 1.00x)
dmvr_12_12x20_neon:                                      0.5 ( 4.68x)
dmvr_12_20x12_c:                                         2.0 ( 1.00x)
dmvr_12_20x12_neon:                                      0.5 ( 4.16x)
dmvr_12_20x20_c:                                         3.7 ( 1.00x)
dmvr_12_20x20_neon:                                      0.7 ( 5.14x)

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-10-01 10:28:54 +08:00
Zhao Zhili
bcd65ebd8f aarch64/vvc: Add dmvr_hv
dmvr_hv_8_12x20_c:                                       8.0 ( 1.00x)
dmvr_hv_8_12x20_neon:                                    1.2 ( 6.62x)
dmvr_hv_8_20x12_c:                                       8.0 ( 1.00x)
dmvr_hv_8_20x12_neon:                                    0.9 ( 8.37x)
dmvr_hv_8_20x20_c:                                      12.9 ( 1.00x)
dmvr_hv_8_20x20_neon:                                    1.7 ( 7.62x)
dmvr_hv_10_12x20_c:                                      7.0 ( 1.00x)
dmvr_hv_10_12x20_neon:                                   1.7 ( 4.09x)
dmvr_hv_10_20x12_c:                                      7.0 ( 1.00x)
dmvr_hv_10_20x12_neon:                                   1.7 ( 4.09x)
dmvr_hv_10_20x20_c:                                     11.2 ( 1.00x)
dmvr_hv_10_20x20_neon:                                   2.7 ( 4.15x)
dmvr_hv_12_12x20_c:                                      6.5 ( 1.00x)
dmvr_hv_12_12x20_neon:                                   1.7 ( 3.79x)
dmvr_hv_12_20x12_c:                                      6.5 ( 1.00x)
dmvr_hv_12_20x12_neon:                                   1.7 ( 3.79x)
dmvr_hv_12_20x20_c:                                     10.2 ( 1.00x)
dmvr_hv_12_20x20_neon:                                   2.2 ( 4.64x)

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-10-01 10:28:54 +08:00
Zhao Zhili
0ba9e8d0d4 aarch64/vvc: Add w_avg
w_avg_8_2x2_c:                                           0.0 ( 0.00x)
w_avg_8_2x2_neon:                                        0.0 ( 0.00x)
w_avg_8_4x4_c:                                           0.2 ( 1.00x)
w_avg_8_4x4_neon:                                        0.0 ( 0.00x)
w_avg_8_8x8_c:                                           1.2 ( 1.00x)
w_avg_8_8x8_neon:                                        0.2 ( 5.00x)
w_avg_8_16x16_c:                                         4.2 ( 1.00x)
w_avg_8_16x16_neon:                                      0.8 ( 5.67x)
w_avg_8_32x32_c:                                        16.2 ( 1.00x)
w_avg_8_32x32_neon:                                      2.5 ( 6.50x)
w_avg_8_64x64_c:                                        64.5 ( 1.00x)
w_avg_8_64x64_neon:                                      9.0 ( 7.17x)
w_avg_8_128x128_c:                                     269.5 ( 1.00x)
w_avg_8_128x128_neon:                                   35.5 ( 7.59x)
w_avg_10_2x2_c:                                          0.2 ( 1.00x)
w_avg_10_2x2_neon:                                       0.2 ( 1.00x)
w_avg_10_4x4_c:                                          0.2 ( 1.00x)
w_avg_10_4x4_neon:                                       0.2 ( 1.00x)
w_avg_10_8x8_c:                                          1.0 ( 1.00x)
w_avg_10_8x8_neon:                                       0.2 ( 4.00x)
w_avg_10_16x16_c:                                        4.2 ( 1.00x)
w_avg_10_16x16_neon:                                     0.8 ( 5.67x)
w_avg_10_32x32_c:                                       16.2 ( 1.00x)
w_avg_10_32x32_neon:                                     2.5 ( 6.50x)
w_avg_10_64x64_c:                                       66.2 ( 1.00x)
w_avg_10_64x64_neon:                                    10.0 ( 6.62x)
w_avg_10_128x128_c:                                    277.8 ( 1.00x)
w_avg_10_128x128_neon:                                  39.8 ( 6.99x)
w_avg_12_2x2_c:                                          0.0 ( 0.00x)
w_avg_12_2x2_neon:                                       0.2 ( 0.00x)
w_avg_12_4x4_c:                                          0.2 ( 1.00x)
w_avg_12_4x4_neon:                                       0.0 ( 0.00x)
w_avg_12_8x8_c:                                          1.2 ( 1.00x)
w_avg_12_8x8_neon:                                       0.5 ( 2.50x)
w_avg_12_16x16_c:                                        4.8 ( 1.00x)
w_avg_12_16x16_neon:                                     0.8 ( 6.33x)
w_avg_12_32x32_c:                                       17.0 ( 1.00x)
w_avg_12_32x32_neon:                                     2.8 ( 6.18x)
w_avg_12_64x64_c:                                       64.0 ( 1.00x)
w_avg_12_64x64_neon:                                    10.0 ( 6.40x)
w_avg_12_128x128_c:                                    269.2 ( 1.00x)
w_avg_12_128x128_neon:                                  42.0 ( 6.41x)

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-10-01 10:28:54 +08:00
Zhao Zhili
3f84d1d1fb aarch64/vvc: Add avg
avg_8_2x2_c:                                             0.2 ( 1.00x)
avg_8_2x2_neon:                                          0.2 ( 1.00x)
avg_8_4x4_c:                                             0.2 ( 1.00x)
avg_8_4x4_neon:                                          0.2 ( 1.00x)
avg_8_8x8_c:                                             0.9 ( 1.00x)
avg_8_8x8_neon:                                          0.2 ( 5.29x)
avg_8_16x16_c:                                           3.7 ( 1.00x)
avg_8_16x16_neon:                                        0.7 ( 5.44x)
avg_8_32x32_c:                                          14.9 ( 1.00x)
avg_8_32x32_neon:                                        1.7 ( 8.91x)
avg_8_64x64_c:                                          59.7 ( 1.00x)
avg_8_64x64_neon:                                        6.9 ( 8.62x)
avg_8_128x128_c:                                       254.7 ( 1.00x)
avg_8_128x128_neon:                                     26.9 ( 9.46x)
avg_10_2x2_c:                                            0.2 ( 1.00x)
avg_10_2x2_neon:                                         0.2 ( 1.00x)
avg_10_4x4_c:                                            0.2 ( 1.00x)
avg_10_4x4_neon:                                         0.2 ( 1.00x)
avg_10_8x8_c:                                            0.9 ( 1.00x)
avg_10_8x8_neon:                                         0.2 ( 5.29x)
avg_10_16x16_c:                                          3.4 ( 1.00x)
avg_10_16x16_neon:                                       0.4 ( 8.06x)
avg_10_32x32_c:                                         13.9 ( 1.00x)
avg_10_32x32_neon:                                       1.9 ( 7.23x)
avg_10_64x64_c:                                         54.2 ( 1.00x)
avg_10_64x64_neon:                                       8.4 ( 6.43x)
avg_10_128x128_c:                                      232.4 ( 1.00x)
avg_10_128x128_neon:                                    30.9 ( 7.52x)
avg_12_2x2_c:                                            0.0 ( 0.00x)
avg_12_2x2_neon:                                         0.2 ( 0.00x)
avg_12_4x4_c:                                            0.4 ( 1.00x)
avg_12_4x4_neon:                                         0.2 ( 2.43x)
avg_12_8x8_c:                                            0.7 ( 1.00x)
avg_12_8x8_neon:                                         0.2 ( 3.86x)
avg_12_16x16_c:                                          3.7 ( 1.00x)
avg_12_16x16_neon:                                       0.4 ( 8.65x)
avg_12_32x32_c:                                         13.7 ( 1.00x)
avg_12_32x32_neon:                                       2.2 ( 6.29x)
avg_12_64x64_c:                                         53.9 ( 1.00x)
avg_12_64x64_neon:                                       7.7 ( 7.03x)
avg_12_128x128_c:                                      270.9 ( 1.00x)
avg_12_128x128_neon:                                    30.4 ( 8.90x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
1be5a2374f aarch64/vvc: Add put_epel_hv
On Apple M1:

put_chroma_hv_8_4x4_c:                                   1.7 ( 1.00x)
put_chroma_hv_8_4x4_neon:                                0.2 ( 7.67x)
put_chroma_hv_8_8x8_c:                                   5.5 ( 1.00x)
put_chroma_hv_8_8x8_neon:                                0.5 (11.53x)
put_chroma_hv_8_16x16_c:                                18.5 ( 1.00x)
put_chroma_hv_8_16x16_neon:                              1.5 (12.53x)
put_chroma_hv_8_32x32_c:                                72.5 ( 1.00x)
put_chroma_hv_8_32x32_neon:                              4.7 (15.34x)
put_chroma_hv_8_64x64_c:                               274.0 ( 1.00x)
put_chroma_hv_8_64x64_neon:                             18.5 (14.83x)
put_chroma_hv_8_128x128_c:                            1058.7 ( 1.00x)
put_chroma_hv_8_128x128_neon:                           75.2 (14.07x)

On Android Pixel 8 Pro:

put_chroma_hv_8_4x4_c:                                   1.2 ( 1.00x)
put_chroma_hv_8_4x4_neon:                                0.0 ( 0.00x)
put_chroma_hv_8_4x4_i8mm:                                0.2 ( 5.00x)
put_chroma_hv_8_8x8_c:                                   4.0 ( 1.00x)
put_chroma_hv_8_8x8_neon:                                0.5 ( 8.00x)
put_chroma_hv_8_8x8_i8mm:                                0.5 ( 8.00x)
put_chroma_hv_8_16x16_c:                                15.2 ( 1.00x)
put_chroma_hv_8_16x16_neon:                              2.5 ( 6.10x)
put_chroma_hv_8_16x16_i8mm:                              2.2 ( 6.78x)
put_chroma_hv_8_32x32_c:                                61.0 ( 1.00x)
put_chroma_hv_8_32x32_neon:                              9.8 ( 6.26x)
put_chroma_hv_8_32x32_i8mm:                              8.5 ( 7.18x)
put_chroma_hv_8_64x64_c:                               229.5 ( 1.00x)
put_chroma_hv_8_64x64_neon:                             38.5 ( 5.96x)
put_chroma_hv_8_64x64_i8mm:                             34.0 ( 6.75x)
put_chroma_hv_8_128x128_c:                             919.8 ( 1.00x)
put_chroma_hv_8_128x128_neon:                          154.5 ( 5.95x)
put_chroma_hv_8_128x128_i8mm:                          140.0 ( 6.57x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
0dcf204e5d aarch64/vvc: Add put_epel_h i8mm
put_chroma_h_8_4x4_c:                                    0.4 ( 1.00x)
put_chroma_h_8_4x4_neon:                                 0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm:                                 0.1 ( 2.67x)
put_chroma_h_8_8x8_c:                                    1.6 ( 1.00x)
put_chroma_h_8_8x8_neon:                                 0.1 (11.00x)
put_chroma_h_8_8x8_i8mm:                                 0.1 (11.00x)
put_chroma_h_8_16x16_c:                                  6.9 ( 1.00x)
put_chroma_h_8_16x16_neon:                               1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm:                               0.7 (10.62x)
put_chroma_h_8_32x32_c:                                 27.6 ( 1.00x)
put_chroma_h_8_32x32_neon:                               4.7 ( 5.95x)
put_chroma_h_8_32x32_i8mm:                               4.4 ( 6.28x)
put_chroma_h_8_64x64_c:                                116.2 ( 1.00x)
put_chroma_h_8_64x64_neon:                              19.1 ( 6.07x)
put_chroma_h_8_64x64_i8mm:                              17.1 ( 6.77x)
put_chroma_h_8_128x128_c:                              466.6 ( 1.00x)
put_chroma_h_8_128x128_neon:                            81.4 ( 5.73x)
put_chroma_h_8_128x128_i8mm:                            71.7 ( 6.51x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
41a1885f7a aarch64/vvc: Add put_epel_h
put_chroma_h_8_4x4_c:                                    0.2 ( 1.00x)
put_chroma_h_8_4x4_neon:                                 0.2 ( 1.00x)
put_chroma_h_8_8x8_c:                                    0.8 ( 1.00x)
put_chroma_h_8_8x8_neon:                                 0.2 ( 3.00x)
put_chroma_h_8_16x16_c:                                  3.8 ( 1.00x)
put_chroma_h_8_16x16_neon:                               0.8 ( 5.00x)
put_chroma_h_8_32x32_c:                                 12.5 ( 1.00x)
put_chroma_h_8_32x32_neon:                               2.2 ( 5.56x)
put_chroma_h_8_64x64_c:                                 47.0 ( 1.00x)
put_chroma_h_8_64x64_neon:                               8.8 ( 5.37x)
put_chroma_h_8_128x128_c:                              200.2 ( 1.00x)
put_chroma_h_8_128x128_neon:                            31.8 ( 6.31x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
260e1b4b62 aarch64/vvc: Add sad
sad_8x16_c:                                              0.8 ( 1.00x)
sad_8x16_neon:                                           0.2 ( 3.00x)
sad_16x8_c:                                              0.5 ( 1.00x)
sad_16x8_neon:                                           0.2 ( 2.00x)
sad_16x16_c:                                             1.5 ( 1.00x)
sad_16x16_neon:                                          0.2 ( 6.00x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
5ac6925803 aarch64/vvc: Add put_qpel_hv
With Apple M1 (no i8mm):

put_luma_hv_8_4x4_c:                                     2.2 ( 1.00x)
put_luma_hv_8_4x4_neon:                                  0.8 ( 3.00x)
put_luma_hv_8_8x8_c:                                     7.0 ( 1.00x)
put_luma_hv_8_8x8_neon:                                  0.8 ( 9.33x)
put_luma_hv_8_16x16_c:                                  22.8 ( 1.00x)
put_luma_hv_8_16x16_neon:                                2.5 ( 9.10x)
put_luma_hv_8_32x32_c:                                  84.8 ( 1.00x)
put_luma_hv_8_32x32_neon:                                9.5 ( 8.92x)
put_luma_hv_8_64x64_c:                                 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon:                               35.5 ( 9.38x)
put_luma_hv_8_128x128_c:                              1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon:                            137.8 ( 9.40x)

With Pixel 8 Pro:

put_luma_hv_8_4x4_c:                                     5.0 ( 1.00x)
put_luma_hv_8_4x4_neon:                                  0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm:                                  0.2 (20.00x)
put_luma_hv_8_8x8_c:                                    13.2 ( 1.00x)
put_luma_hv_8_8x8_neon:                                  1.2 (10.60x)
put_luma_hv_8_8x8_i8mm:                                  1.2 (10.60x)
put_luma_hv_8_16x16_c:                                  44.2 ( 1.00x)
put_luma_hv_8_16x16_neon:                                4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm:                                4.2 (10.41x)
put_luma_hv_8_32x32_c:                                 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon:                               17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm:                               16.0 (10.05x)
put_luma_hv_8_64x64_c:                                 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon:                               68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm:                               62.2 ( 9.82x)
put_luma_hv_8_128x128_c:                              2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon:                            268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm:                            245.8 ( 9.70x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
a0b52afd32 aarch64/vvc: Add put_qpel_vx
put_luma_v_8_4x4_c:                                      1.0 ( 1.00x)
put_luma_v_8_4x4_neon:                                   0.0 ( 0.00x)
put_luma_v_8_8x8_c:                                      3.5 ( 1.00x)
put_luma_v_8_8x8_neon:                                   0.5 ( 7.00x)
put_luma_v_8_16x16_c:                                   13.8 ( 1.00x)
put_luma_v_8_16x16_neon:                                 1.2 (11.00x)
put_luma_v_8_32x32_c:                                   54.2 ( 1.00x)
put_luma_v_8_32x32_neon:                                 5.0 (10.85x)
put_luma_v_8_64x64_c:                                  217.5 ( 1.00x)
put_luma_v_8_64x64_neon:                                18.8 (11.60x)
put_luma_v_8_128x128_c:                                886.2 ( 1.00x)
put_luma_v_8_128x128_neon:                              74.0 (11.98x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
9f6c8eb412 aarch64/vvc: Add put_qpel_hx i8mm
Benchmark on Android pixel 8 with -fno-vectorize

put_luma_h_8_4x4_c:                                      0.2 ( 1.00x)
put_luma_h_8_4x4_neon:                                   0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm:                                   0.0 ( 0.00x)
put_luma_h_8_8x8_c:                                      1.5 ( 1.00x)
put_luma_h_8_8x8_neon:                                   0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm:                                   0.5 ( 3.00x)
put_luma_h_8_16x16_c:                                    6.2 ( 1.00x)
put_luma_h_8_16x16_neon:                                 2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm:                                 1.5 ( 4.17x)
put_luma_h_8_32x32_c:                                   25.5 ( 1.00x)
put_luma_h_8_32x32_neon:                                 9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm:                                 6.8 ( 3.78x)
put_luma_h_8_64x64_c:                                   99.8 ( 1.00x)
put_luma_h_8_64x64_neon:                                35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm:                                27.2 ( 3.66x)
put_luma_h_8_128x128_c:                                422.0 ( 1.00x)
put_luma_h_8_128x128_neon:                             138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm:                             109.2 ( 3.86x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
25448d1716 aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w
put_luma_pixels_8_4x4_c:                                 0.2 ( 1.00x)
put_luma_pixels_8_4x4_neon:                              0.2 ( 1.00x)
put_luma_pixels_8_8x8_c:                                 0.7 ( 1.00x)
put_luma_pixels_8_8x8_neon:                              0.2 ( 3.22x)
put_luma_pixels_8_16x16_c:                               2.2 ( 1.00x)
put_luma_pixels_8_16x16_neon:                            0.2 ( 9.89x)
put_luma_pixels_8_32x32_c:                               8.2 ( 1.00x)
put_luma_pixels_8_32x32_neon:                            1.2 ( 6.71x)
put_luma_pixels_8_64x64_c:                              33.7 ( 1.00x)
put_luma_pixels_8_64x64_neon:                            2.5 (13.63x)
put_luma_pixels_8_128x128_c:                           145.5 ( 1.00x)
put_luma_pixels_8_128x128_neon:                         10.2 (14.23x)
put_uni_pixels_luma_8_4x4_c:                             0.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_neon:                          0.0 ( 0.00x)
put_uni_pixels_luma_8_8x8_c:                             0.5 ( 1.00x)
put_uni_pixels_luma_8_8x8_neon:                          0.2 ( 2.11x)
put_uni_pixels_luma_8_16x16_c:                           1.2 ( 1.00x)
put_uni_pixels_luma_8_16x16_neon:                        0.2 ( 5.44x)
put_uni_pixels_luma_8_32x32_c:                           3.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_neon:                        0.5 ( 6.26x)
put_uni_pixels_luma_8_64x64_c:                           3.0 ( 1.00x)
put_uni_pixels_luma_8_64x64_neon:                        1.7 ( 1.72x)
put_uni_pixels_luma_8_128x128_c:                         6.5 ( 1.00x)
put_uni_pixels_luma_8_128x128_neon:                      6.5 ( 1.00x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
20f2bf5530 aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_*
Just share hevc implementation.

checkasm --test=vvc_mc --benchmark:

put_luma_h_8_4x4_c:                                      0.2 ( 1.00x)
put_luma_h_8_4x4_neon:                                   0.2 ( 1.00x)
put_luma_h_8_8x8_c:                                      1.0 ( 1.00x)
put_luma_h_8_8x8_neon:                                   0.2 ( 4.33x)
put_luma_h_8_16x16_c:                                    3.2 ( 1.00x)
put_luma_h_8_16x16_neon:                                 1.2 ( 2.63x)
put_luma_h_8_32x32_c:                                   13.7 ( 1.00x)
put_luma_h_8_32x32_neon:                                 4.0 ( 3.45x)
put_luma_h_8_64x64_c:                                   48.2 ( 1.00x)
put_luma_h_8_64x64_neon:                                15.7 ( 3.07x)
put_luma_h_8_128x128_c:                                203.5 ( 1.00x)
put_luma_h_8_128x128_neon:                              62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c:                                  0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon:                               0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c:                                  1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon:                               0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c:                                5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon:                             1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c:                               24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon:                             4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c:                               90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon:                            17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c:                            357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon:                          67.5 ( 5.30x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
4c0372281b aarch64/vvc: Bind h26x/sao filter implementation to vvc
Reviewed-by: Martin Storsjö <martin@martin.st>
2024-08-31 16:07:50 +08:00
Martin Storsjö
4acb9b7d10 aarch64: vvc: Fix unnecessary extra spaces
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-07-23 16:04:28 +03:00
Martin Storsjö
99598629e8 aarch64: vvc: Consistently use # for immediate constants
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-07-23 15:24:37 +03:00
Martin Storsjö
400843151d aarch64: vvc: Fix compilation of alf.S with MSVC 2022 17.7 and older
Use the "ldur" instruction explicitly, instead of having the
assembler implicitly convert "ldr" instructions to "ldur".

This fixes build errors like these:

libavcodec\aarch64\vvc\alf.o.asm(1023) : error A2518: operand 2: Memory offset must be aligned
        ldr             q22, [x3, #24]
libavcodec\aarch64\vvc\alf.o.asm(1024) : error A2518: operand 2: Memory offset must be aligned
        ldr             q24, [x2, #24]
libavcodec\aarch64\vvc\alf.o.asm(1393) : error A2518: operand 2: Memory offset must be aligned
        ldr             q22, [x3, #24]
libavcodec\aarch64\vvc\alf.o.asm(1394) : error A2518: operand 2: Memory offset must be aligned
        ldr             q24, [x2, #24]

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-07-23 15:24:33 +03:00
Zhao Zhili
2d4ef304c9 avcodec/vvc: Add aarch64 neon optimization for ALF
vvc_alf_filter_chroma_4x4_8_c: 3.0
vvc_alf_filter_chroma_4x4_8_neon: 1.0
vvc_alf_filter_chroma_4x4_10_c: 2.7
vvc_alf_filter_chroma_4x4_10_neon: 1.0
vvc_alf_filter_chroma_4x4_12_c: 2.7
vvc_alf_filter_chroma_4x4_12_neon: 1.0
vvc_alf_filter_chroma_8x8_8_c: 10.2
vvc_alf_filter_chroma_8x8_8_neon: 3.0
vvc_alf_filter_chroma_8x8_10_c: 10.0
vvc_alf_filter_chroma_8x8_10_neon: 2.5
vvc_alf_filter_chroma_8x8_12_c: 10.0
vvc_alf_filter_chroma_8x8_12_neon: 2.5
vvc_alf_filter_chroma_16x16_8_c: 41.7
vvc_alf_filter_chroma_16x16_8_neon: 11.2
vvc_alf_filter_chroma_16x16_10_c: 39.0
vvc_alf_filter_chroma_16x16_10_neon: 10.0
vvc_alf_filter_chroma_16x16_12_c: 40.2
vvc_alf_filter_chroma_16x16_12_neon: 10.2
vvc_alf_filter_chroma_32x32_8_c: 162.0
vvc_alf_filter_chroma_32x32_8_neon: 45.0
vvc_alf_filter_chroma_32x32_10_c: 155.5
vvc_alf_filter_chroma_32x32_10_neon: 39.5
vvc_alf_filter_chroma_32x32_12_c: 155.5
vvc_alf_filter_chroma_32x32_12_neon: 40.0
vvc_alf_filter_chroma_64x64_8_c: 646.0
vvc_alf_filter_chroma_64x64_8_neon: 175.5
vvc_alf_filter_chroma_64x64_10_c: 708.2
vvc_alf_filter_chroma_64x64_10_neon: 166.7
vvc_alf_filter_chroma_64x64_12_c: 619.2
vvc_alf_filter_chroma_64x64_12_neon: 157.2
vvc_alf_filter_chroma_128x128_8_c: 2611.5
vvc_alf_filter_chroma_128x128_8_neon: 698.2
vvc_alf_filter_chroma_128x128_10_c: 2470.0
vvc_alf_filter_chroma_128x128_10_neon: 616.0
vvc_alf_filter_chroma_128x128_12_c: 2531.5
vvc_alf_filter_chroma_128x128_12_neon: 620.2
vvc_alf_filter_luma_8x8_8_c: 25.2
vvc_alf_filter_luma_8x8_8_neon: 4.2
vvc_alf_filter_luma_8x8_10_c: 18.5
vvc_alf_filter_luma_8x8_10_neon: 4.0
vvc_alf_filter_luma_8x8_12_c: 19.0
vvc_alf_filter_luma_8x8_12_neon: 4.0
vvc_alf_filter_luma_16x16_8_c: 106.5
vvc_alf_filter_luma_16x16_8_neon: 16.2
vvc_alf_filter_luma_16x16_10_c: 75.2
vvc_alf_filter_luma_16x16_10_neon: 14.7
vvc_alf_filter_luma_16x16_12_c: 79.7
vvc_alf_filter_luma_16x16_12_neon: 14.7
vvc_alf_filter_luma_32x32_8_c: 400.5
vvc_alf_filter_luma_32x32_8_neon: 63.2
vvc_alf_filter_luma_32x32_10_c: 299.2
vvc_alf_filter_luma_32x32_10_neon: 57.7
vvc_alf_filter_luma_32x32_12_c: 299.2
vvc_alf_filter_luma_32x32_12_neon: 57.7
vvc_alf_filter_luma_64x64_8_c: 1602.5
vvc_alf_filter_luma_64x64_8_neon: 251.7
vvc_alf_filter_luma_64x64_10_c: 1197.0
vvc_alf_filter_luma_64x64_10_neon: 235.5
vvc_alf_filter_luma_64x64_12_c: 1220.2
vvc_alf_filter_luma_64x64_12_neon: 235.7
vvc_alf_filter_luma_128x128_8_c: 6570.2
vvc_alf_filter_luma_128x128_8_neon: 1007.7
vvc_alf_filter_luma_128x128_10_c: 4822.7
vvc_alf_filter_luma_128x128_10_neon: 936.2
vvc_alf_filter_luma_128x128_12_c: 4791.2
vvc_alf_filter_luma_128x128_12_neon: 938.5

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-22 21:09:56 +08:00