Rémi Denis-Courmont
a4cb6c724b
lavc/llvidencdsp: R-V V sub_left_predict
SpacemiT X60:
sub_left_predict_c: 51836.0 ( 1.00x)
sub_left_predict_rvv_i32: 5843.1 ( 8.87x)
2025-12-11 17:24:38 +02:00
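Left prediction replaces each byte with its difference from the previous byte, so a vector version has to carry the last byte of one chunk into the next. A minimal scalar sketch follows. The name `sub_left_predict_ref`, the 0x80 seed, the packed output and the predictor carrying over from one row to the next are all assumptions of this sketch, not a copy of the FFmpeg C reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of left prediction (not FFmpeg's code).
 * Each output byte is the source byte minus the previous source byte,
 * modulo 256. Assumptions: the predictor starts at 0x80 and carries over
 * from the end of one row to the start of the next; the output is packed
 * (width bytes per row) while input rows are `stride` bytes apart. */
static void sub_left_predict_ref(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, ptrdiff_t width, int height)
{
    uint8_t prev = 0x80;

    for (int y = 0; y < height; y++) {
        for (ptrdiff_t x = 0; x < width; x++) {
            *dst++ = (uint8_t)(src[x] - prev); /* wraps modulo 256 */
            prev   = src[x];
        }
        src += stride;
    }
}
```

The serial dependency on `prev` is what a vector implementation must break up, for example by subtracting a copy of the row shifted by one element.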
Rémi Denis-Courmont
10ea5f8b99
lavc/h264idct: R-V V 9-bit h264_luma_dc_dequant_idct
Note that, like the C reference, the same function can be used for
larger bit depths.
2025-12-07 20:27:35 +02:00
Rémi Denis-Courmont
d69a36a8d1
lavc/h264idct: R-V V 8-bit h264_luma_dc_dequant_idct
This does not improve performance with current hardware due to the poor
performance of segmented accesses. Performance should be slightly better
with expensive or near-future hardware that I don't have; however, it is
still limited by two other factors:
- There are only 4 elements.
- The final stores are necessarily indexed and hit multiple cache lines,
and are thus as slow as scalar stores.
2025-12-07 20:27:35 +02:00
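The two limiting factors above follow from the shape of the operation. The sketch below is a simplified scalar model, not FFmpeg's reference code: it applies a 4x4 Hadamard transform to the 16 luma DC coefficients, dequantises with `(x * qmul + 128) >> 8`, and scatters each result to coefficient 0 of its own 16-coefficient block. The raster block order, the dequantisation form and the name `luma_dc_dequant_idct_sketch` are assumptions; the real decoder uses its own block order. Each pass only works on 4 values, and the 16 output stores land 32 bytes apart across a 512-byte range.

```c
#include <stdint.h>

/* Simplified model of the 4x4 luma DC inverse transform + dequantisation.
 * Assumptions: `in` holds 16 DC values in raster order, `out` holds sixteen
 * blocks of 16 coefficients each (256 values) in raster block order, and
 * `qmul` already includes the quantiser scale. Negative values rely on an
 * arithmetic right shift. */
static void luma_dc_dequant_idct_sketch(int16_t out[256], const int16_t in[16],
                                        int qmul)
{
    int tmp[16];

    /* Horizontal 4-point Hadamard on each row. */
    for (int i = 0; i < 4; i++) {
        int z0 = in[4 * i + 0] + in[4 * i + 1];
        int z1 = in[4 * i + 0] - in[4 * i + 1];
        int z2 = in[4 * i + 2] - in[4 * i + 3];
        int z3 = in[4 * i + 2] + in[4 * i + 3];

        tmp[4 * i + 0] = z0 + z3;
        tmp[4 * i + 1] = z0 - z3;
        tmp[4 * i + 2] = z1 - z2;
        tmp[4 * i + 3] = z1 + z2;
    }

    /* Vertical pass, dequantisation and scattered stores: each result goes
     * to coefficient 0 of its own block, 16 coefficients (32 bytes) apart. */
    for (int j = 0; j < 4; j++) {
        int z0 = tmp[4 * 0 + j] + tmp[4 * 1 + j];
        int z1 = tmp[4 * 0 + j] - tmp[4 * 1 + j];
        int z2 = tmp[4 * 2 + j] - tmp[4 * 3 + j];
        int z3 = tmp[4 * 2 + j] + tmp[4 * 3 + j];
        int col[4] = { z0 + z3, z0 - z3, z1 - z2, z1 + z2 };

        for (int i = 0; i < 4; i++)
            out[16 * (4 * i + j)] = (int16_t)((col[i] * qmul + 128) >> 8);
    }
}
```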
Rémi Denis-Courmont
f222eb2b08
lavc/mpv_unquantize: R-V V H.263 DCT unquantize
SpacemiT X60:
dct_unquantize_h263_inter_c: 417.8 ( 1.00x)
dct_unquantize_h263_inter_rvv_i32: 66.0 ( 6.33x)
dct_unquantize_h263_intra_c: 140.2 ( 1.00x)
dct_unquantize_h263_intra_rvv_i32: 67.7 ( 2.07x)
Note that the C benchmarks are not stable, depending heavily on the
number of coefficients picked by the RNG. The R-V V benchmarks are
however very stable and generally better than C's.
2025-12-07 20:20:38 +02:00
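The instability of the C timings follows from the shape of the scalar loop: it only walks the coefficients up to a "last" index, so its cost tracks whatever the RNG produced. A minimal sketch of H.263 inter dequantisation, assuming the usual `qmul = 2 * qscale` and `qadd = (qscale - 1) | 1`; the name and the `last` parameter are hypothetical stand-ins for however the caller bounds the loop.

```c
#include <stdint.h>

/* Sketch of H.263 inter dequantisation (assumed formula, hypothetical name).
 * Coefficients 0..last (inclusive) are rescaled; zero coefficients stay zero
 * and the sign of non-zero coefficients is preserved. */
static void h263_unquantize_inter_sketch(int16_t *block, int last, int qscale)
{
    const int qmul = qscale * 2;
    const int qadd = (qscale - 1) | 1; /* always odd */

    for (int i = 0; i <= last; i++) {
        int level = block[i];

        if (level > 0)
            level = level * qmul + qadd;
        else if (level < 0)
            level = level * qmul - qadd;
        block[i] = (int16_t)level;
    }
}
```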
Andreas Rheinhardt
9cff236e2f
avcodec/riscv/vp8dsp_rvv: Remove unused functions
Only the sixtap functions are used for size 16.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-04 15:17:37 +01:00
Rémi Denis-Courmont
39abb1ac94
pixblockdsp: avoid segments on R-V V diff_pixels_unaligned
On SpacemiT X60, before:
diff_pixels_unaligned_rvv_i32: 250.2 ( 0.59x)
...after:
diff_pixels_unaligned_rvv_i32: 56.9 ( 2.60x)
2025-11-07 08:43:23 +00:00
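The arithmetic itself is simple; the change above only affects how the eight source rows are loaded (plain loads instead of segmented ones). A scalar sketch of `diff_pixels` for one 8x8 block, assuming both sources share one stride; the name `diff_pixels_sketch` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of diff_pixels for one 8x8 block (hypothetical name).
 * Each output is s1 - s2, widened to 16 bits, stored row by row. */
static void diff_pixels_sketch(int16_t *restrict block, const uint8_t *s1,
                               const uint8_t *s2, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = (int16_t)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}
```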
Rémi Denis-Courmont
c17d304e1f
pixblockdsp: avoid segments on R-V V get_pixels_unaligned
...
On SpacemiT X60, before:
get_pixels_unaligned_rvv_i32: 172.4 ( 0.37x)
...after:
get_pixels_unaligned_rvv_i32: 34.4 ( 1.84x)
2025-11-07 08:43:23 +00:00
Rémi Denis-Courmont
e3b0d58394
Revert "lavc/pixblockdsp: rework R-V V get_pixels_unaligned"
The optimised version does not work if the stride is not a multiple of 8,
which can occur, as reproduced by the vsynth3-asv1 and vsynth3-asv2 tests.
This reverts commit 02594c8c01.
Conflicts:
libavcodec/riscv/pixblockdsp_init.c
libavcodec/riscv/pixblockdsp_rvv.S
2025-11-07 08:43:23 +00:00
Kacper Michajłow
9ad20839fb
avcodec/pixblockdsp: be consistent about restrict use in ff_{get,diff}_pixels
Suppresses warnings about function pointer mismatch.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-10-25 01:01:15 +02:00
Timo Rothenpieler
262d41c804
all: fix typos found by codespell
2025-08-03 13:48:47 +02:00
Andreas Rheinhardt
9b409ea1e6
configure: Factor mpegvideoencdsp out of mpegvideoenc
This will allow relaxing the dependency on mpegvideoenc
for several codecs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-06-21 22:08:52 +02:00
Andreas Rheinhardt
20ddada2a3
avcodec/pixblockdsp: Improve 8 vs 16 bit check
Before this commit, the input in get_pixels and get_pixels_unaligned
was treated inconsistently:
- The generic code treated 9, 10, 12 and 14 bits as 16-bit input
(these bit depths correspond to what FFmpeg's dsputils supported),
everything with <= 8 bits as 8 bit and everything else as 8 bit
when used via AVDCT (which exposes these functions and purports
to support up to 14 bits).
- AArch64, ARM, PPC, RISC-V and x86 ignore this AVDCT special case.
- RISC-V also ignored the restriction to 9, 10, 12 and 14 for its
16-bit check and treated everything > 8 bits as 16-bit.
- The mmi MIPS code treats everything as 8 bit when used via
AVDCT (this is certainly broken); otherwise it checks for <= 8 bits.
The msa MIPS code behaves like the generic code.
This commit changes this to treat 9..16 bits as 16 bit input,
everything else as 8 bit (the former because it makes sense,
the latter to preserve the behaviour for external users*).
*: The only internal user of AVDCT (the spp filter) always
uses 8, 9 or 10 bits.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-05-31 01:25:27 +02:00
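The new rule from 20ddada2a3 amounts to a single range check. A sketch with a hypothetical helper name, where `bits_per_raw_sample` stands in for the declared bit depth:

```c
#include <stdbool.h>

/* Sketch of the rule described in 20ddada2a3 (hypothetical helper):
 * 9..16 bits use the 16-bit get_pixels path; everything else, including 0
 * and values above 16, keeps the 8-bit path. */
static bool use_16bit_pixels(int bits_per_raw_sample)
{
    return bits_per_raw_sample > 8 && bits_per_raw_sample <= 16;
}
```

Compared with the old generic code, this moves 11, 13, 15 and 16 bits from the 8-bit path to the 16-bit one.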
Andreas Rheinhardt
a064d34a32
avcodec/mpegvideoenc: Add MPVEncContext
Many of the fields of MpegEncContext (which is also used by decoders)
are actually only used by encoders. Therefore this commit adds
a new encoder-only structure and moves all of the encoder-only
fields to it except for those which require more explicit
synchronisation between the main slice context and the other
slice contexts. This synchronisation is currently mainly provided
by ff_update_thread_context() which simply copies most of
the main slice context over the other slice contexts. Fields
which are moved to the new MPVEncContext no longer participate
in this (which is desirable, because this copying is horrible and,
for the fields in b) below, wasteful), which means that some fields can
only be moved once explicit synchronisation code is added in later commits.
More explicitly, this commit moves the following fields:
a) Fields not copied by ff_update_duplicate_context():
dct_error_sum and dct_count; the former does not need synchronisation,
the latter is synchronised in merge_context_after_encode().
b) Fields which do not change after initialisation (these fields
could also be put into MPVMainEncContext at the cost of
an indirection to access them): lambda_table, adaptive_quant,
{luma,chroma}_elim_threshold, new_pic, fdsp, mpvencdsp, pdsp,
{p,b_forw,b_back,b_bidir_forw,b_bidir_back,b_direct,b_field}_mv_table,
[pb]_field_select_table, mb_{type,var,mean}, mc_mb_var, {min,max}_qcoeff,
{inter,intra}_quant_bias, ac_esc_length, the *_vlc_length fields,
the q_{intra,inter,chroma_intra}_matrix{,16}, dct_offset, mb_info,
mjpeg_ctx, rtp_mode, rtp_payload_size, encode_mb, all function
pointers, mpv_flags, quantizer_noise_shaping,
frame_reconstruction_bitfield, error_rate and intra_penalty.
c) Fields which are already (re)set explicitly: The PutBitContexts
pb, tex_pb, pb2; dquant, skipdct, encoding_error, the statistics
fields {mv,i_tex,p_tex,misc,last}_bits and i_count; last_mv_dir,
esc_pos (reset when writing the header).
d) Fields which are only used by encoders not supporting slice
threading for which synchronisation doesn't matter: esc3_level_length
and the remaining mb_info fields.
e) coded_score: This field is only really used when FF_MPV_FLAG_CBP_RD
is set (which implies trellis) and even then it is only used for
non-intra blocks. For these blocks dct_quantize_trellis_c() either
sets coded_score[n] or returns a last_non_zero value of -1
in which case coded_score will be reset in encode_mb_internal().
Therefore no old values are ever used.
The MotionEstContext has not been moved yet.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-03-26 04:08:33 +01:00
sunyuechi
a0a89efd07
Fix the tail handling in R-V V sad
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2025-01-25 09:37:45 +02:00
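A vector loop processes each row in chunks of at most VL elements, and the last chunk of a row is usually shorter; that tail is what the commit above deals with. The following scalar model mimics this strip-mining with a fixed chunk size. It illustrates tail handling only and is not the FFmpeg `sad` code: the `CHUNK` constant, 16-bit samples and a stride counted in elements are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK 8 /* hypothetical stand-in for the vector length (VL) */

/* Sum of absolute differences over a w x h block, processed row by row in
 * chunks of at most CHUNK samples, the way a strip-mined vector loop would.
 * The final chunk of each row covers only the remaining samples. */
static int sad_stripmined(const int16_t *a, const int16_t *b,
                          ptrdiff_t stride, int w, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; ) {
            int n = w - x < CHUNK ? w - x : CHUNK; /* like vsetvli */

            for (int i = 0; i < n; i++)
                sum += abs(a[x + i] - b[x + i]);
            x += n;
        }
        a += stride;
        b += stride;
    }
    return sum;
}
```

Samples between `w` and `stride` are never read, so padding cannot leak into the result.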
Jun Zhao
b88fc4e098
lavc/ac3dsp: fix R-V HAVE_RVV scope issue
fix R-V HAVE_RVV scope issue
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2025-01-13 23:58:54 +08:00
Nuo Mi
8d27256a74
avcodec/vvcdec: remove vvc prefix for x86 and riscv
2024-12-22 21:00:06 +08:00
sunyuechi
6b31e42c47
lavc/riscv: vset macro for simplify if-else
2024-12-21 12:03:45 +08:00
Rémi Denis-Courmont
bd226fdd74
lavc/h264dsp: R-V V intra loop filter
As with the inter loop filter, performance metrics seem to be biased in
favour of the C implementation because checkasm inputs almost always
fall in the no-op case.
h264_h_loop_filter_chroma_intra_8bpp_c: 82.8 ( 1.00x)
h264_h_loop_filter_chroma_intra_8bpp_rvv_i32: 72.6 ( 1.14x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 41.1 ( 1.00x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_rvv_i32: 72.6 ( 0.57x)
h264_h_loop_filter_luma_intra_8bpp_c: 166.1 ( 1.00x)
h264_h_loop_filter_luma_intra_8bpp_rvv_i32: 395.4 ( 0.42x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_c: 93.3 ( 1.00x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_rvv_i32: 395.4 ( 0.24x)
h264_v_loop_filter_chroma_intra_8bpp_c: 134.8 ( 1.00x)
h264_v_loop_filter_chroma_intra_8bpp_rvv_i32: 51.6 ( 2.61x)
h264_v_loop_filter_luma_intra_8bpp_c: 468.1 ( 1.00x)
h264_v_loop_filter_luma_intra_8bpp_rvv_i32: 134.8 ( 3.47x)
2024-12-17 09:00:28 +02:00
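The no-op case mentioned in this commit is the edge-activity test that gates every H.264 deblocking decision. A sketch of one chroma intra (bS = 4) sample position, following the H.264 specification's filter equations; the function name and argument layout are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* One chroma sample position across an edge: p1 p0 | q0 q1.
 * Returns 1 if the samples were filtered, 0 for the no-op case. */
static int chroma_intra_filter_sketch(const uint8_t *p1, uint8_t *p0,
                                      uint8_t *q0, const uint8_t *q1,
                                      int alpha, int beta)
{
    if (abs(*p0 - *q0) >= alpha || abs(*p1 - *p0) >= beta ||
        abs(*q1 - *q0) >= beta)
        return 0; /* edge looks like real detail: leave it untouched */

    int np0 = (2 * *p1 + *p0 + *q1 + 2) >> 2;
    int nq0 = (2 * *q1 + *q0 + *p1 + 2) >> 2;

    *p0 = (uint8_t)np0;
    *q0 = (uint8_t)nq0;
    return 1; /* p1 and q1 are never modified for chroma */
}
```

Random inputs rarely pass all three comparisons, so the C code mostly takes the early return, which is why the benchmarks favour it.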
sunyuechi
16d4945e9a
lavc/vvc_mc R-V V sad
k230 banana_f3
sad_8x16_c: 387.7 ( 1.00x) 394.9 ( 1.00x)
sad_8x16_rvv_i32: 109.7 ( 3.53x) 103.5 ( 3.82x)
sad_16x8_c: 378.2 ( 1.00x) 384.7 ( 1.00x)
sad_16x8_rvv_i32: 82.0 ( 4.61x) 61.7 ( 6.24x)
sad_16x16_c: 748.7 ( 1.00x) 759.7 ( 1.00x)
sad_16x16_rvv_i32: 128.5 ( 5.83x) 113.7 ( 6.68x)
2024-12-17 09:21:20 +08:00
sunyuechi
b3f7440298
lavc/hevc: R-V V put_pixels(pow2)
k230 banana_f3
put_hevc_pel_pixels4_8_c: 61.6 ( 1.00x) 69.5 ( 1.00x)
put_hevc_pel_pixels4_8_rvv_i32: 24.6 ( 2.50x) 28.0 ( 2.48x)
put_hevc_pel_pixels8_8_c: 209.8 ( 1.00x) 215.5 ( 1.00x)
put_hevc_pel_pixels8_8_rvv_i32: 52.6 ( 3.99x) 38.2 ( 5.64x)
put_hevc_pel_pixels16_8_c: 839.4 ( 1.00x) 830.0 ( 1.00x)
put_hevc_pel_pixels16_8_rvv_i32: 126.6 ( 6.63x) 90.5 ( 9.17x)
put_hevc_pel_pixels32_8_c: 3246.6 ( 1.00x) 3246.7 ( 1.00x)
put_hevc_pel_pixels32_8_rvv_i32: 311.6 (10.42x) 257.0 (12.63x)
put_hevc_pel_pixels64_8_c: 12894.6 ( 1.00x) 12892.7 ( 1.00x)
put_hevc_pel_pixels64_8_rvv_i32: 1135.8 (11.35x) 778.0 (16.57x)
2024-12-17 09:21:20 +08:00
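`put_hevc_pel_pixels` is the full-sample case of motion compensation: it copies a block into the 14-bit intermediate buffer used by the later averaging stage, which for 8-bit input is a left shift by 14 - 8 = 6. A sketch, assuming a source stride in bytes and a destination stride in `int16_t` elements; the name and layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Full-sample MC into a 14-bit intermediate buffer (hypothetical sketch). */
static void put_pel_pixels8_sketch(int16_t *dst, ptrdiff_t dst_stride,
                                   const uint8_t *src, ptrdiff_t src_stride,
                                   int width, int height)
{
    const int shift = 14 - 8; /* intermediate precision minus bit depth */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << shift);
        src += src_stride;
        dst += dst_stride;
    }
}
```

With no filtering involved, the work is a widening shift per sample, which explains why the speedup grows with block width.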
sunyuechi
dad062c4f8
lavc/vvc_mc: R-V V put_pixels
k230 banana_f3
put_chroma_pixels_8_4x4_c: 63.5 ( 1.00x) 59.2 ( 1.00x)
put_chroma_pixels_8_4x4_rvv_i32: 26.5 ( 2.39x) 28.0 ( 2.12x)
put_chroma_pixels_8_8x8_c: 211.8 ( 1.00x) 215.5 ( 1.00x)
put_chroma_pixels_8_8x8_rvv_i32: 54.3 ( 3.90x) 48.8 ( 4.42x)
put_chroma_pixels_8_16x16_c: 841.3 ( 1.00x) 830.0 ( 1.00x)
put_chroma_pixels_8_16x16_rvv_i32: 137.5 ( 6.12x) 121.8 ( 6.82x)
put_chroma_pixels_8_32x32_c: 3248.8 ( 1.00x) 3288.2 ( 1.00x)
put_chroma_pixels_8_32x32_rvv_i32: 350.5 ( 9.27x) 288.5 (11.40x)
put_chroma_pixels_8_64x64_c: 12998.3 ( 1.00x) 12976.2 ( 1.00x)
put_chroma_pixels_8_64x64_rvv_i32: 1100.5 (11.81x) 924.0 (14.04x)
put_chroma_pixels_8_128x128_c: 54284.0 ( 1.00x) 52654.5 ( 1.00x)
put_chroma_pixels_8_128x128_rvv_i32: 7192.8 ( 7.55x) 2934.2 (17.94x)
put_luma_pixels_8_4x4_c: 63.5 ( 1.00x) 69.5 ( 1.00x)
put_luma_pixels_8_4x4_rvv_i32: 26.5 ( 2.39x) 28.0 ( 2.48x)
put_luma_pixels_8_8x8_c: 211.5 ( 1.00x) 225.8 ( 1.00x)
put_luma_pixels_8_8x8_rvv_i32: 54.3 ( 3.90x) 38.5 ( 5.86x)
put_luma_pixels_8_16x16_c: 850.5 ( 1.00x) 830.0 ( 1.00x)
put_luma_pixels_8_16x16_rvv_i32: 137.5 ( 6.18x) 100.8 ( 8.24x)
put_luma_pixels_8_32x32_c: 3248.8 ( 1.00x) 3257.2 ( 1.00x)
put_luma_pixels_8_32x32_rvv_i32: 341.3 ( 9.52x) 246.8 (13.20x)
put_luma_pixels_8_64x64_c: 13007.5 ( 1.00x) 13038.8 ( 1.00x)
put_luma_pixels_8_64x64_rvv_i32: 1119.0 (11.62x) 684.2 (19.06x)
put_luma_pixels_8_128x128_c: 54219.3 ( 1.00x) 52060.8 ( 1.00x)
put_luma_pixels_8_128x128_rvv_i32: 6813.5 ( 7.96x) 2548.8 (20.43x)
2024-12-17 09:21:20 +08:00
sunyuechi
9288196c0d
lavc/riscv: Move VVC macro to h26x
2024-12-17 09:21:20 +08:00
sunyuechi
89df9c4404
lavc/vvc_mc: R-V V dmvr
k230 banana_f3
dmvr_8_12x20_c: 619.3 ( 1.00x) 624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32: 128.6 ( 4.82x) 103.4 ( 6.04x)
dmvr_8_20x12_c: 610.0 ( 1.00x) 665.6 ( 1.00x)
dmvr_8_20x12_rvv_i32: 137.6 ( 4.44x) 92.9 ( 7.17x)
dmvr_8_20x20_c: 1008.0 ( 1.00x) 1082.7 ( 1.00x)
dmvr_8_20x20_rvv_i32: 221.1 ( 4.56x) 155.4 ( 6.97x)
dmvr_h_8_12x20_c: 2008.0 ( 1.00x) 2009.7 ( 1.00x)
dmvr_h_8_12x20_rvv_i32: 239.6 ( 8.38x) 186.7 (10.77x)
dmvr_h_8_20x12_c: 1989.5 ( 1.00x) 2009.4 ( 1.00x)
dmvr_h_8_20x12_rvv_i32: 230.3 ( 8.64x) 155.4 (12.93x)
dmvr_h_8_20x20_c: 3304.1 ( 1.00x) 3342.9 ( 1.00x)
dmvr_h_8_20x20_rvv_i32: 378.3 ( 8.73x) 248.9 (13.43x)
dmvr_hv_8_12x20_c: 3609.8 ( 1.00x) 3603.4 ( 1.00x)
dmvr_hv_8_12x20_rvv_i32: 369.1 ( 9.78x) 322.1 (11.19x)
dmvr_hv_8_20x12_c: 3628.3 ( 1.00x) 3624.2 ( 1.00x)
dmvr_hv_8_20x12_rvv_i32: 322.8 (11.24x) 238.7 (15.19x)
dmvr_hv_8_20x20_c: 5933.8 ( 1.00x) 5936.6 ( 1.00x)
dmvr_hv_8_20x20_rvv_i32: 526.5 (11.27x) 374.1 (15.87x)
dmvr_v_8_12x20_c: 2156.3 ( 1.00x) 2155.4 ( 1.00x)
dmvr_v_8_12x20_rvv_i32: 239.6 ( 9.00x) 176.2 (12.24x)
dmvr_v_8_20x12_c: 2137.6 ( 1.00x) 2165.9 ( 1.00x)
dmvr_v_8_20x12_rvv_i32: 230.3 ( 9.28x) 155.2 (13.96x)
dmvr_v_8_20x20_c: 4183.8 ( 1.00x) 3592.9 ( 1.00x)
dmvr_v_8_20x20_rvv_i32: 369.3 (11.33x) 249.2 (14.42x)
2024-12-17 09:21:20 +08:00
sunyuechi
b86766d610
Update R-V V vvc_mc vset to support more lengths
2024-12-17 09:21:20 +08:00
sunyuechi
2dc864eb4e
lavc/rv40dsp: fix RISC-V chroma_mc
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-12-10 11:24:45 -05:00
Rémi Denis-Courmont
f8e91ab05f
lavc/h264idct: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
f2b945147d
lavc/vp8dsp: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
d3acffae7a
lavc/pixblockdsp: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
da1ab7940e
riscv: remove unnecessary #include's
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
607d4cca8e
riscv/h264dsp: remove spurious instruction
2024-11-18 22:02:19 +02:00
Rémi Denis-Courmont
b75dff0e20
lavc/h264dsp: fix R-V V weight_pixels pointer arithmetic
As of 459a1512f1,
the code is unrolled to process two rows per iteration.
The output cursor thus needs to be incremented by twice the
stride, which is taken care of with SH1ADD. However, the ADD
from the original implementation was incorrectly left over.
2024-11-18 20:04:58 +02:00
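In scalar terms, the fix in b75dff0e20 is the difference between advancing the output pointer by `2 * stride` per two-row iteration and advancing it by `3 * stride`. A sketch of unidirectional weighting unrolled by two rows, assuming an even height, 8-bit samples, and the H.264 explicit weighting formula with the offset folded into the rounding term; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* In-place explicit weighting, two rows per iteration (height assumed even). */
static void weight_2rows_sketch(uint8_t *block, ptrdiff_t stride, int width,
                                int height, int log2_denom, int weight,
                                int offset)
{
    int off = offset * (1 << log2_denom); /* fold the offset into the sum */

    if (log2_denom)
        off += 1 << (log2_denom - 1); /* rounding term */

    for (int y = 0; y < height; y += 2) {
        uint8_t *row1 = block + stride;

        for (int x = 0; x < width; x++) {
            block[x] = clip_u8((block[x] * weight + off) >> log2_denom);
            row1[x]  = clip_u8((row1[x] * weight + off) >> log2_denom);
        }
        block += 2 * stride; /* exactly two rows: no extra "+ stride" */
    }
}
```

With the stray extra stride, the loop would skip every third row and run past the end of the block.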
Rémi Denis-Courmont
bbb0fdedb7
lavc/h264idct: fix RISC-V group multiplier
After the branch, the expected SEW/LMUL ratio is 1 byte/vector.
So we have to set the same ratio before branching (QEMU does not care,
but real hardware does).
2024-11-17 16:35:27 +02:00
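The constraint comes from the RVV rule VLMAX = LMUL * VLEN / SEW: two vtype settings with the same SEW/LMUL ratio give the same VLMAX, so a vector length set under one remains valid under the other. A small C model of that formula, with LMUL expressed in eighths so that fractional values such as mf2 are representable; the helper is hypothetical.

```c
/* VLMAX = LMUL * VLEN / SEW, with LMUL given in eighths (8 means m1,
 * 4 means mf2, 16 means m2). VLEN and SEW are in bits. */
static unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul_eighths)
{
    return vlen * lmul_eighths / (8u * sew);
}
```

For VLEN = 128, e8/m1, e16/m2 and e32/m4 all give VLMAX = 16, whereas e16/m1 halves it to 8.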
Rémi Denis-Courmont
fd8cbfec3d
lavc/vp8dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
690c015758
lavc/h264dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
c3051d94a7
lavc/h264dsp: move RISC-V fn pointers to .data.rel.ro
This should fix PIC builds.
2024-11-16 16:04:24 +02:00
Rémi Denis-Courmont
1eb026dd8b
riscv/vvc: fix UNDEF whilst initialising DSP
The current code triggers an illegal instruction if the CPU does not support
vectors.
2024-10-12 09:23:33 +03:00
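The general rule is that no vector instruction, including a read of the `vlenb` CSR, may execute before the vector capability flag has been checked. A sketch of that ordering with hypothetical names (`CPU_FLAG_RVV`, `read_vlenb`, `init_dsp_sketch`); the stub counts calls instead of executing a CSR read, so it runs on any machine.

```c
/* Hypothetical stand-ins for the real CPU-feature and CSR helpers. */
#define CPU_FLAG_RVV 0x1

static int vlenb_reads; /* counts how often the vector CSR would be read */

static int read_vlenb(void)
{
    vlenb_reads++; /* real code would execute a vector CSR read here */
    return 16;     /* pretend VLEN = 128 bits */
}

/* Returns the vector register size in bytes, or 0 for the C fallback. */
static int init_dsp_sketch(int cpu_flags)
{
    if (!(cpu_flags & CPU_FLAG_RVV))
        return 0; /* no vectors: must not touch any vector CSR */

    return read_vlenb(); /* only now is reading vlenb safe */
}
```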
Niklas Haas
2f77ecc6bc
avcodec/riscv: add h264 qpel
Benched on K230 for VLEN 128, SpacemiT for VLEN 256. The 4-wide
variants show no speedup with VLEN 256 over VLEN 128 on available
hardware, so they were disabled.
C RVV128 C RVV256
avg_h264_qpel_4_mc00_8 33.9 33.6 (1.01x)
avg_h264_qpel_4_mc01_8 218.8 89.1 (2.46x)
avg_h264_qpel_4_mc02_8 218.8 79.8 (2.74x)
avg_h264_qpel_4_mc03_8 218.8 89.1 (2.46x)
avg_h264_qpel_4_mc10_8 172.3 126.1 (1.37x)
avg_h264_qpel_4_mc11_8 339.1 190.8 (1.78x)
avg_h264_qpel_4_mc12_8 533.6 357.6 (1.49x)
avg_h264_qpel_4_mc13_8 348.4 190.8 (1.83x)
avg_h264_qpel_4_mc20_8 144.8 116.8 (1.24x)
avg_h264_qpel_4_mc21_8 478.1 385.6 (1.24x)
avg_h264_qpel_4_mc22_8 348.4 283.6 (1.23x)
avg_h264_qpel_4_mc23_8 478.1 394.6 (1.21x)
avg_h264_qpel_4_mc30_8 172.6 126.1 (1.37x)
avg_h264_qpel_4_mc31_8 339.4 191.1 (1.78x)
avg_h264_qpel_4_mc32_8 542.9 357.6 (1.52x)
avg_h264_qpel_4_mc33_8 339.4 191.1 (1.78x)
avg_h264_qpel_8_mc00_8 116.8 42.9 (2.72x) 123.6 50.6 (2.44x)
avg_h264_qpel_8_mc01_8 774.4 163.1 (4.75x) 779.8 165.1 (4.72x)
avg_h264_qpel_8_mc02_8 774.4 154.1 (5.03x) 779.8 144.3 (5.40x)
avg_h264_qpel_8_mc03_8 774.4 163.3 (4.74x) 779.8 165.3 (4.72x)
avg_h264_qpel_8_mc10_8 617.1 237.3 (2.60x) 613.1 227.6 (2.69x)
avg_h264_qpel_8_mc11_8 1209.3 376.4 (3.21x) 1206.8 363.1 (3.32x)
avg_h264_qpel_8_mc12_8 1913.3 598.6 (3.20x) 1894.3 561.1 (3.38x)
avg_h264_qpel_8_mc13_8 1218.6 376.4 (3.24x) 1217.1 363.1 (3.35x)
avg_h264_qpel_8_mc20_8 524.4 228.1 (2.30x) 519.3 227.6 (2.28x)
avg_h264_qpel_8_mc21_8 1709.6 681.9 (2.51x) 1707.1 644.3 (2.65x)
avg_h264_qpel_8_mc22_8 1274.3 459.6 (2.77x) 1279.8 436.1 (2.93x)
avg_h264_qpel_8_mc23_8 1700.3 672.6 (2.53x) 1706.8 644.6 (2.65x)
avg_h264_qpel_8_mc30_8 607.6 246.6 (2.46x) 623.6 238.1 (2.62x)
avg_h264_qpel_8_mc31_8 1209.6 376.4 (3.21x) 1206.8 363.1 (3.32x)
avg_h264_qpel_8_mc32_8 1904.1 607.9 (3.13x) 1894.3 571.3 (3.32x)
avg_h264_qpel_8_mc33_8 1209.6 376.1 (3.22x) 1206.8 363.1 (3.32x)
avg_h264_qpel_16_mc00_8 431.9 89.1 (4.85x) 436.1 71.3 (6.12x)
avg_h264_qpel_16_mc01_8 2894.6 376.1 (7.70x) 2842.3 300.6 (9.46x)
avg_h264_qpel_16_mc02_8 2987.3 348.4 (8.57x) 2967.3 290.1 (10.23x)
avg_h264_qpel_16_mc03_8 2885.3 376.4 (7.67x) 2842.3 300.6 (9.46x)
avg_h264_qpel_16_mc10_8 2404.1 524.4 (4.58x) 2404.8 456.8 (5.26x)
avg_h264_qpel_16_mc11_8 4709.4 811.6 (5.80x) 4675.6 706.8 (6.62x)
avg_h264_qpel_16_mc12_8 7477.9 1274.3 (5.87x) 7436.1 1061.1 (7.01x)
avg_h264_qpel_16_mc13_8 4718.6 820.6 (5.75x) 4655.1 706.8 (6.59x)
avg_h264_qpel_16_mc20_8 2052.1 487.1 (4.21x) 2071.3 446.3 (4.64x)
avg_h264_qpel_16_mc21_8 7440.6 1422.6 (5.23x) 6727.8 1217.3 (5.53x)
avg_h264_qpel_16_mc22_8 5051.9 950.4 (5.32x) 5071.6 790.3 (6.42x)
avg_h264_qpel_16_mc23_8 6764.9 1422.3 (4.76x) 6748.6 1217.3 (5.54x)
avg_h264_qpel_16_mc30_8 2413.1 524.4 (4.60x) 2415.1 467.3 (5.17x)
avg_h264_qpel_16_mc31_8 4681.6 839.1 (5.58x) 4675.6 727.6 (6.43x)
avg_h264_qpel_16_mc32_8 8579.6 1292.8 (6.64x) 7436.3 1071.3 (6.94x)
avg_h264_qpel_16_mc33_8 5375.9 829.9 (6.48x) 4665.3 717.3 (6.50x)
put_h264_qpel_4_mc00_8 24.4 24.4 (1.00x)
put_h264_qpel_4_mc01_8 987.4 79.8 (12.37x)
put_h264_qpel_4_mc02_8 190.8 79.8 (2.39x)
put_h264_qpel_4_mc03_8 209.6 89.1 (2.35x)
put_h264_qpel_4_mc10_8 163.3 117.1 (1.39x)
put_h264_qpel_4_mc11_8 339.4 181.6 (1.87x)
put_h264_qpel_4_mc12_8 533.6 348.4 (1.53x)
put_h264_qpel_4_mc13_8 339.4 190.8 (1.78x)
put_h264_qpel_4_mc20_8 126.3 116.8 (1.08x)
put_h264_qpel_4_mc21_8 468.9 376.1 (1.25x)
put_h264_qpel_4_mc22_8 330.1 274.4 (1.20x)
put_h264_qpel_4_mc23_8 468.9 376.1 (1.25x)
put_h264_qpel_4_mc30_8 163.3 126.3 (1.29x)
put_h264_qpel_4_mc31_8 339.1 191.1 (1.77x)
put_h264_qpel_4_mc32_8 533.6 348.4 (1.53x)
put_h264_qpel_4_mc33_8 339.4 181.8 (1.87x)
put_h264_qpel_8_mc00_8 98.6 33.6 (2.93x) 92.3 40.1 (2.30x)
put_h264_qpel_8_mc01_8 737.1 153.8 (4.79x) 738.1 144.3 (5.12x)
put_h264_qpel_8_mc02_8 663.1 135.3 (4.90x) 665.1 134.1 (4.96x)
put_h264_qpel_8_mc03_8 737.4 154.1 (4.79x) 1508.8 144.3 (10.46x)
put_h264_qpel_8_mc10_8 598.4 237.1 (2.52x) 592.3 227.6 (2.60x)
put_h264_qpel_8_mc11_8 1172.3 357.9 (3.28x) 1175.6 342.3 (3.43x)
put_h264_qpel_8_mc12_8 1867.1 589.1 (3.17x) 1863.1 561.1 (3.32x)
put_h264_qpel_8_mc13_8 1172.6 366.9 (3.20x) 1175.6 352.8 (3.33x)
put_h264_qpel_8_mc20_8 450.4 218.8 (2.06x) 446.3 206.8 (2.16x)
put_h264_qpel_8_mc21_8 1672.3 663.1 (2.52x) 1675.6 633.8 (2.64x)
put_h264_qpel_8_mc22_8 1144.6 1200.1 (0.95x) 1144.3 425.6 (2.69x)
put_h264_qpel_8_mc23_8 1672.6 672.4 (2.49x) 1665.3 634.1 (2.63x)
put_h264_qpel_8_mc30_8 598.6 237.3 (2.52x) 613.1 227.6 (2.69x)
put_h264_qpel_8_mc31_8 1172.3 376.1 (3.12x) 1175.6 352.6 (3.33x)
put_h264_qpel_8_mc32_8 1857.8 598.6 (3.10x) 1863.1 561.1 (3.32x)
put_h264_qpel_8_mc33_8 1172.3 376.1 (3.12x) 1175.6 352.8 (3.33x)
put_h264_qpel_16_mc00_8 320.6 61.4 (5.22x) 321.3 60.8 (5.28x)
put_h264_qpel_16_mc01_8 2774.3 339.1 (8.18x) 2759.1 279.8 (9.86x)
put_h264_qpel_16_mc02_8 2589.1 320.6 (8.08x) 2571.6 269.3 (9.55x)
put_h264_qpel_16_mc03_8 2774.3 339.4 (8.17x) 2738.1 290.1 (9.44x)
put_h264_qpel_16_mc10_8 2274.3 487.4 (4.67x) 2290.1 436.1 (5.25x)
put_h264_qpel_16_mc11_8 5237.1 792.9 (6.60x) 4529.8 685.8 (6.61x)
put_h264_qpel_16_mc12_8 7357.6 1255.8 (5.86x) 7352.8 1040.1 (7.07x)
put_h264_qpel_16_mc13_8 4579.9 792.9 (5.78x) 4571.6 686.1 (6.66x)
put_h264_qpel_16_mc20_8 1802.1 459.6 (3.92x) 1800.6 425.6 (4.23x)
put_h264_qpel_16_mc21_8 6644.6 2246.6 (2.96x) 6644.3 1196.6 (5.55x)
put_h264_qpel_16_mc22_8 4589.1 913.4 (5.02x) 4592.3 769.3 (5.97x)
put_h264_qpel_16_mc23_8 6644.6 1394.6 (4.76x) 6634.1 1196.6 (5.54x)
put_h264_qpel_16_mc30_8 2274.3 496.6 (4.58x) 2290.1 456.8 (5.01x)
put_h264_qpel_16_mc31_8 5255.6 802.1 (6.55x) 4550.8 706.8 (6.44x)
put_h264_qpel_16_mc32_8 7376.1 1265.1 (5.83x) 7352.8 1050.6 (7.00x)
put_h264_qpel_16_mc33_8 4579.9 802.1 (5.71x) 4561.1 696.3 (6.55x)
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-09-28 18:35:35 +02:00
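Most of the qpel cost comes from the H.264 six-tap half-sample filter (1, -5, 20, 20, -5, 1), rounded and shifted right by 5, which the mcXY variants apply horizontally, vertically or both. A scalar sketch of one horizontal half-sample row following that filter; the function name and the requirement that `src` has two readable samples to the left and three to the right are assumptions of the sketch.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Horizontal half-sample interpolation for one row of `width` outputs.
 * Output x lies between src[x] and src[x + 1]; src[-2] .. src[width + 2]
 * must be readable. */
static void h264_hpel_row_sketch(uint8_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++) {
        int sum = src[x - 2] - 5 * src[x - 1] + 20 * src[x]
                + 20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];

        dst[x] = clip_u8((sum + 16) >> 5);
    }
}
```

The taps sum to 32, so flat areas pass through unchanged, while the negative taps can overshoot and need the final clip.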
Rémi Denis-Courmont
6611bf5484
lavc/h264dsp: optimise R-V V biweight for shorter heights
T-Head C908:
h264_biweight2_8_c: 313.7 ( 1.00x)
h264_biweight2_8_rvv_i32: before 239.5 ( 1.23x)
h264_biweight2_8_rvv_i32: after 72.7 ( 4.31x)
h264_biweight4_8_c: 582.0 ( 1.00x)
h264_biweight4_8_rvv_i32: before 471.0 ( 1.16x)
h264_biweight4_8_rvv_i32: after 91.5 ( 6.36x)
h264_biweight8_8_c: 1110.0 ( 1.00x)
h264_biweight8_8_rvv_i32: before 943.3 ( 1.10x)
h264_biweight8_8_rvv_i64: after 147.0 ( 7.55x)
SpacemiT X60:
h264_biweight2_8_c: 311.4 ( 1.00x)
h264_biweight2_8_rvv_i32: before 363.1 ( 0.83x)
h264_biweight2_8_rvv_i32: after 103.1 ( 3.02x)
h264_biweight4_8_c: 571.9 ( 1.00x)
h264_biweight4_8_rvv_i32: before 717.4 ( 0.78x)
h264_biweight4_8_rvv_i32: after 71.8 ( 7.96x)
h264_biweight8_8_c: 1103.1 ( 1.00x)
h264_biweight8_8_rvv_i32: before 1415.2 ( 0.76x)
h264_biweight8_8_rvv_i64: after 92.8 (11.88x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
459a1512f1
lavc/h264dsp: unroll R-V V weight16
As VLSE128.V does not exist, we have no other way to deal with latency.
T-Head C908:
h264_weight16_8_c: 989.4 ( 1.00x)
h264_weight16_8_rvv_i32: 193.2 ( 5.12x)
SpacemiT X60:
h264_weight16_8_c: 874.1 ( 1.00x)
h264_weight16_8_rvv_i32: 196.9 ( 4.44x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
4936bb2508
lavc/h264dsp: optimise R-V V weight for shorter heights
The height is a power of two of up to 16 rows. The current code was
optimised for large sample counts.
T-Head C908:
h264_weight2_8_c: 211.7 ( 1.00x)
h264_weight2_8_rvv_i32: before 184.0 ( 1.15x)
h264_weight2_8_rvv_i32: after 54.2 ( 3.90x)
h264_weight4_8_c: 285.7 ( 1.00x)
h264_weight4_8_rvv_i32: before 341.2 ( 0.86x)
h264_weight4_8_rvv_i32: after 82.2 ( 3.47x)
h264_weight8_8_c: 498.7 ( 1.00x)
h264_weight8_8_rvv_i32: before 683.7 ( 0.73x)
h264_weight8_8_rvv_i64: after 128.5 ( 3.95x)
h264_weight16_8_c: 878.2 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged 239.5 ( 3.67x)
SpacemiT X60:
h264_weight2_8_c: 207.2 ( 1.00x)
h264_weight2_8_rvv_i32: before 259.6 ( 0.80x)
h264_weight2_8_rvv_i32: after 82.2 ( 2.52x)
h264_weight4_8_c: 290.8 ( 1.00x)
h264_weight4_8_rvv_i32: before 509.6 ( 0.57x)
h264_weight4_8_rvv_i32: after 61.5 ( 4.73x)
h264_weight8_8_c: 498.8 ( 1.00x)
h264_weight8_8_rvv_i32: before 1019.8 ( 0.49x)
h264_weight8_8_rvv_i64: after 71.8 ( 6.95x)
h264_weight16_8_c: 874.0 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged 249.0 ( 3.51x)
2024-09-24 20:04:51 +03:00
sunyuechi
ba7d0d5fc3
lavc/vvc_mc: R-V V avg w_avg
C908 X60
avg_8_2x2_c : 1.2 1.0
avg_8_2x2_rvv_i32 : 0.7 0.7
avg_8_2x4_c : 2.0 2.2
avg_8_2x4_rvv_i32 : 1.2 1.2
avg_8_2x8_c : 3.7 4.0
avg_8_2x8_rvv_i32 : 1.7 1.5
avg_8_2x16_c : 7.2 7.7
avg_8_2x16_rvv_i32 : 3.0 2.7
avg_8_2x32_c : 14.2 15.2
avg_8_2x32_rvv_i32 : 5.5 5.0
avg_8_2x64_c : 51.0 43.7
avg_8_2x64_rvv_i32 : 39.2 29.7
avg_8_2x128_c : 100.5 79.2
avg_8_2x128_rvv_i32 : 79.7 68.2
avg_8_4x2_c : 1.7 2.0
avg_8_4x2_rvv_i32 : 1.0 0.7
avg_8_4x4_c : 3.5 3.7
avg_8_4x4_rvv_i32 : 1.2 1.2
avg_8_4x8_c : 6.7 7.0
avg_8_4x8_rvv_i32 : 1.7 1.5
avg_8_4x16_c : 13.5 14.0
avg_8_4x16_rvv_i32 : 3.0 2.7
avg_8_4x32_c : 26.2 27.7
avg_8_4x32_rvv_i32 : 5.5 4.7
avg_8_4x64_c : 73.0 73.7
avg_8_4x64_rvv_i32 : 39.0 32.5
avg_8_4x128_c : 143.0 137.2
avg_8_4x128_rvv_i32 : 72.7 68.0
avg_8_8x2_c : 3.5 3.5
avg_8_8x2_rvv_i32 : 1.0 0.7
avg_8_8x4_c : 6.2 6.5
avg_8_8x4_rvv_i32 : 1.5 1.0
avg_8_8x8_c : 12.7 13.2
avg_8_8x8_rvv_i32 : 2.0 1.5
avg_8_8x16_c : 25.0 26.5
avg_8_8x16_rvv_i32 : 3.2 2.7
avg_8_8x32_c : 50.0 52.7
avg_8_8x32_rvv_i32 : 6.2 5.0
avg_8_8x64_c : 118.7 122.5
avg_8_8x64_rvv_i32 : 40.2 31.5
avg_8_8x128_c : 236.7 220.2
avg_8_8x128_rvv_i32 : 85.2 67.7
avg_8_16x2_c : 6.2 6.7
avg_8_16x2_rvv_i32 : 1.2 0.7
avg_8_16x4_c : 12.5 13.0
avg_8_16x4_rvv_i32 : 1.7 1.0
avg_8_16x8_c : 24.5 26.0
avg_8_16x8_rvv_i32 : 3.0 1.7
avg_8_16x16_c : 49.0 51.5
avg_8_16x16_rvv_i32 : 5.5 3.0
avg_8_16x32_c : 97.5 102.5
avg_8_16x32_rvv_i32 : 10.5 5.5
avg_8_16x64_c : 213.7 222.0
avg_8_16x64_rvv_i32 : 48.5 34.2
avg_8_16x128_c : 434.7 420.0
avg_8_16x128_rvv_i32 : 97.7 74.0
avg_8_32x2_c : 12.2 12.7
avg_8_32x2_rvv_i32 : 1.5 1.0
avg_8_32x4_c : 24.5 25.5
avg_8_32x4_rvv_i32 : 3.0 1.7
avg_8_32x8_c : 48.5 50.7
avg_8_32x8_rvv_i32 : 5.2 2.7
avg_8_32x16_c : 96.7 101.2
avg_8_32x16_rvv_i32 : 10.2 5.0
avg_8_32x32_c : 192.7 202.2
avg_8_32x32_rvv_i32 : 19.7 9.5
avg_8_32x64_c : 427.5 426.5
avg_8_32x64_rvv_i32 : 64.2 18.2
avg_8_32x128_c : 816.5 821.0
avg_8_32x128_rvv_i32 : 135.2 75.5
avg_8_64x2_c : 24.0 25.2
avg_8_64x2_rvv_i32 : 2.7 1.5
avg_8_64x4_c : 48.2 50.5
avg_8_64x4_rvv_i32 : 5.0 2.7
avg_8_64x8_c : 96.0 100.7
avg_8_64x8_rvv_i32 : 9.7 4.5
avg_8_64x16_c : 207.7 201.2
avg_8_64x16_rvv_i32 : 19.0 9.0
avg_8_64x32_c : 383.2 402.0
avg_8_64x32_rvv_i32 : 37.5 17.5
avg_8_64x64_c : 837.2 828.7
avg_8_64x64_rvv_i32 : 84.7 35.5
avg_8_64x128_c : 1640.7 1640.2
avg_8_64x128_rvv_i32 : 206.0 153.0
avg_8_128x2_c : 48.7 51.0
avg_8_128x2_rvv_i32 : 5.2 2.7
avg_8_128x4_c : 96.7 101.5
avg_8_128x4_rvv_i32 : 10.2 5.0
avg_8_128x8_c : 192.2 202.0
avg_8_128x8_rvv_i32 : 19.7 9.2
avg_8_128x16_c : 400.7 403.2
avg_8_128x16_rvv_i32 : 38.7 18.5
avg_8_128x32_c : 786.7 805.7
avg_8_128x32_rvv_i32 : 77.0 36.2
avg_8_128x64_c : 1615.5 1655.5
avg_8_128x64_rvv_i32 : 189.7 80.7
avg_8_128x128_c : 3182.0 3238.0
avg_8_128x128_rvv_i32 : 397.5 308.5
w_avg_8_2x2_c : 1.7 1.2
w_avg_8_2x2_rvv_i32 : 1.2 1.0
w_avg_8_2x4_c : 2.7 2.7
w_avg_8_2x4_rvv_i32 : 1.7 1.5
w_avg_8_2x8_c : 21.7 4.7
w_avg_8_2x8_rvv_i32 : 2.7 2.5
w_avg_8_2x16_c : 9.5 9.2
w_avg_8_2x16_rvv_i32 : 4.7 4.2
w_avg_8_2x32_c : 19.0 18.7
w_avg_8_2x32_rvv_i32 : 9.0 8.0
w_avg_8_2x64_c : 62.0 50.2
w_avg_8_2x64_rvv_i32 : 47.7 33.5
w_avg_8_2x128_c : 116.7 87.7
w_avg_8_2x128_rvv_i32 : 80.0 69.5
w_avg_8_4x2_c : 2.5 2.5
w_avg_8_4x2_rvv_i32 : 1.2 1.0
w_avg_8_4x4_c : 4.7 4.5
w_avg_8_4x4_rvv_i32 : 1.7 1.7
w_avg_8_4x8_c : 9.0 8.7
w_avg_8_4x8_rvv_i32 : 2.7 2.5
w_avg_8_4x16_c : 17.7 17.5
w_avg_8_4x16_rvv_i32 : 4.7 4.2
w_avg_8_4x32_c : 35.0 35.0
w_avg_8_4x32_rvv_i32 : 9.0 8.0
w_avg_8_4x64_c : 100.5 84.5
w_avg_8_4x64_rvv_i32 : 42.2 33.7
w_avg_8_4x128_c : 203.5 151.2
w_avg_8_4x128_rvv_i32 : 83.0 69.5
w_avg_8_8x2_c : 4.5 4.5
w_avg_8_8x2_rvv_i32 : 1.2 1.2
w_avg_8_8x4_c : 8.7 8.7
w_avg_8_8x4_rvv_i32 : 2.0 1.7
w_avg_8_8x8_c : 17.0 17.0
w_avg_8_8x8_rvv_i32 : 3.2 2.5
w_avg_8_8x16_c : 34.0 33.5
w_avg_8_8x16_rvv_i32 : 5.5 4.2
w_avg_8_8x32_c : 86.0 67.5
w_avg_8_8x32_rvv_i32 : 10.5 8.0
w_avg_8_8x64_c : 187.2 149.5
w_avg_8_8x64_rvv_i32 : 45.0 35.5
w_avg_8_8x128_c : 342.7 290.0
w_avg_8_8x128_rvv_i32 : 108.7 70.2
w_avg_8_16x2_c : 8.5 8.2
w_avg_8_16x2_rvv_i32 : 2.0 1.2
w_avg_8_16x4_c : 16.7 16.7
w_avg_8_16x4_rvv_i32 : 3.0 1.7
w_avg_8_16x8_c : 33.2 33.5
w_avg_8_16x8_rvv_i32 : 5.5 3.0
w_avg_8_16x16_c : 66.2 66.7
w_avg_8_16x16_rvv_i32 : 10.5 5.0
w_avg_8_16x32_c : 132.5 131.0
w_avg_8_16x32_rvv_i32 : 20.0 9.7
w_avg_8_16x64_c : 340.0 283.5
w_avg_8_16x64_rvv_i32 : 60.5 37.2
w_avg_8_16x128_c : 641.2 597.5
w_avg_8_16x128_rvv_i32 : 118.7 77.7
w_avg_8_32x2_c : 16.5 16.7
w_avg_8_32x2_rvv_i32 : 3.2 1.7
w_avg_8_32x4_c : 33.2 33.2
w_avg_8_32x4_rvv_i32 : 5.5 2.7
w_avg_8_32x8_c : 66.0 62.5
w_avg_8_32x8_rvv_i32 : 10.5 5.0
w_avg_8_32x16_c : 131.5 132.0
w_avg_8_32x16_rvv_i32 : 20.2 9.5
w_avg_8_32x32_c : 261.7 272.0
w_avg_8_32x32_rvv_i32 : 39.7 18.0
w_avg_8_32x64_c : 575.2 545.5
w_avg_8_32x64_rvv_i32 : 105.5 58.7
w_avg_8_32x128_c : 1154.2 1088.0
w_avg_8_32x128_rvv_i32 : 207.0 98.0
w_avg_8_64x2_c : 33.0 33.0
w_avg_8_64x2_rvv_i32 : 6.2 2.7
w_avg_8_64x4_c : 65.5 66.0
w_avg_8_64x4_rvv_i32 : 11.5 5.0
w_avg_8_64x8_c : 131.2 132.5
w_avg_8_64x8_rvv_i32 : 22.5 9.5
w_avg_8_64x16_c : 268.2 262.5
w_avg_8_64x16_rvv_i32 : 44.2 18.0
w_avg_8_64x32_c : 561.5 528.7
w_avg_8_64x32_rvv_i32 : 88.0 35.2
w_avg_8_64x64_c : 1136.2 1124.0
w_avg_8_64x64_rvv_i32 : 222.0 82.2
w_avg_8_64x128_c : 2345.0 2312.7
w_avg_8_64x128_rvv_i32 : 423.0 190.5
w_avg_8_128x2_c : 65.7 66.5
w_avg_8_128x2_rvv_i32 : 11.2 5.5
w_avg_8_128x4_c : 131.2 132.2
w_avg_8_128x4_rvv_i32 : 22.0 10.2
w_avg_8_128x8_c : 263.5 312.0
w_avg_8_128x8_rvv_i32 : 43.2 19.7
w_avg_8_128x16_c : 528.7 526.2
w_avg_8_128x16_rvv_i32 : 85.5 39.5
w_avg_8_128x32_c : 1067.7 1062.7
w_avg_8_128x32_rvv_i32 : 171.7 78.2
w_avg_8_128x64_c : 2234.7 2168.7
w_avg_8_128x64_rvv_i32 : 400.0 159.0
w_avg_8_128x128_c : 4752.5 4295.0
w_avg_8_128x128_rvv_i32 : 757.7 365.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-09-24 20:04:51 +03:00
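`avg` is the default bi-prediction average: two 14-bit intermediate predictions are added and scaled back to the output bit depth, which for 8-bit output is a shift by 15 - 8 = 7 with a rounding offset of 64. A sketch assuming that formula from the VVC specification's default weighted sample prediction, with a hypothetical name and source strides in elements; `w_avg` additionally applies per-list weights and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Default bi-prediction average for 8-bit output (hypothetical sketch).
 * src0/src1 hold 14-bit intermediates with a row stride of `src_stride`
 * elements; dst has a row stride of `dst_stride` bytes. */
static void avg_8bit_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                            const int16_t *src0, const int16_t *src1,
                            ptrdiff_t src_stride, int width, int height)
{
    const int shift  = 15 - 8;
    const int offset = 1 << (shift - 1);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int v = (src0[x] + src1[x] + offset) >> shift;
            dst[x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
        }
        src0 += src_stride;
        src1 += src_stride;
        dst  += dst_stride;
    }
}
```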
Anton Khirnov
3f9ca51015
lavc/opus*: move to opus/ subdir
2024-09-02 11:56:53 +02:00
Ramiro Polla
6aafe61285
avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t
2024-09-01 13:42:30 +02:00
Rémi Denis-Courmont
7d1dda4892
lavc/h264dsp: R-V V loop_filter_chroma
T-Head C908:
h264_v_loop_filter_chroma_8bpp_c: 137.4
h264_v_loop_filter_chroma_8bpp_rvv_i32: 54.2
2024-09-01 10:58:48 +03:00
Rémi Denis-Courmont
3a53656837
lavc/h264dsp: do not write back unmodified rows in R-V V loop filter
2024-09-01 10:52:26 +03:00
Rémi Denis-Courmont
d8fb44c0aa
lavc/mpegvideoencdsp: R-V V add_8x8basis
T-Head C908:
add_8x8basis_c: 440.6
add_8x8basis_rvv_i32: 70.3
SpacemiT X60:
add_8x8basis_c: 436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
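`add_8x8basis` adds a scaled DCT basis function to an 8x8 residual, as used when the encoder refines quantised coefficients (for example with quantiser noise shaping). A sketch assuming a combined fixed-point shift of 10 between basis precision and residual precision; the constant and function name are assumptions, not taken from the commit.

```c
#include <stdint.h>

#define BASIS_TO_RESIDUAL_SHIFT 10 /* assumed precision difference */

/* rem[i] += basis[i] * scale / 2^BASIS_TO_RESIDUAL_SHIFT, rounded. */
static void add_8x8basis_sketch(int16_t rem[64], const int16_t basis[64],
                                int scale)
{
    for (int i = 0; i < 64; i++) {
        int delta = (basis[i] * scale + (1 << (BASIS_TO_RESIDUAL_SHIFT - 1)))
                    >> BASIS_TO_RESIDUAL_SHIFT;

        rem[i] = (int16_t)(rem[i] + delta);
    }
}
```

Every one of the 64 lanes does the same multiply, round and shift, which is why this vectorises well.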
Rémi Denis-Courmont
1907dd7f23
lavc/mpegvideoencdsp: R-V V try_8x8basis
T-Head C908:
try_8x8basis_c: 922.5
try_8x8basis_rvv_i32: 135.3
SpacemiT X60:
try_8x8basis_c: 926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7
lavc/mpegvideoencdsp: R-V V pix_norm1
T-Head C908:
pix_norm1_c: 480.2
pix_norm1_rvv_i64: 146.9
SpacemiT X60:
pix_norm1_c: 478.2
pix_norm1_rvv_i64: 92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5
lavc/mpegvideoencdsp: R-V V pix_sum
T-Head C908:
pix_sum_c: 332.2
pix_sum_rvv_i64: 91.2
SpacemiT X60:
pix_sum_c: 321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
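For reference, `pix_sum` and `pix_norm1` (63d016aea5 and 0fd37c00d7) reduce a 16x16 block of 8-bit pixels to the sum and the sum of squares of its samples, respectively. A scalar sketch of both, assuming a 16x16 block and a stride in bytes; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of the 256 pixels of a 16x16 block. */
static int pix_sum_sketch(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;

    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x];
    return sum;
}

/* Sum of the squared pixels of a 16x16 block (at most 256 * 255^2,
 * which still fits in 32 bits). */
static int pix_norm1_sketch(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;

    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x] * pix[x];
    return sum;
}
```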
sunyuechi
4e7b5ac48f
lavc/vp9dsp: R-V V mc bilin hv
C908 X60
vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32 : 4.0 3.5
vp9_avg_bilin_8hv_8bpp_c : 38.5 34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32 : 7.2 6.5
vp9_avg_bilin_16hv_8bpp_c : 147.2 130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32 : 14.5 12.7
vp9_avg_bilin_32hv_8bpp_c : 574.2 509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32 : 42.5 38.0
vp9_avg_bilin_64hv_8bpp_c : 2321.2 2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32 : 163.5 131.0
vp9_put_bilin_4hv_8bpp_c : 10.0 8.7
vp9_put_bilin_4hv_8bpp_rvv_i32 : 3.5 3.0
vp9_put_bilin_8hv_8bpp_c : 35.2 31.2
vp9_put_bilin_8hv_8bpp_rvv_i32 : 6.5 5.7
vp9_put_bilin_16hv_8bpp_c : 134.0 119.0
vp9_put_bilin_16hv_8bpp_rvv_i32 : 12.7 11.5
vp9_put_bilin_32hv_8bpp_c : 538.5 464.2
vp9_put_bilin_32hv_8bpp_rvv_i32 : 39.7 35.2
vp9_put_bilin_64hv_8bpp_c : 2111.7 1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32 : 138.5 122.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
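The `hv` variants apply VP9's two-tap bilinear filter in both directions: a horizontal pass over height + 1 rows into a temporary buffer, then a vertical pass over that buffer. A sketch of the `put` case, assuming 1/16-sample positions `mx`/`my` in 0..15, a width of at most 64 and the rounding `a + ((f * (b - a) + 8) >> 4)`; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One bilinear tap pair: a + f/16 * (b - a), rounded. */
static int bilin(int a, int b, int f)
{
    return a + ((f * (b - a) + 8) >> 4);
}

/* put_bilin_hv sketch for blocks up to 64 wide; src must provide
 * (h + 1) rows of (w + 1) samples. */
static void put_bilin_hv_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src, ptrdiff_t src_stride,
                                int w, int h, int mx, int my)
{
    uint8_t tmp[65 * 64];

    /* Horizontal pass: h + 1 rows, since the vertical pass needs one extra. */
    for (int y = 0; y < h + 1; y++)
        for (int x = 0; x < w; x++)
            tmp[y * 64 + x] = (uint8_t)bilin(src[y * src_stride + x],
                                             src[y * src_stride + x + 1], mx);

    /* Vertical pass over the temporary rows. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dst_stride + x] = (uint8_t)bilin(tmp[y * 64 + x],
                                                     tmp[(y + 1) * 64 + x], my);
}
```

The `avg` variants would additionally average this result with the existing destination pixels.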