Commit graph

13 commits

Author SHA1 Message Date
Timo Rothenpieler
262d41c804 all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
Rémi Denis-Courmont
3152c684cb lavc/vc1dsp: R-V V vc1_inv_trans_4x4
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_c: 310.7
vc1dsp.vc1_inv_trans_4x4_rvv_i32: 120.0

We could use 1 `vlseg4e64.v` instead of 4 `vle16.v`, but that seems to
be about 7% slower.
2024-06-07 17:53:05 +03:00
Rémi Denis-Courmont
6ffa639c8a lavc/vc1dsp: R-V V vc1_inv_trans_4x8
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x8_c: 653.2
vc1dsp.vc1_inv_trans_4x8_rvv_i32: 234.0
2024-06-07 17:53:05 +03:00
Rémi Denis-Courmont
a169f3bca5 lavc/vc1dsp: R-V V vc1_inv_trans_8x4
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_8x4_c:       626.2
vc1dsp.vc1_inv_trans_8x4_rvv_i32: 215.2
2024-06-07 17:53:05 +03:00
Rémi Denis-Courmont
04397a29de lavc/vc1dsp: R-V V vc1_inv_trans_8x8
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_8x8_c:       871.7
vc1dsp.vc1_inv_trans_8x8_rvv_i32: 286.7
2024-06-07 17:53:05 +03:00
Rémi Denis-Courmont
6c6bec04f3 lavc/vc1dsp: fix R-V V avg_mspel_pixels
The 8x8 pixel arrays are not necessarily aligned to 64 bits, so the
current code leads to Bus error on real hardware. This reproducible
with FATE's vc1_ilaced_twomv test case.

The new "pessimist" code can trivially be shared for 16x16 pixel
arrays so we also do that. FWIW, this also nominally reduces the
hardware requirement from Zve64x to Zve32x.

T-Head C908:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c:      14.7
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.5
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c:       3.7
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.5

SpacemiT X60:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c:      13.0
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.0
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c:       3.2
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.2
2024-06-02 10:37:09 +03:00
Rémi Denis-Courmont
d452db8410 lavc/vc1dsp: R-V V vc1_unescape_buffer
Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
  as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
  restarted after the escape byte. Vector compression (vcompress.vm) could
  discard all escape bytes but that is slower if escape bytes are rare.

Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of a 2 slides,
- taking a short cut if the input vector contains less than 2 zeroes.
But this is a good starting point:

T-Head C908:
vc1dsp.vc1_unescape_buffer_c:      12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0

SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c:      11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
2024-05-21 21:16:30 +03:00
Rémi Denis-Courmont
fa47299516 lavc/startcode: add R-V V startcode_find_candidate 2024-05-19 10:03:49 +03:00
Rémi Denis-Courmont
4ad5b9c8db lavc/startcode: add R-V Zbb startcode_find_candidate
The main loop processes 8 bytes in 5 instructions.
For comparison, the optimal plain strnlen() requires 4 instructions per
byte (6.4x worse): LBU; ADDI; BEQZ; BNE. The current libavcodec C code
involves 5 instructions per byte (8x worse). Actual benchmarks may be
slightly less favourable due to latency from ORC.B to BNE.
2024-05-19 10:03:49 +03:00
sunyuechi
d4083ecb7c lavc/vc1dsp: R-V V mspel_pixels
C908 X60
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c            :  14.7 13.2
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32      :   2.5  2.2
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c            :   3.7  3.5
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i64      :   1.0  1.2
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c            :   9.0  8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi          :   1.0  1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c            :   2.5  2.2
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi          :   0.5  0.5

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-16 17:08:18 +03:00
Rémi Denis-Courmont
cdcb4b98b7 lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
Martin Storsjö
b51d9eb58e riscv: vc1dsp: Don't check vlenb before checking the CPU flags
We can't call ff_get_rv_vlenb() if we don't have RVV available
at all.

Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-16 22:30:26 +02:00
sunyuechi
0b9d009b4a lavc/vc1dsp: R-V V inv_trans
C908:
vc1dsp.vc1_inv_trans_4x4_dc_c:      125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5
vc1dsp.vc1_inv_trans_8x4_dc_c:      228.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5
vc1dsp.vc1_inv_trans_8x8_dc_c:      476.5
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-08 17:20:48 +02:00