T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_c: 310.7
vc1dsp.vc1_inv_trans_4x4_rvv_i32: 120.0
We could use 1 `vlseg4e64.v` instead of 4 `vle16.v`, but that seems to
be about 7% slower.
The 8x8 pixel arrays are not necessarily aligned to 64 bits, so the
current code leads to a bus error on real hardware. This is reproducible
with FATE's vc1_ilaced_twomv test case.
The new "pessimistic" code can trivially be shared for 16x16 pixel
arrays, so we also do that. FWIW, this also nominally reduces the
hardware requirement from Zve64x to Zve32x.
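For illustration only, here is a minimal C sketch of a shared byte-granularity
averaging helper; the name, strides and rounding formula are assumptions, not
the actual libavcodec code. The point is that nothing in it assumes more than
byte alignment, which is why the same path can serve both 8x8 and 16x16 blocks
while only needing 8-bit vector elements (Zve32x).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: average 'src' into 'dst' one byte at a time, so only
 * byte alignment is ever assumed; the same loop covers 8x8 and 16x16 blocks. */
static void avg_pixels_bytewise(uint8_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src, ptrdiff_t src_stride,
                                int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (dst[x] + src[x] + 1) >> 1; /* rounded average (assumed) */
        dst += dst_stride;
        src += src_stride;
    }
}
```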
T-Head C908:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 14.7
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.5
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 3.7
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.5
SpacemiT X60:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 13.0
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.0
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 3.2
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.2
Notes:
- The loop is biased toward no escape bytes, as that should be the most
  common case.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of a match, bytes are copied up to the first match and the loop is
  restarted after the escape byte (see the sketch after this list). Vector
  compression (vcompress.vm) could discard all escape bytes, but that is
  slower if escape bytes are rare.
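For clarity, here is a behavioral C sketch of that block-wise loop, not the
actual scalar or vector code: CHUNK stands in for the vector length, the
escape rule is the usual 0x00 0x00 0x03 emulation-prevention pattern, and
edge-case handling near the buffer ends may differ from libavcodec's real
vc1_unescape_buffer().

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* stands in for the vector length */

/* Illustrative only: copy src to dst, dropping each 0x03 that follows
 * 0x00 0x00 and precedes a byte <= 0x03. Returns the unescaped size. */
static int unescape_blockwise(const uint8_t *src, int size, uint8_t *dst)
{
    int di = 0, i = 0;

    while (i < size) {
        int n = size - i < CHUNK ? size - i : CHUNK;
        int j;

        /* Scan the current chunk for the first escape byte. */
        for (j = 0; j < n; j++) {
            int p = i + j;
            if (p >= 2 && p + 1 < size && src[p] == 0x03 &&
                !src[p - 1] && !src[p - 2] && src[p + 1] <= 0x03)
                break;
        }

        memcpy(dst + di, src + i, j); /* fast path copies the whole chunk */
        di += j;
        i  += j;
        if (j < n)
            i++; /* skip the escape byte and restart right after it */
    }
    return di;
}
```

The vector version replaces the inner scan with compares against zero on slid
copies of the chunk, as described in the notes above.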
Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of 2 slides,
- taking a shortcut if the input vector contains fewer than 2 zeroes.
But this is a good starting point:
T-Head C908:
vc1dsp.vc1_unescape_buffer_c: 12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0
SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c: 11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
The main loop processes 8 bytes in 5 instructions, i.e. 0.625 instructions
per byte.
For comparison, the optimal plain strnlen() requires 4 instructions per
byte (6.4x worse): LBU; ADDI; BEQZ; BNE. The current libavcodec C code
involves 5 instructions per byte (8x worse). Actual benchmarks may be
slightly less favourable due to latency from ORC.B to BNE.
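As a hedged illustration of the technique (not the actual code), here is a
portable C model of the 8-bytes-at-a-time zero-byte scan; the bit-twiddling
test stands in for ORC.B, and the function name and loop shape are
assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for the Zbb loop described above: the hand-written
 * assembly does this 8-byte test with roughly LD + ORC.B + BNE plus the
 * pointer/bound updates, instead of one LBU/ADDI/BEQZ/BNE round per byte. */
static size_t scan_for_zero_byte(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t v;
        memcpy(&v, buf + i, 8);
        /* Nonzero iff some byte of v is zero (what ORC.B != -1 detects). */
        if ((v - UINT64_C(0x0101010101010101)) & ~v &
            UINT64_C(0x8080808080808080))
            break;
    }
    while (i < len && buf[i] != 0) /* pinpoint the zero byte / handle the tail */
        i++;
    return i;
}
```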
We can't call ff_get_rv_vlenb() if we don't have RVV available
at all.
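A minimal sketch of the kind of guard this implies, assuming the init code
keys on av_get_cpu_flags(); the flag tested and the VLEN threshold here are
illustrative rather than taken from the actual patch.

```c
#include "libavutil/attributes.h"
#include "libavutil/cpu.h"
#include "libavutil/riscv/cpu.h"
#include "libavcodec/vc1dsp.h"

av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
{
    int flags = av_get_cpu_flags();

    /* Only read vlenb (via ff_get_rv_vlenb) once we know an RVV extension is
     * actually usable; otherwise the CSR access itself would trap. */
    if (flags & AV_CPU_FLAG_RVV_I32) {
        if (ff_get_rv_vlenb() >= 16) {
            /* install the RVV function pointers on dsp here */
        }
    }
}
```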
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Martin Storsjö <martin@martin.st>