There is no known (real) hardware with V but without the complete B
extension. B was indeed required by the RISC-V application profile from
2022, earlier than V, so there should not be any relevant hardware in the
future either.

In practice, various RISC-V Vector optimisations in FFmpeg already depend
on every constituent of the B extension anyhow, so hardware with V but
without B would not work well.
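As a rough illustration (these lines are not taken from any particular
FFmpeg function; they only show the kind of thing each constituent of B
provides):

        sh2add  t0, a2, a0            # Zba: t0 = a0 + (a2 << 2), scaled addressing
        ctz     t1, a3                # Zbb: count trailing zeros
        bexti   t2, a4, 5             # Zbs: t2 = bit 5 of a4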
hf_apply_noise_0_c: 35.7
hf_apply_noise_0_rvv_f32: 9.5
hf_apply_noise_1_c: 38.5
hf_apply_noise_1_rvv_f32: 10.0
hf_apply_noise_2_c: 35.5
hf_apply_noise_2_rvv_f32: 9.7
hf_apply_noise_3_c: 38.5
hf_apply_noise_3_rvv_f32: 10.0
Maybe extending the noise table manually is not such a great idea, but I am
not quite sure how else to deal with this. Allocating the table
dynamically is possible, but would require an ELF destructor to clean up.
128-bit is the maximum, not the minimum here. Larger vector sizes can
result in reads past the end of the noise value table.
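To make the hazard concrete, here is a minimal sketch of the access pattern
(hypothetical registers, not the actual FFmpeg loop): each step loads vl
consecutive (re, im) pairs from ff_sbr_noise_table, whose index wraps at
512 in the scalar code, so a start index near the end of the table lets a
full-width vector load read past entry 511 unless the table is padded or
vl is capped.

        sh3add  t0, t2, a4            # a4 = ff_sbr_noise_table, t2 = index
                                      # (0..511), 8 bytes per (re, im) pair
        vsetvli t1, a3, e32, m2, ta, ma
        vlseg2e32.v v24, (t0)         # vl pairs from table[index] onwards; with
                                      # a large VLEN this can run past table[511]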
This partially reverts commit cdcb4b98b7.
While this function can easily be written with vectors, it just fails to
get any performance improvement.
For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
ends up with more or less the same metrics as the C code:
func ff_sbr_neg_odd_64_rvv, zve64x
        li      a1, 32                  # 32 odd-indexed floats in x[0..63]
        addi    a0, a0, 7               # sign byte of x[1] (little endian)
        li      t0, 8                   # one sign byte every 8 bytes
        vsetvli zero, a1, e8, m2, ta, ma
        li      t1, 0x80
        vlse8.v v8, (a0), t0            # gather the 32 sign bytes
        vxor.vx v8, v8, t1              # flip the sign bits
        vsse8.v v8, (a0), t0            # scatter them back
        ret
endfunc
This reverts commit d06fd18f8f.
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.
hf_apply_noise_0_c: 2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c: 2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c: 2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c: 2541.2
hf_apply_noise_3_rvv_f32: 1244.2
With 5 accumulator vectors and 6 inputs, this can only use LMUL=2: those
11 register groups already take 22 of the 32 vector registers, and LMUL=4
would need 44. The number of vector loop iterations is also small, just 5
on 128-bit vector hardware.
The vector loop is somewhat unusual in that it processes data in
descending memory order, so as to save on vector slides: going downwards,
the elements to carry over to the next iteration can be extracted directly
from the bottom of the vectors. With ascending order (as in the Opus
postfilter function), there is no way to get at the top elements directly.
On the downside, this requires a separate shift and subtraction (the
would-be SH3SUB instruction does not exist), with a small pipeline stall
on the vector load address.
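As a rough sketch of both points (illustrative fragments, not the actual
loop; register choices are arbitrary):

        # Descending order: the element carried to the next iteration is
        # lane 0, which can be read out directly:
        vfmv.f.s ft0, v8
        # Ascending order would need a slide to reach the top lane first:
        #       addi          t1, t0, -1
        #       vslidedown.vx v24, v8, t1
        #       vfmv.f.s      ft0, v24
        # Stepping the pointer downwards takes a separate shift and sub,
        # since SH3SUB does not exist (only SH3ADD from Zba):
        slli    t1, t0, 3             # t0 steps of 8 bytes each
        sub     a0, a0, t1
        vle32.v v8, (a0)              # address depends on the sub just above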
The edge cases are handled in scalar code, as this saves on loads
and remains significantly faster than C.
autocorrelate_c: 669.2
autocorrelate_rvv_f32: 421.0
With 128-bit vectors, this is mostly pointless but also harmless.
Performance gains should be more noticeable with larger vector sizes.
neg_odd_64_c: 76.2
neg_odd_64_rvv_i64: 74.7