ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-10 03:49:54 +00:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	c536e92207	lavc/sbrdsp: R-V V hf_apply_noise functions This is restricted to 128-bit vectors as larger vector sizes could read past the end of the noise array. Support for future hardware with larger vector sizes is left for some other time. hf_apply_noise_0_c: 2319.7 hf_apply_noise_0_rvv_f32: 1229.0 hf_apply_noise_1_c: 2539.0 hf_apply_noise_1_rvv_f32: 1244.7 hf_apply_noise_2_c: 2319.7 hf_apply_noise_2_rvv_f32: 1232.7 hf_apply_noise_3_c: 2541.2 hf_apply_noise_3_rvv_f32: 1244.2	2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont	5b33104fca	lavc/sbrdsp: R-V V hf_gen hf_gen_c: 2922.7 hf_gen_rvv_f32: 731.5	2023-11-13 18:33:02 +02:00
Rémi Denis-Courmont	cd7b352c53	lavc/sbrdsp: R-V V autocorrelate With 5 accumulator vectors and 6 inputs, this can only use LMUL=2. Also the number of vector loop iterations is small, just 5 on 128-bit vector hardware. The vector loop is somewhat unusual in that it processes data in descending memory order, in order to save on vector slides: in descending order, we can extract elements to carry over to the next iteration from the bottom of the vectors directly. With ascending order (see in the Opus postfilter function), there are no ways to get the top elements directly. On the downside, this requires the use of separate shift and sub (the would-be SH3SUB instruction does not exist), with a small pipeline stall on the vector load address. The edge cases in scalar are done in scalar as this saves on loads and remains significantly faster than C. autocorrelate_c: 669.2 autocorrelate_rvv_f32: 421.0	2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont	f68ad5d2de	lavc/sbrdsp: R-V V sbr_hf_g_filt hf_g_filt_c: 1552.5 hf_g_filt_rvv_f32: 679.5	2023-11-06 19:42:49 +02:00
Rémi Denis-Courmont	d06fd18f8f	lavc/sbrdsp: R-V V neg_odd_64 With 128-bit vectors, this is mostly pointless but also harmless. Performance gains should be more noticeable with larger vector sizes. neg_odd_64_c: 76.2 neg_odd_64_rvv_i64: 74.7	2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont	b0aba7dd0c	lavc/sbrdsp: R-V V sum_square sum_square_c: 803.5 sum_square_rvv_f32: 283.2	2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont	86bee42473	lavc/sbrdsp: R-V V sum64x5 sum64x5_c: 385.0 sum64x5_rvv_f32: 116.0	2023-11-01 22:53:26 +02:00

7 commits