Commit graph

7 commits

Author SHA1 Message Date
Rémi Denis-Courmont
c536e92207 lavc/sbrdsp: R-V V hf_apply_noise functions
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.

hf_apply_noise_0_c:       2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c:       2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c:       2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c:       2541.2
hf_apply_noise_3_rvv_f32: 1244.2
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont
5b33104fca lavc/sbrdsp: R-V V hf_gen
hf_gen_c:      2922.7
hf_gen_rvv_f32: 731.5
2023-11-13 18:33:02 +02:00
Rémi Denis-Courmont
cd7b352c53 lavc/sbrdsp: R-V V autocorrelate
With 5 accumulator vectors and 6 inputs, this can only use LMUL=2.
Also the number of vector loop iterations is small, just 5 on 128-bit
vector hardware.

The vector loop is somewhat unusual in that it processes data in
descending memory order, in order to save on vector slides:
in descending order, we can extract elements to carry over to the next
iteration from the bottom of the vectors directly. With ascending order
(see in the Opus postfilter function), there are no ways to get the top
elements directly. On the downside, this requires the use of separate
shift and sub (the would-be SH3SUB instruction does not exist), with
a small pipeline stall on the vector load address.

The edge cases in scalar are done in scalar as this saves on loads
and remains significantly faster than C.

autocorrelate_c: 669.2
autocorrelate_rvv_f32: 421.0
2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont
f68ad5d2de lavc/sbrdsp: R-V V sbr_hf_g_filt
hf_g_filt_c:      1552.5
hf_g_filt_rvv_f32: 679.5
2023-11-06 19:42:49 +02:00
Rémi Denis-Courmont
d06fd18f8f lavc/sbrdsp: R-V V neg_odd_64
With 128-bit vectors, this is mostly pointless but also harmless.
Performance gains should be more noticeable with larger vector sizes.

neg_odd_64_c:       76.2
neg_odd_64_rvv_i64: 74.7
2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont
b0aba7dd0c lavc/sbrdsp: R-V V sum_square
sum_square_c:       803.5
sum_square_rvv_f32: 283.2
2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont
86bee42473 lavc/sbrdsp: R-V V sum64x5
sum64x5_c:       385.0
sum64x5_rvv_f32: 116.0
2023-11-01 22:53:26 +02:00