Commit graph

6 commits

Author SHA1 Message Date
Rémi Denis-Courmont
ca664f2254 lavc/flacdsp: R-V V LPC16 function
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:

flac_lpc_16_13_c:       15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c:       17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c:       27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c:       30051.5
flac_lpc_16_32_rvv_i32: 10355.0
2023-11-18 22:06:57 +02:00
Rémi Denis-Courmont
295092b46d lavc/flacdsp: R-V V LPC32
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.

With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c:       30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c:       37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c:       61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c:       68204.0
flac_lpc_32_32_rvv_i32: 13273.7
2023-11-18 22:05:43 +02:00
Rémi Denis-Courmont
07c303b708 lavc/flacdsp: R-V V decorrelate_indep 16-bit packed
flac_decorrelate_indep2_16_c:        981.7
flac_decorrelate_indep2_16_rvv_i32:  199.2
flac_decorrelate_indep4_16_c:       1749.7
flac_decorrelate_indep4_16_rvv_i32:  401.2
flac_decorrelate_indep6_16_c:       2517.7
flac_decorrelate_indep6_16_rvv_i32:  858.0
flac_decorrelate_indep8_16_c:       3285.7
flac_decorrelate_indep8_16_rvv_i32: 1123.5
2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont
fb0295e5fd lavc/flacdsp: R-V V decorrelate_indep 32-bit packed
flac_decorrelate_indep2_32_c:       981.7
flac_decorrelate_indep2_32_rvv_i32: 183.7
flac_decorrelate_indep4_32_c:      1749.7
flac_decorrelate_indep4_32_rvv_i32: 362.5
flac_decorrelate_indep6_32_c:      2517.7
flac_decorrelate_indep6_32_rvv_i32: 715.2
flac_decorrelate_indep8_32_c:      3285.7
flac_decorrelate_indep8_32_rvv_i32: 909.0
2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont
6183a69c0b lavc/flacdsp: R-V V decorrelate_ms packed
flac_decorrelate_ms_16_c:       585.5
flac_decorrelate_ms_16_rvv_i32: 263.0
flac_decorrelate_ms_32_c:       584.7
flac_decorrelate_ms_32_rvv_i32: 250.0
2023-11-17 23:59:23 +02:00
Rémi Denis-Courmont
636ae0e0bc lavc/flacdsp: R-V V packed decorrelate_{l,r}s
flac_decorrelate_ms_16_c:       457.2
flac_decorrelate_ms_16_rvv_i32: 203.0
flac_decorrelate_ms_32_c:       457.2
flac_decorrelate_ms_32_rvv_i32: 203.5
flac_decorrelate_rs_16_c:       456.2
flac_decorrelate_rs_16_rvv_i32: 207.0
flac_decorrelate_rs_32_c:       456.2
flac_decorrelate_rs_32_rvv_i32: 210.5
2023-11-17 23:59:22 +02:00