ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2025-12-08 06:09:50 +00:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	616fdeaea3	lavc/riscv: depend on RVB and simplify accordingly There is no known (real) hardware with V and without the complete B extension. B was indeed required in the RISC-V application profile from 2022, earlier than V. There should not be any relevant hardware in the future either. In practice, different R-V Vector optimisations in FFmpeg already depend on every constituent of the B extension anyhow, so it would not work well.	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	a535ce2ac0	lavc/flacdsp: R-V Zvl256b lpc33 flac_lpc_33_13_c: 499.7 flac_lpc_33_13_rvv_i64: 197.7 flac_lpc_33_16_c: 601.5 flac_lpc_33_16_rvv_i64: 195.2 flac_lpc_33_29_c: 1011.5 flac_lpc_33_29_rvv_i64: 300.7 flac_lpc_33_32_c: 1099.0 flac_lpc_33_32_rvv_i64: 296.7	2024-05-27 22:07:29 +03:00
Rémi Denis-Courmont	233066e85a	lavc/flacdsp: optimise RVV vector type for lpc32 This is pretty much the same as for lpc16, though it only improves half as large prediction orders. With 128-bit vectors, this gives: C V old V new 1 69.2 181.5 95.5 2 107.7 180.7 95.2 3 145.5 180.0 103.5 4 183.0 179.2 102.7 5 220.7 178.5 128.0 6 257.7 194.0 127.5 7 294.5 193.7 126.7 8 331.0 193.0 126.5 Larger prediction orders see no significant changes at that size.	2024-05-19 18:37:33 +03:00
Rémi Denis-Courmont	6ab4b92e82	lavc/flacdsp: optimise RVV vector type for lpc16 This calculates the optimal vector type value at run-time based on the hardware vector length and the FLAC LPC prediction order. In this particular case, the additional computation is easily amortised over the loop iterations: T-Head C908: C V before V after 1 48.0 214.7 95.2 2 64.7 214.2 94.7 3 79.7 213.5 94.5 4 96.2 196.5 94.2 # 5 111.0 195.7 118.5 6 127.0 211.2 102.0 7 143.7 194.2 101.5 8 175.7 193.2 101.2 # 9 176.2 224.2 126.0 10 191.5 192.0 125.5 11 224.5 191.2 124.7 12 223.0 190.2 124.2 13 239.2 189.5 123.7 14 253.7 188.7 139.5 15 286.2 188.0 122.7 16 284.0 187.0 122.5 # 17 300.2 186.5 186.5 18 314.0 185.5 185.7 19 329.7 184.7 185.0 20 343.0 184.2 184.2 21 358.7 199.2 183.7 22 371.7 182.7 182.7 23 387.5 181.7 182.0 24 400.7 181.0 181.2 25 431.5 180.2 196.5 26 443.7 195.5 196.0 27 459.0 178.7 196.2 28 470.7 177.7 194.2 29 470.0 177.0 193.5 30 481.2 176.2 176.5 31 496.2 175.5 175.7 32 507.2 174.7 191.0 # # Power of two boundary. With 128-bit vectors, improvements are expected for the first two test cases only. For the other two, there is overhead but below noise. Improvements should be better observable with prediction order of 8 and less, or on hardware with larger vector sizes.	2024-05-19 18:37:33 +03:00
Rémi Denis-Courmont	88d973a5d6	lavc/flacdsp: R-V V flac_wasted33 T-Head C908: flac_wasted_33_c: 786.2 flac_wasted_33_rvv_i64: 486.5	2024-05-17 18:08:04 +03:00
Rémi Denis-Courmont	7b47099bc0	lavc/flacdsp: R-V V flac_wasted32 T-Head C908: flac_wasted_32_c: 949.0 flac_wasted_32_rvv_i32: 278.7	2024-05-15 20:04:08 +03:00
Rémi Denis-Courmont	a3e45063c0	lavc/flacdsp: fix CPU requirement for 32-bit LPC	2024-05-15 19:45:07 +03:00
Rémi Denis-Courmont	ca664f2254	lavc/flacdsp: R-V V LPC16 function In this case, the inner loop computing the scalar product can be reduced to just one multiplication and one sum even with 128-bit vectors. The result is a lot simpler, but also brings more modest performance gains: flac_lpc_16_13_c: 15241.0 flac_lpc_16_13_rvv_i32: 11230.0 flac_lpc_16_16_c: 17884.0 flac_lpc_16_16_rvv_i32: 12125.7 flac_lpc_16_29_c: 27847.7 flac_lpc_16_29_rvv_i32: 10494.0 flac_lpc_16_32_c: 30051.5 flac_lpc_16_32_rvv_i32: 10355.0	2023-11-18 22:06:57 +02:00
Rémi Denis-Courmont	295092b46d	lavc/flacdsp: R-V V LPC32 The entire set of 32 coefficients and corresponding past 32 samples can fit in a single vector (with LMUL=8) exactly, but... since widening double the needed vector sizes, we still end up too short with 128-bit vectors. This adds a very simple version for future 256+-bit hardware, and for pred_orders values up to 16, and a bit more involved loop for for 128-bit hardware with pred_orders between 17 and 32. With 128-bit hardware, the benchmarks look like this: flac_lpc_32_13_c: 30152.0 flac_lpc_32_13_rvv_i32: 10244.7 flac_lpc_32_16_c: 37314.2 flac_lpc_32_16_rvv_i32: 10126.2 flac_lpc_32_29_c: 61910.0 flac_lpc_32_29_rvv_i32: 14495.2 flac_lpc_32_32_c: 68204.0 flac_lpc_32_32_rvv_i32: 13273.7	2023-11-18 22:05:43 +02:00
Rémi Denis-Courmont	07c303b708	lavc/flacdsp: R-V V decorrelate_indep 16-bit packed flac_decorrelate_indep2_16_c: 981.7 flac_decorrelate_indep2_16_rvv_i32: 199.2 flac_decorrelate_indep4_16_c: 1749.7 flac_decorrelate_indep4_16_rvv_i32: 401.2 flac_decorrelate_indep6_16_c: 2517.7 flac_decorrelate_indep6_16_rvv_i32: 858.0 flac_decorrelate_indep8_16_c: 3285.7 flac_decorrelate_indep8_16_rvv_i32: 1123.5	2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont	fb0295e5fd	lavc/flacdsp: R-V V decorrelate_indep 32-bit packed flac_decorrelate_indep2_32_c: 981.7 flac_decorrelate_indep2_32_rvv_i32: 183.7 flac_decorrelate_indep4_32_c: 1749.7 flac_decorrelate_indep4_32_rvv_i32: 362.5 flac_decorrelate_indep6_32_c: 2517.7 flac_decorrelate_indep6_32_rvv_i32: 715.2 flac_decorrelate_indep8_32_c: 3285.7 flac_decorrelate_indep8_32_rvv_i32: 909.0	2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont	6183a69c0b	lavc/flacdsp: R-V V decorrelate_ms packed flac_decorrelate_ms_16_c: 585.5 flac_decorrelate_ms_16_rvv_i32: 263.0 flac_decorrelate_ms_32_c: 584.7 flac_decorrelate_ms_32_rvv_i32: 250.0	2023-11-17 23:59:23 +02:00
Rémi Denis-Courmont	636ae0e0bc	lavc/flacdsp: R-V V packed decorrelate_{l,r}s flac_decorrelate_ms_16_c: 457.2 flac_decorrelate_ms_16_rvv_i32: 203.0 flac_decorrelate_ms_32_c: 457.2 flac_decorrelate_ms_32_rvv_i32: 203.5 flac_decorrelate_rs_16_c: 456.2 flac_decorrelate_rs_16_rvv_i32: 207.0 flac_decorrelate_rs_32_c: 456.2 flac_decorrelate_rs_32_rvv_i32: 210.5	2023-11-17 23:59:22 +02:00

13 commits