ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2025-12-08 06:09:50 +00:00

Author	SHA1	Message	Date
Anton Khirnov	3f9ca51015	lavc/opus*: move to opus/ subdir	2024-09-02 11:56:53 +02:00
Rémi Denis-Courmont	616fdeaea3	lavc/riscv: depend on RVB and simplify accordingly There is no known (real) hardware with V and without the complete B extension. B was indeed required in the RISC-V application profile from 2022, earlier than V. There should not be any relevant hardware in the future either. In practice, different R-V Vector optimisations in FFmpeg already depend on every constituent of the B extension anyhow, so it would not work well.	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	adc87a5f7c	lavc/opusdsp: rewrite R-V V postfilter This uses a more traditional approach allowing up processing of up to period minus two elements per iteration. This also allows the algorithm to work for all and any vector length. As the T-Head C908 device under test can load 16 elements loop, there is unsurprisingly a little performance drop when the period is minimal and the parallelism is capped at 13 elements: Before: postfilter_15_c: 21222.2 postfilter_15_rvv_f32: 22007.7 postfilter_512_c: 20189.7 postfilter_512_rvv_f32: 22004.2 postfilter_1022_c: 20189.7 postfilter_1022_rvv_f32: 22004.2 After: postfilter_15_c: 20189.5 postfilter_15_rvv_f32: 7057.2 postfilter_512_c: 20189.5 postfilter_512_rvv_f32: 5667.2 postfilter_1022_c: 20192.7 postfilter_1022_rvv_f32: 5667.2	2023-11-06 22:09:30 +02:00
Rémi Denis-Courmont	bfc69297c5	lavc/opusdsp: RISC-V V (512-bit) postfilter This adds a variant of the postfilter for use with 512-bit vectors. Half a vector is enough to perform the scalar product. Normally a whole vector would be used anyhow. Indeed fractional multiplers are no faster than the unit multipler. But in this particular function, a full vector makes up 16 samples, which would be loaded at each iteration of the outer loop. The minimum guaranteed CELT postfilter period is only 15. Accounting for the edges, we can only safely preload up to 13 samples. The fractional multipler is thus used to cap the selected vector length to a safe value of 8 elements or 256 bits. Likewise, we have the 1024-bit variant with the quarter multipler. In theory, a 2048-bit one would be possible with the eigth multipler, but that length is not even defined in the specifications as of yet, nor is it supported by any emulator - forget actual hardware.	2022-10-10 02:23:17 +02:00
Rémi Denis-Courmont	97d34befea	lavc/opusdsp: RISC-V V (256-bit) postfilter This adds a variant of the postfilter for use with 256-bit vectors. As a single vector is then large enough to perform the scalar product, the group multipler is reduced to just one at run-time. The different vector type is passed via register. Unfortunately, there is no VSETIVL instruction, so the constant vector size (5) also needs to be passed via a register.	2022-10-10 02:22:39 +02:00
Rémi Denis-Courmont	8009581912	lavc/opusdsp: RISC-V V (128-bit) postfilter This is implemented for a vector size of 128-bit. Since the scalar product in the inner loop covers 5 samples or 160 bits, we need a group multipler of 2. To avoid reconfiguring the vector type, the outer loop, which loads multiple input samples sticks to the same multipler. Consequently, the outer loop loads 8 samples per iteration. This is safe since the minimum period of the CELT codec is 15 samples. The same code would also work, albeit needlessly inefficiently with a vector length of 256 bits. A proper implementation will follow instead.	2022-10-10 02:22:10 +02:00

6 commits