ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-12 10:30:26 +00:00

Author	SHA1	Message	Date
sunyuechi	b41e115dde	lavc/me_cmp: R-V V pix_abs C908: pix_abs_0_0_c: 534.0 pix_abs_0_0_rvv_i32: 136.2 pix_abs_1_0_c: 287.7 pix_abs_1_0_rvv_i32: 125.2 sad_0_c: 534.0 sad_0_rvv_i32: 136.2 sad_1_c: 287.7 sad_1_rvv_i32: 125.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-02-21 20:08:25 +02:00
sunyuechi	c12053cefc	lavc/vp8dsp: R-V V vp8_idct_dc_add c908: vp8_idct_dc_add_c: 102.2 vp8_idct_dc_add_rvv_i32: 42.0 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-02-17 14:45:49 +02:00
sunyuechi	ee08974f90	lavc/rv34dsp: R-V V rv34_inv_transform_dc C908: rv34_inv_transform_dc_c: 35.5 rv34_inv_transform_dc_rvv_i32: 27.0 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-02-17 14:33:35 +02:00
sunyuechi	0748d2bbc7	lavc/blockdsp: R-V V clear_block C908: blockdsp.clear_block_c: 47.2 blockdsp.clear_block_rvv_i64: 28.5 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-02-12 22:00:03 +02:00
sunyuechi	8e23ebe6f9	lavc/svq1enc: R-V V ssd_int8_vs_int16 C908 ssd_int8_vs_int16_c: 207.7 ssd_int8_vs_int16_rvv_i32: 14.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-01-17 17:49:54 +02:00
sunyuechi	864174dd00	lavc/takdsp: R-V V decorrelate_ls C908: decorrelate_ls_c: 69.7 decorrelate_ls_rvv_i32: 27.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-21 22:42:34 +02:00
sunyuechi	98596f90f4	lavc/aacencdsp: R-V V abs_pow34 C908: abs_pow34_c: 535.5 abs_pow34_rvv_f32: 337.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-11 18:42:07 +02:00
Rémi Denis-Courmont	272d0c164d	lavc/lpc: R-V V apply_welch_window apply_welch_window_even_c: 617.5 apply_welch_window_even_rvv_f64: 235.0 apply_welch_window_odd_c: 709.0 apply_welch_window_odd_rvv_f64: 256.5	2023-12-11 18:17:43 +02:00
Rémi Denis-Courmont	b3825bbe45	riscv: test for assembler support This should fix the build on LLVM 16 and earlier, at the cost of turning all non-RVV optimisations off.	2023-12-08 17:21:09 +02:00
sunyuechi	0b9d009b4a	lavc/vc1dsp: R-V V inv_trans C908: vc1dsp.vc1_inv_trans_4x4_dc_c: 125.7 vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5 vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7 vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5 vc1dsp.vc1_inv_trans_8x4_dc_c: 228.7 vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5 vc1dsp.vc1_inv_trans_8x8_dc_c: 476.5 vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-08 17:20:48 +02:00
sunyuechi	8bdb663062	lavc/ac3dsp: R-V V float_to_fixed24 c910 float_to_fixed24_c: 2207.2 float_to_fixed24_rvv_f32: 696.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-06 16:04:22 +02:00
Rémi Denis-Courmont	0fa421c8f1	lavc/llvidencdsp: add R-V V diff_bytes diff_bytes_c: 163.0 diff_bytes_rvv_i32: 52.7	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	fbc7adba67	lavc/llviddsp: R-V V add_bytes add_bytes_c: 2077.2 add_bytes_rvv_i32: 105.0	2023-11-18 22:07:14 +02:00
Rémi Denis-Courmont	636ae0e0bc	lavc/flacdsp: R-V V packed decorrelate_{l,r}s flac_decorrelate_ms_16_c: 457.2 flac_decorrelate_ms_16_rvv_i32: 203.0 flac_decorrelate_ms_32_c: 457.2 flac_decorrelate_ms_32_rvv_i32: 203.5 flac_decorrelate_rs_16_c: 456.2 flac_decorrelate_rs_16_rvv_i32: 207.0 flac_decorrelate_rs_32_c: 456.2 flac_decorrelate_rs_32_rvv_i32: 210.5	2023-11-17 23:59:22 +02:00
Rémi Denis-Courmont	45d0eb3f70	lavc/llauddsp: R-V V scalarproduct_and_madd_int16 scalarproduct_and_madd_int16_c: 10355.7 scalarproduct_and_madd_int16_rvv_i32: 1480.0	2023-11-16 16:53:44 +02:00
Rémi Denis-Courmont	86bee42473	lavc/sbrdsp: R-V V sum64x5 sum64x5_c: 385.0 sum64x5_rvv_f32: 116.0	2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont	73dea2bb91	lavc/jpeg2000dsp: R-V V ict_float jpeg2000_ict_float_c: 3112.2 jpeg2000_ict_float_rvv_f32: 1225.0	2023-11-01 18:52:55 +02:00
Rémi Denis-Courmont	424c8ceb08	lavc/huffyuvdsp: R-V V add_int16 add_int16_128_c: 2390.5 add_int16_128_rvv_i32: 832.0 add_int16_rnd_width_c: 2390.2 add_int16_rnd_width_rvv_i32: 832.5	2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont	4aea0da230	lavc/utvideodsp: R-V V restore_rgb_planes restore_rgb_planes_c: 133065.7 restore_rgb_planes_rvv_i32: 33317.2	2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont	3c6516330f	lavc/exrdsp: R-V V reorder_pixels	2023-10-09 19:52:51 +03:00
Rémi Denis-Courmont	89c10d8d20	lavc/ac3: add R-V Zbb extract_exponents	2023-10-05 18:13:00 +03:00
Rémi Denis-Courmont	9bc5676e40	lavc/g722dsp: add RISC-V V DSP function	2023-08-24 21:07:18 +03:00
Arnie Chang	c5508f60c2	lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang <arnie.chang@sifive.com>	2023-05-30 17:15:05 +02:00
Rémi Denis-Courmont	8009581912	lavc/opusdsp: RISC-V V (128-bit) postfilter This is implemented for a vector size of 128 bits. Since the scalar product in the inner loop covers 5 samples or 160 bits, we need a group multiplier of 2. To avoid reconfiguring the vector type, the outer loop, which loads multiple input samples, sticks to the same multiplier. Consequently, the outer loop loads 8 samples per iteration. This is safe since the minimum period of the CELT codec is 15 samples. The same code would also work, albeit needlessly inefficiently, with a vector length of 256 bits. A proper implementation will follow instead.	2022-10-10 02:22:10 +02:00
Rémi Denis-Courmont	d7528af4df	lavc/bswapdsp: RISC-V V bswap_buf	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	f0ef11ea83	lavc/bswapdsp: RISC-V B bswap_buf Simply taking the Zbb REV8 instruction into use in a simple loop gives some significant savings: bswap_buf_c: 1081.0 bswap_buf_rvb_b: 771.0 But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with just one additional shift, and one fewer load, effectively doubling the bandwidth. Consequently, this patch is useful even if the compile-time target has Zbb enabled for C code: bswap_buf_c: 1081.0 bswap_buf_rvb_b: 341.0 (this patch) On the other hand, this approach fails miserably for bswap16_buf as the ratio of shifts and stores becomes unfavorable compared to naïve C: bswap16_buf_c: 1542.0 bswap16_buf_rvb_b: 1803.7 Unrolling to process 128 bits (4 samples) at a time actually worsens performance ever so slightly: bswap_buf_c: 1081.0 bswap_buf_rvb_b: 408.5	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	64ab577954	lavc/alacdsp: RISC-V V decorrelate_stereo To avoid data dependencies, this does the following unroll, which requires one extra but probably free addition: coeff = (b * left_weight) >> decorr_shift; b += a; a -= coeff; b -= coeff; swap(a, b);	2022-10-05 06:51:11 +02:00
Rémi Denis-Courmont	676b08cb70	lavc/pixblockdsp: RISC-V V 8-bit get_pixels & get_pixels_unaligned	2022-09-28 11:46:11 +02:00
Rémi Denis-Courmont	b29ee63a1b	lavc/idctdsp: RISC-V V put_pixels_clamped function	2022-09-28 11:46:11 +02:00
Rémi Denis-Courmont	b0cacf4c3f	lavc/aacpsdsp: RISC-V V add_squares	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	453aba71e6	lavc/vorbisdsp: RISC-V V inverse_coupling This uses the following vectorisation: for (i = 0; i < blocksize; i++) { ang[i] = mag[i] - copysignf(fmaxf(ang[i], 0.f), mag[i]); mag[i] = mag[i] - copysignf(fminf(ang[i], 0.f), mag[i]); }	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	47a10b9a99	lavc/fmtconvert: RISC-V V int32_to_float_fmul_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	27da9514c3	lavc/audiodsp: RISC-V V vector_clip_int32	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	1edac8eb46	lavc/pixblockdsp: RISC-V I get_pixels Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): get_pixels_c: 180.0 get_pixels_rvi: 136.7	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	04d092e7d5	lavc/audiodsp: RISC-V F vector_clipf RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): audiodsp.vector_clipf_c: 29551.5 audiodsp.vector_clipf_rvf: 17871.0 Also tried unrolling with 2 or 8 elements but it gets worse either way.	2022-09-27 13:19:52 +02:00
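
The bswap_buf commit (f0ef11ea83) describes using the 64-bit REV8 as a pseudo-SIMD instruction. The following is only a C model of that idea, not the assembly from the patch: `rev8_64` is a portable stand-in for REV8, and `bswap_buf_pairs` and `bswap32_ref` are hypothetical names introduced here. It shows that one 64-bit byte reversal swaps the bytes of two 32-bit words at once, at the cost of also exchanging the two halves, which is where the one extra shift comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for a 64-bit byte reversal (the job REV8 does on RV64). */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Reference: swap the bytes of a single 32-bit word. */
static uint32_t bswap32_ref(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical helper: byte-swap n words, two words per 64-bit reversal. */
static void bswap_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        /* Pack two words the way a little-endian 64-bit load would. */
        uint64_t pair = ((uint64_t)src[i + 1] << 32) | src[i];
        uint64_t r = rev8_64(pair);

        /* The reversal also exchanged the halves, so the swapped first
         * word now sits in the upper 32 bits: one shift recovers it. */
        dst[i]     = (uint32_t)(r >> 32);
        dst[i + 1] = (uint32_t)r;
    }
    if (i < n) /* odd tail word */
        dst[i] = bswap32_ref(src[i]);
}
```

The same trick applied to 16-bit samples would need three shifts per 64-bit word to put four halfwords back in order, which is consistent with the commit's remark that the approach does not pay off for bswap16_buf.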
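
The decorrelate_stereo commit (64ab577954) gives a reordered update sequence that avoids a data dependency. A minimal scalar sketch of that reordering follows. `decorrelate_serial` is a straightforward version written here for comparison and is an assumption, not code quoted from the log; both function names and their parameter lists are hypothetical. Inputs are assumed small enough that the multiplication does not overflow, and right-shifting a negative value is assumed to be arithmetic, as on common compilers.

```c
#include <stdint.h>

/* Serial form (assumed for comparison): b has to wait for the updated a. */
static void decorrelate_serial(int32_t *l, int32_t *r, int n,
                               int shift, int weight)
{
    for (int i = 0; i < n; i++) {
        int32_t a = l[i], b = r[i];

        a -= (b * weight) >> shift;
        b += a;
        l[i] = b;
        r[i] = a;
    }
}

/* Reordered form following the commit message: "b += a" no longer
 * depends on coeff, at the price of one extra subtraction. */
static void decorrelate_reordered(int32_t *l, int32_t *r, int n,
                                  int shift, int weight)
{
    for (int i = 0; i < n; i++) {
        int32_t a = l[i], b = r[i];
        int32_t coeff = (b * weight) >> shift;

        b += a;      /* can be issued alongside the multiply and shift */
        a -= coeff;
        b -= coeff;
        l[i] = b;    /* swap(a, b) on store */
        r[i] = a;
    }
}
```

Both loops compute the same outputs; the second only changes the order in which the additions become available, which matters for a vector unit where the multiply and shift have latency.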

35 commits