ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-04-21 01:40:23 +00:00

Author	SHA1	Message	Date
Lynne	f80addbb07	ffv1enc_vulkan: fix encoding with large contexts When RGB_LINECACHE == 2, then top2 is not the current line.	2025-12-04 16:53:58 +01:00
Andreas Rheinhardt	4b6e40a298	avcodec/vp8dsp: Don't compile unused functions The width 16 epel functions never use four taps in any direction*, so don't build said functions. Saves 4352B of .text and 89B of .text.unlikely here. *: mx and my in vp8_mc_luma() are always even. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	9cff236e2f	avcodec/riscv/vp8dsp_rvv: Remove unused functions Only the sixtap functions are used for size 16. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	050c80a526	avcodec/x86/vp8dsp: Don't use saturated addition when unnecessary For the epel functions, there can be no overflow as long as the sum contains only one of the two large central coefficients; for bilinear functions, there can be no overflow whatsoever. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	575e9e9c08	avcodec/x86/vp8dsp: Reduce number of coefficient tables By changing the permutations used in the epel8_h{4,6} case we can simply reuse the coefficient tables from the vertical epel filters. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	99fb257f58	avcodec/x86/vp8dsp: Don't use MMX registers in ff_put_vp8_epel4_h6_ssse3 Doubling the register width made it possible to avoid a pshufb and a pmaddubsw. Old benchmarks: vp8_put_epel4_h6_c: 115.9 ( 1.00x) vp8_put_epel4_h6_ssse3: 20.2 ( 5.74x) vp8_put_epel4_h6v4_c: 276.3 ( 1.00x) vp8_put_epel4_h6v4_ssse3: 58.6 ( 4.71x) vp8_put_epel4_h6v6_c: 363.6 ( 1.00x) vp8_put_epel4_h6v6_ssse3: 62.5 ( 5.82x) New benchmarks: vp8_put_epel4_h6_c: 116.4 ( 1.00x) vp8_put_epel4_h6_ssse3: 16.0 ( 7.29x) vp8_put_epel4_h6v4_c: 280.9 ( 1.00x) vp8_put_epel4_h6v4_ssse3: 44.3 ( 6.33x) vp8_put_epel4_h6v6_c: 365.6 ( 1.00x) vp8_put_epel4_h6v6_ssse3: 53.1 ( 6.89x) Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	3135bc0d3a	avcodec/x86/vp8dsp: Don't use MMX registers in ff_put_vp8_epel4_h4_ssse3 Doubling the register width allows using only one pshufb and one pmaddubsw. Old benchmarks: vp8_put_epel4_h4_c: 82.8 ( 1.00x) vp8_put_epel4_h4_ssse3: 13.9 ( 5.96x) New benchmarks: vp8_put_epel4_h4_c: 82.7 ( 1.00x) vp8_put_epel4_h4_ssse3: 11.7 ( 7.08x) Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	714cbf1c70	avcodec/x86/vp8dsp: Don't use MMX registers in ff_put_vp8_epel4_v4_ssse3 Switching to xmm registers allows processing two rows in parallel, leading to speedups. It is also ABI compliant (no more missing emms). Old benchmarks: vp8_put_epel4_v4_c: 96.8 ( 1.00x) vp8_put_epel4_v4_ssse3: 28.2 ( 3.43x) New benchmarks: vp8_put_epel4_v4_c: 95.1 ( 1.00x) vp8_put_epel4_v4_ssse3: 22.8 ( 4.17x) Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	f017806829	avcodec/x86/vp8dsp: Don't use MMX registers in ff_put_vp8_epel4_v6_ssse3 Switching to xmm registers allows processing two rows in parallel, leading to speedups. It is also ABI compliant (no more missing emms). Old benchmarks: vp8_put_epel4_v6_c: 132.8 ( 1.00x) vp8_put_epel4_v6_ssse3: 34.3 ( 3.87x) New benchmarks: vp8_put_epel4_v6_c: 131.5 ( 1.00x) vp8_put_epel4_v6_ssse3: 27.1 ( 4.86x) Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	7411998757	avcodec/x86/vp8dsp: Avoid unpacking multiple times Always pair row i with row i+2 for the vertical four-tap filter and row i+3 for the vertical six-tap filter (instead of pairing the first with the sixth, the second with the third and the fourth with the fifth). This allows unpacking each row only once instead of (at most) three times. Old benchmarks: vp8_put_epel4_v4_c: 98.4 ( 1.00x) vp8_put_epel4_v4_ssse3: 28.6 ( 3.44x) vp8_put_epel4_v6_c: 131.6 ( 1.00x) vp8_put_epel4_v6_ssse3: 38.5 ( 3.42x) vp8_put_epel8_v4_c: 362.5 ( 1.00x) vp8_put_epel8_v4_sse2: 63.8 ( 5.68x) vp8_put_epel8_v4_ssse3: 44.4 ( 8.16x) vp8_put_epel8_v6_c: 538.3 ( 1.00x) vp8_put_epel8_v6_sse2: 86.5 ( 6.22x) vp8_put_epel8_v6_ssse3: 57.0 ( 9.44x) vp8_put_epel16_v6_c: 1044.6 ( 1.00x) vp8_put_epel16_v6_sse2: 158.0 ( 6.61x) vp8_put_epel16_v6_ssse3: 106.7 ( 9.79x) New benchmarks: vp8_put_epel4_v4_c: 100.0 ( 1.00x) vp8_put_epel4_v4_ssse3: 28.4 ( 3.52x) vp8_put_epel4_v6_c: 131.7 ( 1.00x) vp8_put_epel4_v6_ssse3: 34.3 ( 3.84x) vp8_put_epel8_v4_c: 364.4 ( 1.00x) vp8_put_epel8_v4_sse2: 63.7 ( 5.72x) vp8_put_epel8_v4_ssse3: 43.3 ( 8.42x) vp8_put_epel8_v6_c: 550.2 ( 1.00x) vp8_put_epel8_v6_sse2: 86.4 ( 6.37x) vp8_put_epel8_v6_ssse3: 52.9 (10.40x) vp8_put_epel16_v6_c: 1052.5 ( 1.00x) vp8_put_epel16_v6_sse2: 158.3 ( 6.65x) vp8_put_epel16_v6_ssse3: 98.9 (10.64x) Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	24cdd4100d	avcodec/x86/vp8dsp_init: Remove unused macro Forgotten in `6a551f1405`. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	76900089fb	avcodec/x86/vp8dsp: Avoid reload Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	86aa1b81ec	avcodec/x86/vp8dsp: Increment src pointer earlier Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	e59ed3470d	avcodec/x86/vp8dsp: Directly use negated stride There is a register available. No change in benchmarks here. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	8fb6b0c733	avcodec/x86/vp8dsp: Don't use MMX registers in put_vp8_pixels8 Use GPRs on x64 and xmm registers otherwise (using GPRs reduces codesize). This avoids clobbering the floating point state and therefore no longer breaks the ABI. No change in benchmarks here. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:36 +01:00
Andreas Rheinhardt	ed5e0f9c68	avcodec/x86/vp8dsp: Remove MMXEXT functions overridden by SSSE3 SSSE3 is already quite old (available since 2006 on Intel and since 2011 on AMD CPUs), so the overwhelming majority of our users (particularly those that actually update their FFmpeg) will be using the SSSE3 versions. This commit therefore removes the MMX(EXT) functions overridden by them (which don't abide by the ABI) to get closer to a removal of emms_c. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:36 +01:00
Lynne	9b14ea0aa1	vulkan_dpx: fix alignment issue 12-bit images apparently require mod-32 alignment for each line. Go figure.	2025-12-04 15:08:46 +01:00
Oliver Chang	d6458f6a8b	avcodec/aacdec: Fix heap-use-after-free in USAC decoding A heap-use-after-free vulnerability was identified in `libavcodec/aac/aacdec.c`. When `che_configure` frees a `ChannelElement` (`ac->che[type][id]`), it failed to clear all references to it in `ac->tag_che_map`. `ac->tag_che_map` caches pointers to `ChannelElement`s and can contain cross-type mappings (e.g., a `TYPE_SCE` tag mapping to a `TYPE_LFE` element). In a USAC stream reconfiguration scenario, an LFE element was freed, but a stale pointer remained in `ac->tag_che_map`. Subsequent calls to `ff_aac_get_che` returned this dangling pointer, leading to a crash in `decode_usac_core_coder`. This commit fixes the issue by iterating over the entire `ac->tag_che_map` in `che_configure` and clearing any entries that point to the `ChannelElement` about to be freed, ensuring no dangling pointers remain. Fixes: https://issues.oss-fuzz.com/issues/440220467	2025-12-04 09:34:32 +00:00
Xia Tao	7922d4ca7d	avcodec/wasm/hevc: fix typo in butterfly macro Signed-off-by: Xia Tao <xiatao@gmail.com>	2025-12-04 08:40:43 +00:00
stevxiao	7b2ae2ccf7	avcodec/d3d12va_encode: add intra refresh support for d3d12va encode Intra refresh is a technique that gradually refreshes the video by encoding rows or regions as intra macroblocks/CTUs spread over multiple frames, rather than using periodic I-frames. This provides better error resilience for video streaming while maintaining a more consistent bitrate. Disable Intra Refresh (This is the default) ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \ -i input.mp4 \ -c:v h264_d3d12va \ -intra_refresh_mode none \ -intra_refresh_duration 30 \ -g 60 \ output.h264 Enable Intra Refresh ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \ -i input.mp4 \ -c:v h264_d3d12va \ -intra_refresh_mode row_based \ -intra_refresh_duration 30 \ -g 60 \ output.h264 Parameters - `-intra_refresh_mode`: Set to `row_based` to enable row-based intra refresh, or `none` to disable - `-intra_refresh_duration`: Number of frames over which to spread the intra refresh (default: 0 = use GOP size) - `-g`: GOP size (should typically be larger than intra refresh duration)	2025-12-04 08:26:26 +00:00
Oliver Chang	041d4f010e	libavcodec/prores_raw: Fix heap-buffer-overflow in decode_frame Fixes a heap-buffer-overflow in `decode_frame` where `header_len` read from the bitstream was not validated against the remaining bytes in the input buffer (`gb`). This allowed `gb_hdr` to be initialized with a size exceeding the actual packet data, leading to an out-of-bounds read. The fix adds a check to ensure `bytestream2_get_bytes_left(&gb)` is greater than or equal to `header_len - 2` before initializing `gb_hdr`. Fixes: https://issues.oss-fuzz.com/issues/439711053	2025-12-03 16:40:02 +00:00
Martin Storsjö	b98179cec6	avcodec/{arm,neon}/mpegvideo: Readd a missed initialization This was accidentally removed in `357fc5243c`. This fixes test failures when built with Clang and MSVC; surprisingly, the checkasm test did seem to pass when built with GCC. Clang and MSVC also warn about the use of the uninitialized variable, while GCC didn't.	2025-12-03 13:53:54 +02:00
Andreas Rheinhardt	5d9270df7f	libavutil/internal: Remove {SIZE,PTRDIFF}_SPECIFIER Possible since `222127418b`. Reviewed-by: Kacper Michajłow <kasper93@gmail.com> Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 11:52:54 +01:00
Andreas Rheinhardt	c22c2c5e03	avcodec/mpegvideo: Port dct_unquantize_mpeg2_intra_mmx to SSE2 Benefits from wider registers. Benchmarks: dct_unquantize_mpeg2_intra_c: 228.2 ( 1.00x) dct_unquantize_mpeg2_intra_mmx: 28.2 ( 8.10x) dct_unquantize_mpeg2_intra_sse2: 18.4 (12.37x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	6e2153111d	avcodec/x86/mpegvideo: Port dct_unquantize_mpeg2_inter_mmx to SSSE3 Benefits from wider registers, pabsw and psignw. Benchmarks: dct_unquantize_mpeg2_inter_c: 131.2 ( 1.00x) dct_unquantize_mpeg2_inter_mmx: 50.2 ( 2.62x) dct_unquantize_mpeg2_inter_ssse3: 20.5 ( 6.38x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	60084b1369	avcodec/x86/mpegvideo: Port MPEG-1 unquantize functions to SSSE3 Benefits from wider registers and pabsw, psignw. Benchmarks: dct_unquantize_mpeg1_inter_c: 343.0 ( 1.00x) dct_unquantize_mpeg1_inter_mmx: 50.6 ( 6.78x) dct_unquantize_mpeg1_inter_ssse3: 17.2 (19.94x) dct_unquantize_mpeg1_intra_c: 352.1 ( 1.00x) dct_unquantize_mpeg1_intra_mmx: 48.8 ( 7.22x) dct_unquantize_mpeg1_intra_ssse3: 19.5 (18.03x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	1cb987d25b	avcodec/x86/mpegvideo: Port dct_unquantize_h263_{intra,inter}_mmx to SSSE3 It benefits from wider registers and psignw. Benchmarks: dct_unquantize_h263_inter_c: 88.3 ( 1.00x) dct_unquantize_h263_inter_mmx: 24.7 ( 3.58x) dct_unquantize_h263_inter_ssse3: 9.3 ( 9.47x) dct_unquantize_h263_intra_c: 93.7 ( 1.00x) dct_unquantize_h263_intra_mmx: 30.6 ( 3.06x) dct_unquantize_h263_intra_ssse3: 16.5 ( 5.69x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	a9a23925df	avcodec/x86/mpegvideo: Don't duplicate register Currently several inline ASM blocks used a value as an input and rax as clobber register. The input value was just moved into the register which then served as loop counter. This is wasteful, as one can just use the value's register directly as loop counter. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	1fa8ffc1db	avcodec/x86/mpegvideo: Improve unquantizing MPEG-2 intra blocks Unquantizing involves calculating (block[j] * qscale * quant_matrix[j]) / 16 where / rounds towards zero. Arithmetic right shifts naturally round towards -inf, so the earlier code calculated the absolute value first, then used a right-shift and then negated the result if necessary. This commit uses a different procedure: It biases the product for negative values of block[j] by 0xf. The combination of this and the arithmetic right shift is the same as rounding towards zero. Furthermore, a write-only store to mm7 has been removed. Benchmarks: dct_unquantize_mpeg2_intra_c: 214.3 ( 1.00x) dct_unquantize_mpeg2_intra_mmx (old): 43.0 ( 4.98x) dct_unquantize_mpeg2_intra_mmx (new): 28.4 ( 7.56x) (The bitexact flag and the test for correctness have been removed from checkasm for the benchmarks.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	6d56807a06	avcodec/x86/mpegvideo: Use correct inline assembly constraints The H.263 unquantize functions modified an input parameter. (And they did so since this code was added in `7f3f5ec87b`. I am surprised that this didn't cause issues, particularly with the intra function.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	0f7cc6aeea	avcodec/mpegvideo: Move ff_init_scantable() to mpegvideo_unquantize.c This is necessary so that the mpegvideo_unquantize checkasm test does not pull mpegvideo.o and then all of libavcodec into checkasm. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	357fc5243c	avcodec/{arm,neon}/mpegvideo: Fix h263 unquantize functions These functions currently operate on the assumption that the number of coefficients to process is always of the form 16k+m with m<=4 or >8. Yet this is not true when the IDCT permutation is of type FF_IDCT_PERM_LIBMPEG2 (i.e. when FF_IDCT_INT is in use). Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:39 +01:00
Andreas Rheinhardt	e7a629049f	avcodec/{arm,neon}/mpegvideo: Use intra scantable to unquant H263 intra Forgotten in `70a7df049c`. Using the wrong scantable matters for codecs for which both scantables can differ, namely the MPEG-4 decoder and the WMV1/2 codecs. For WMV1 it can lead to wrong output in case the IDCT permutation is FF_IDCT_PERM_PARTTRANS, because in this case the entries of the intra scantable's raster end are not always <= the corresponding entries of the inter scantable's raster end when the former is initialized via ff_wmv1_scantable[1] and the latter via ff_wmv1_scantable[0]. FF_IDCT_PERM_PARTTRANS is used iff the Neon IDCT is used (for both arm and aarch64).* Said IDCT is not used during FATE, so that this issue went unnoticed. WMV2 uses the same scantables, but uses a custom IDCT which always uses FF_IDCT_PERM_NONE for which the inter_scantable, so that the output is always correct for it. The scantable for MPEG-4 can change mid-stream (for the decoder), but since `c41818dc5d` only the intra scantable is updated, so that both scantables can get out of sync. In such a case the unquantize intra functions could unquantize an incorrect number of coefficients. Using raster_end of the wrong scantable can also lead to an unnecessarily large number of coefficients being unquantized. *: FF_IDCT_PERM_SIMPLE and FF_IDCT_PERM_TRANSPOSE would also not work, but they are not used at all by arm and aarch64. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:20:42 +01:00
Andreas Rheinhardt	5d41d3e21d	avcodec/ppc/mpegvideo_altivec: Reindent after the previous commit Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:20:42 +01:00
Andreas Rheinhardt	011ef7fc65	avcodec/ppc/mpegvideo_altivec: Split intra/inter unquantizing Don't use a single function that checks mb_intra. Forgotten in `d50635cd24`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:20:42 +01:00
Andreas Rheinhardt	358c569b05	avcodec/mpegvideo_unquantize: Constify MPVContext pointee Also use MPVContext instead of MpegEncContext. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:20:41 +01:00
yuanhecai	f7551e7505	avcodec: fix checkasm-hpeldsp failure on LA	2025-12-03 01:36:01 +00:00
Andreas Rheinhardt	eccf130fdb	{lib{avcodec,swscale}/x86/,}Makefile: Kill MMX-OBJS Reviewed-by: Kacper Michajłow <kasper93@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 22:20:13 +01:00
Andreas Rheinhardt	ba94177242	avcodec/x86/Makefile: Only compile ASM init files when X86ASM is enabled To do so, simply add these init files to X86ASM-OBJS instead of OBJS in the Makefile. The former is already used for the actual assembly files, but using them for the C init files just works, because the build system uses file extensions to derive whether it is a C or a NASM file. This avoids compiling unused function stubs and also reduces our reliance on DCE: We don't add %if checks to the asm files except for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4 functions will be available. It also allows removing HAVE_X86ASM checks in these init files. Reviewed-by: Kacper Michajłow <kasper93@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 22:20:13 +01:00
Andreas Rheinhardt	65b4feb782	avcodec/x86/Makefile: Remove redundant WebP decoder->vp8dsp dependencies Redundant since `35b02732b9`. Reviewed-by: Kacper Michajłow <kasper93@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 22:20:13 +01:00
averne	1d1643b42a	vulkan/prores: use cached bitstream reader Speedup is around 75% on NVIDIA 3050, 20% on AMD 6700XT, 5% on Intel TigerLake.	2025-11-30 22:01:17 +01:00
averne	fd2fd3828c	libavcodec/vulkan: remove unnecessary member in GetBitContext The number of remaining bits can be calculated using existing state. This simplifies calculations and frees up one register.	2025-11-30 19:21:08 +01:00
averne	ef7354d471	libavcodec/vulkan: introduce cached bitstream reader This stores a small buffer in shared memory per decode thread (16 bytes), which helps reduce the number of memory accesses. The bitstream buffer is first aligned to a 4 byte boundary, so that the buffer can be filled with a single memory request.	2025-11-30 19:21:04 +01:00
Andreas Rheinhardt	89f984e3d1	avcodec/x86/h264_idct: Fix ff_h264_luma_dc_dequant_idct_sse2 checkasm failures ff_h264_luma_dc_dequant_idct_sse2() does not pass checkasm for certain seeds, because the input to packssdw no longer fits into an int16_t, leading to saturation, where the C code just truncates. I don't know whether the spec contains provisions that ensure that valid input must not exceed 16 bits or whether such inputs (even if invalid) can be triggered by the actual code and not only the test. This commit adapts the behavior of the function to the C reference code to fix the test. packssdw is avoided, instead the lower words are directly transferred to GPRs to be written out. This has unfortunately led to a slight performance regression here (14.5 vs 15.1 cycles). Fixes issue #20835. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	e6ae2802a3	avcodec/x86/h264_idct: Deduplicate generating constant pw_1 is currently loaded in both codepaths. Generate it earlier instead. Gives tiny speedups (15 vs 14.5 cycles) and reduces codesize. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	ada0a81577	avcodec/x86/h264_idct: Don't use MMX registers in ff_h264_luma_dc_dequant_idct_sse2 It is ABI compliant and gives a tiny speedup here (and is 16B smaller). Old benchmarks: h264_luma_dc_dequant_idct_8_c: 33.2 ( 1.00x) h264_luma_dc_dequant_idct_8_sse2: 16.0 ( 2.07x) New benchmarks: h264_luma_dc_dequant_idct_8_c: 33.0 ( 1.00x) h264_luma_dc_dequant_idct_8_sse2: 15.0 ( 2.20x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	012c25bac4	avcodec/x86/h264_idct: Zero with full-width stores Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	b9cbbd9074	avcodec/x86/h264_idct: Use tail call where advantageous It is possible on UNIX64. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	01ff05e4bc	avcodec/x86/h264_idct: Avoid call where possible Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	b51cbd4116	avcodec/x86/h264_idct: Remove redundant movsxdifnidn Only exported (i.e. cglobal) functions need it; stride is already sign-extended when it reaches any of the internal functions used here, so don't sign-extend again. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
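
Commit 050c80a526 argues that a wrapping add cannot overflow when a partial sum contains only one of the two large central six-tap coefficients. The sketch below checks that bound arithmetically. It assumes 8-bit pixels in [0, 255] and uses the signed six-tap table from RFC 6386. The bitmask of taps is an illustrative assumption: the commit does not spell out which taps the assembly groups into each pmaddubsw/add partial sum. The helper name `fits_int16` is hypothetical.

```c
#include <stdint.h>

/* Six-tap subpel filters for fractional positions 1..7, as listed in
 * RFC 6386 (VP8). Taps 2 and 3 are the large central coefficients. */
static const int vp8_sixtap[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Worst-case range of sum(coeff[i] * pixel[i]) over the taps selected by
 * mask (bit i selects tap i), for pixels in [0, 255]. Returns 1 if every
 * possible value fits in int16_t, i.e. a plain (wrapping) 16-bit add of
 * these products can never overflow and saturation is unnecessary. */
static int fits_int16(const int coeff[6], unsigned mask)
{
    long max = 0, min = 0;
    for (int i = 0; i < 6; i++) {
        if (!(mask & (1u << i)))
            continue;
        if (coeff[i] > 0)
            max += 255L * coeff[i];   /* all positive taps see 255 */
        else
            min += 255L * coeff[i];   /* all negative taps see 255 */
    }
    return max <= INT16_MAX && min >= INT16_MIN;
}
```

For every filter, a mask that drops either central tap (0x37 or 0x3B) fits in int16_t, while the full mask 0x3F does not. For example, 255 * (3 + 77 + 77 + 3) = 40800. The bilinear case is trivially safe because its two weights sum to 8, so no product sum exceeds 8 * 255 = 2040.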
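
Commit d6458f6a8b fixes a dangling pointer by scanning the whole `tag_che_map` when an element is freed, because cached entries can alias elements of another type. A minimal sketch of that cleanup follows. `ToyChannelElement`, `ToyDecoder`, `free_che` and the enum ordering are hypothetical, heavily simplified stand-ins, not the real AAC decoder structures. Only the field names `che` and `tag_che_map` come from the commit message.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; the real decoder state is far larger. */
enum { TYPE_SCE, TYPE_CPE, TYPE_CCE, TYPE_LFE, NB_CHE_TYPES };
#define MAX_ELEM_ID 16

typedef struct ToyChannelElement { int payload; } ToyChannelElement;

typedef struct ToyDecoder {
    ToyChannelElement *che[NB_CHE_TYPES][MAX_ELEM_ID];         /* owners */
    ToyChannelElement *tag_che_map[NB_CHE_TYPES][MAX_ELEM_ID]; /* cache  */
} ToyDecoder;

/* Free che[type][id] and clear every cached alias of it. The scan covers
 * all types, not just `type`, because an entry such as
 * tag_che_map[TYPE_SCE][n] may point at an LFE element. */
static void free_che(ToyDecoder *ac, int type, int id)
{
    ToyChannelElement *victim = ac->che[type][id];
    if (!victim)
        return;
    for (int t = 0; t < NB_CHE_TYPES; t++)
        for (int i = 0; i < MAX_ELEM_ID; i++)
            if (ac->tag_che_map[t][i] == victim)
                ac->tag_che_map[t][i] = NULL;
    free(victim);
    ac->che[type][id] = NULL;
}
```

Clearing only `tag_che_map[type][...]` would leave a cross-type alias such as `tag_che_map[TYPE_SCE][1]` pointing at freed memory. That is the scenario the commit describes.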
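
Commit 041d4f010e validates `header_len` against the bytes left in the packet before creating a sub-reader for the header. The sketch below shows that pattern with a hypothetical minimal reader (`ByteReader`, `br_init`, `br_bytes_left`, `br_get_be16`, `open_header`), loosely modelled on the idea of FFmpeg's bytestream2 API but not its actual code. It assumes the length field is a 16-bit big-endian value that counts its own two bytes. That assumption is consistent with the `header_len - 2` in the commit message, but the field width is a guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal byte reader over [cur, end). */
typedef struct ByteReader {
    const uint8_t *cur, *end;
} ByteReader;

static void br_init(ByteReader *br, const uint8_t *buf, size_t size)
{
    br->cur = buf;
    br->end = buf + size;
}

static size_t br_bytes_left(const ByteReader *br)
{
    return (size_t)(br->end - br->cur);
}

static unsigned br_get_be16(ByteReader *br)
{
    if (br_bytes_left(br) < 2) {
        br->cur = br->end;
        return 0;
    }
    unsigned v = (unsigned)br->cur[0] << 8 | br->cur[1];
    br->cur += 2;
    return v;
}

/* Read the length field and set up gb_hdr over the rest of the header.
 * Returns 0 on success, -1 if the declared length is impossible or would
 * extend past the end of the packet (the out-of-bounds case). */
static int open_header(ByteReader *gb, ByteReader *gb_hdr)
{
    if (br_bytes_left(gb) < 2)
        return -1;
    unsigned header_len = br_get_be16(gb);
    if (header_len < 2 || br_bytes_left(gb) < header_len - 2)
        return -1;
    br_init(gb_hdr, gb->cur, header_len - 2);
    gb->cur += header_len - 2;
    return 0;
}
```

The key point is ordering: the size check happens before the sub-reader is initialized. Any later reads through `gb_hdr` therefore stay inside the packet.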
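
Commit 1fa8ffc1db replaces "absolute value, shift, re-negate" with "bias negative values by 0xf, then shift arithmetically" to get division by 16 rounded towards zero. The scalar C sketch below compares both procedures against plain C division. The function names are hypothetical. It assumes that `>>` on a negative `int32_t` is an arithmetic shift: C leaves this implementation-defined, but mainstream compilers behave this way.

```c
#include <stdint.h>

/* Reference: C integer division truncates towards zero. */
static int32_t div16_trunc(int32_t v)
{
    return v / 16;
}

/* Old procedure: shift the absolute value, then restore the sign.
 * (Not valid for INT32_MIN, whose absolute value overflows.) */
static int32_t div16_abs_shift(int32_t v)
{
    int32_t a = v < 0 ? -v : v;
    a >>= 4;
    return v < 0 ? -a : a;
}

/* New procedure: add 0xf only to negative values, then shift.
 * (v >> 31) is all ones for negative v (arithmetic shift assumed),
 * so the mask yields 0xf for v < 0 and 0 otherwise, without a branch. */
static int32_t div16_bias_shift(int32_t v)
{
    int32_t bias = (v >> 31) & 0xf;
    return (v + bias) >> 4;
}
```

Take v = -17: the bias gives -2, and -2 >> 4 is -1, which matches -17 / 16 = -1. Without the bias, -17 >> 4 would round down to -2. The mask form needs no absolute value and no conditional negation, which is why it maps onto fewer SIMD instructions.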
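
Commit ef7354d471 introduces a cached bitstream reader that starts at a 4-byte-aligned address and refills its cache with whole 32-bit words. The CPU-side C sketch below shows that idea. The real reader is Vulkan shader code with a 16-byte cache in shared memory; here a single 64-bit integer plays the cache's role. In the spirit of commit fd2fd3828c, the number of remaining bits is derived from existing state rather than stored. All names (`CachedBits`, `cb_init`, `cb_get_bits`, `cb_bits_left`) are hypothetical. The sketch also assumes the buffer starts 4-byte aligned and is padded to a multiple of 4 bytes; any trailing partial word is ignored.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct CachedBits {
    const uint8_t *words; /* 4-byte aligned start of the buffer */
    size_t nb_words;      /* whole 32-bit words available */
    size_t next_word;     /* index of the next word to load */
    uint64_t cache;       /* unread bits, MSB first (left-aligned) */
    unsigned cached;      /* number of valid bits in cache */
} CachedBits;

static uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Append whole words below the valid bits while at least 32 bits are free. */
static void cb_refill(CachedBits *cb)
{
    while (cb->cached <= 32 && cb->next_word < cb->nb_words) {
        uint64_t w = load_be32(cb->words + 4 * cb->next_word++);
        cb->cache |= w << (32 - cb->cached);
        cb->cached += 32;
    }
}

/* Read n bits, 0 <= n <= 32; past the end, zero bits are returned. */
static uint32_t cb_get_bits(CachedBits *cb, unsigned n)
{
    if (n == 0)
        return 0;
    if (cb->cached < n)
        cb_refill(cb);
    uint32_t v = (uint32_t)(cb->cache >> (64 - n));
    cb->cache <<= n;
    cb->cached = cb->cached > n ? cb->cached - n : 0;
    return v;
}

/* Start reading at byte_offset: align down to a word, then drop the
 * leading bytes of that word from the cache. */
static void cb_init(CachedBits *cb, const uint8_t *base, size_t size,
                    size_t byte_offset)
{
    cb->words = base;
    cb->nb_words = size / 4;
    cb->next_word = byte_offset / 4;
    cb->cache = 0;
    cb->cached = 0;
    cb_refill(cb);
    cb_get_bits(cb, 8 * (unsigned)(byte_offset % 4));
}

/* Remaining bits, computed from existing state instead of a counter. */
static size_t cb_bits_left(const CachedBits *cb)
{
    size_t words = cb->nb_words > cb->next_word ? cb->nb_words - cb->next_word : 0;
    return words * 32 + cb->cached;
}
```

With the bytes `12 34 56 78 9A BC DE F0` and `byte_offset = 1`, the first `cb_get_bits(&cb, 8)` returns 0x34. Both words were fetched with one aligned load each.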
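
Commit 89f984e3d1 notes that packssdw saturates 32-bit lanes to int16_t while the C reference truncates. The sketch below contrasts the two behaviours on a single value. It models "truncates" as keeping the low 16 bits with two's-complement wrap-around; that this matches the C reference exactly is an assumption drawn from the commit text. The function names are hypothetical.

```c
#include <stdint.h>

/* Truncation: keep the low 16 bits and reinterpret them as signed.
 * Written without implementation-defined narrowing conversions. */
static int16_t store_truncate(int32_t v)
{
    uint16_t u = (uint16_t)(uint32_t)v;   /* modulo 2^16, well defined */
    return u >= 0x8000 ? (int16_t)((int32_t)u - 0x10000) : (int16_t)u;
}

/* Saturation, as packssdw does per lane: clamp to [INT16_MIN, INT16_MAX]. */
static int16_t store_saturate(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}
```

The two functions agree for every input within int16_t range and differ only once a value no longer fits. For example, 40000 becomes -25536 when truncated but 32767 when saturated. A checkasm run with random inputs can reach that case, which is why the commit writes out the low words directly instead of using packssdw.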


53188 commits