James Almer
aef221b22a
avcodec/hevc/refs: export Stereo 3D side data
...
Use the 3D Reference Displays Info SEI message to link a view_id with
an eye.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-23 17:15:02 +02:00
Anton Khirnov
14746871e1
lavc/hevcdec: implement decoding MV-HEVC
...
At most two layers are supported.
Aspects of this work were sponsored by Vimeo and Meta.
2024-09-23 17:15:02 +02:00
Anton Khirnov
0fde9c609f
lavc/decode: merge stereo3d information from decoder with packet side data
...
The HEVC decoder will start setting stereoscopic view position (left or
right) based on 3D Reference Displays Info SEI message in future
commits. This information should be merged with container-derived
stereo3D side data.
2024-09-23 17:12:17 +02:00
Anton Khirnov
327080c088
lavc/decode: make sure side data mapping does not produce duplicates
...
Also, deduplicate the code performing the mapping.
2024-09-23 17:12:15 +02:00
Anton Khirnov
bcbe999077
lavc/decode: clear side data in reget_buffer()
...
Otherwise it may accumulate when e.g. global side data is repeatedly
copied to the frame in each subsequent reget_buffer() call.
2024-09-23 17:11:40 +02:00
Anton Khirnov
e19551d165
lavc/decode: do not clear the frame discard flag in ff_decode_frame_props_from_pkt()
...
Only do it in reget_buffer().
The purpose of clearing this flag is to prevent it from
unintentionally persisting across multiple invocations of this function
on one frame, however that is only a problem if the frame is not
unreffed between uses, which is only the case with reget_buffer().
In other cases the caller may legitimately want to set the discard flag
and should have the option of doing so.
2024-09-23 17:11:40 +02:00
Anton Khirnov
75914b5822
lavc/hevc/hevcdec: implement MV-HEVC inter-layer prediction
...
The per-frame reference picture set contains two more lists -
INTER_LAYER[01]. Assuming at most two layers, INTER_LAYER1 is always
empty, but is added anyway for completeness.
When inter-layer prediction is enabled, INTER_LAYER0 for the
second-layer frame will contain the base-layer frame from the same
access unit, if it exists.
The new lists are then used in per-slice reference picture set
construction as per F.8.3.4 "Decoding process for reference picture
lists construction".
2024-09-23 17:11:40 +02:00
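A minimal sketch of the INTER_LAYER0 rule described in 75914b5822, assuming at most two layers and assuming that frames of the same access unit share a picture order count. The `SketchFrame` type and `find_inter_layer_ref()` are hypothetical names used only for illustration; they are not the decoder's actual structures.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a decoded picture buffer entry. */
typedef struct SketchFrame {
    int poc;       /* picture order count */
    int layer_id;  /* 0 = base layer, 1 = second layer */
    int in_use;    /* non-zero if the slot holds a frame */
} SketchFrame;

/*
 * Return the base-layer frame from the same access unit as `cur`
 * (assumed here to mean: same POC), or NULL if `cur` is itself a
 * base-layer frame or no such frame exists.
 */
static const SketchFrame *find_inter_layer_ref(const SketchFrame *dpb,
                                               size_t nb_frames,
                                               const SketchFrame *cur)
{
    if (cur->layer_id == 0)
        return NULL; /* the base layer has no inter-layer reference */

    for (size_t i = 0; i < nb_frames; i++) {
        if (dpb[i].in_use && dpb[i].layer_id == 0 && dpb[i].poc == cur->poc)
            return &dpb[i];
    }
    return NULL;
}
```

With this rule the second-layer frame gets at most one inter-layer reference, which matches the statement that INTER_LAYER1 stays empty when only two layers exist.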
Anton Khirnov
02a9435cb0
lavc/hevcdec: implement slice header parsing for nuh_layer_id>0
...
Cf. F.7.3.6.1 "General slice segment header syntax"
2024-09-23 17:11:40 +02:00
Anton Khirnov
a811ab74f0
lavc/hevc/parser: only split packets on NALUs with nuh_layer_id=0
...
A packet should contain a full access unit, which for multilayer video
should contain all the layers.
2024-09-23 17:11:40 +02:00
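To illustrate the idea in a811ab74f0, the sketch below reads nuh_layer_id from the two-byte HEVC NAL unit header and uses it to decide whether a NALU may begin a new packet. The bit layout follows the H.265 NAL unit header syntax; `NalHeader`, `parse_nal_header()` and `may_start_new_packet()` are hypothetical names, and the real parser applies further conditions that are not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the two-byte HEVC NAL unit header. */
typedef struct NalHeader {
    int nal_unit_type; /* 6 bits */
    int nuh_layer_id;  /* 6 bits */
    int temporal_id;   /* nuh_temporal_id_plus1 - 1 */
} NalHeader;

/* Parse the header; returns 0 on success, -1 on invalid input. */
static int parse_nal_header(const uint8_t *buf, size_t size, NalHeader *h)
{
    if (size < 2 || (buf[0] & 0x80)) /* forbidden_zero_bit must be 0 */
        return -1;

    h->nal_unit_type = (buf[0] >> 1) & 0x3f;
    h->nuh_layer_id  = ((buf[0] & 0x01) << 5) | (buf[1] >> 3);
    h->temporal_id   = (buf[1] & 0x07) - 1;

    return h->temporal_id < 0 ? -1 : 0;
}

/* Hypothetical predicate: only base-layer NALUs may start a new packet. */
static int may_start_new_packet(const NalHeader *h)
{
    return h->nuh_layer_id == 0;
}
```

Because NALUs with nuh_layer_id > 0 never trigger a split, the enhancement-layer data stays in the same packet as the base-layer picture it belongs to.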
Anton Khirnov
52ce2d2a04
lavc/hevcdec/parse: process NALUs with nuh_layer_id>0
...
Otherwise parameter sets from extradata with nuh_layer_id>0 would be
ignored. Needed for upcoming MV-HEVC support.
2024-09-23 17:11:40 +02:00
Anton Khirnov
81e9afa6c2
lavc/hevc/ps: reindent
2024-09-23 17:11:40 +02:00
Anton Khirnov
7d245866b8
lavc/hevc/ps: implement SPS parsing for nuh_layer_id>0
...
Cf. F.7.3.2.2 "Sequence parameter set RBSP syntax", which extends normal
SPS parsing with special clauses depending on MultiLayerExtSpsFlag.
2024-09-23 17:11:40 +02:00
Anton Khirnov
4359467ad6
lavc/hevc/ps: drop a warning for sps_multilayer_extension_flag
...
SPS multilayer extension contains a single flag that we are free to
ignore, no reason to print a warning.
2024-09-23 17:11:40 +02:00
Niklas Haas
7351e067bc
lavc/hevc_ps: parse VPS extension
...
Only implementing what's needed for MV-HEVC with two views.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-09-23 17:11:40 +02:00
James Almer
efa9d3deca
avcodec/hevc/sei: add support for 3D Reference Displays Information SEI
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-09-23 17:11:40 +02:00
James Almer
660e7e6a0e
avcodec: add LCEVC decoding support via LCEVCdec
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-23 10:20:47 -03:00
James Almer
6147385393
avcodec: add an export_side_data flag to export picture enhancement layers
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-23 10:07:37 -03:00
James Almer
d250cc02e2
avcodec/hevc/refs: ensure LCEVC SEI payloads are exported as frame side data before get_buffer() calls
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-23 10:05:34 -03:00
James Almer
dbbf9a5ff7
avcodec/decode: split ProgressFrame allocator into two functions
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-23 10:05:34 -03:00
Víctor Manuel Jáquez Leal
2bcc124e1a
vulkan_encode: set the quality level in session parameters
...
While running this command
./ffmpeg_g -loglevel debug -hwaccel vulkan -init_hw_device vulkan=vk:0,debug=1 -hwaccel_output_format vulkan -i input.y4m -vf 'format=nv12,hwupload' -c:v h264_vulkan -quality 2 output.mp4 -y
It hit this validation error:
Validation Error: [ VUID-vkCmdEncodeVideoKHR-None-08318 ] Object 0: handle =
0x8f000000008f, type = VK_OBJECT_TYPE_VIDEO_SESSION_KHR; Object 1: handle =
0xfd00000000fd, type = VK_OBJECT_TYPE_VIDEO_SESSION_PARAMETERS_KHR;
| MessageID = 0x5dc3dd39
| vkCmdEncodeVideoKHR(): The currently configured encode quality level (2) for
VkVideoSessionKHR 0x8f000000008f[] does not match the encode quality level (0)
VkVideoSessionParametersKHR 0xfd00000000fd[] was created with. The Vulkan spec
states: The bound video session parameters object must have been created with
the currently set video encode quality level for the bound video session at the
time the command is executed on the
device (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkCmdEncodeVideoKHR-None-08318 )
This patch adds a new helper function for creating session parameters, which
also sets the quality level. It is called by the H.264 and H.265 Vulkan
encoders.
2024-09-23 13:42:34 +02:00
Marvin Scholz
9e1682761f
avcodec/cbs_h266: Fix copy paste mistake
...
The us macro expects range_max here, which it seems should be
MAX_UINT_BITS(hlen).
Fix CID1618757 Copy-paste error
2024-09-20 22:32:54 +02:00
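A small sketch of what a range_max of MAX_UINT_BITS(hlen) means in 9e1682761f, assuming the macro yields the largest value representable in the given number of unsigned bits. The macro's definition is not shown in this log, so `SKETCH_MAX_UINT_BITS` and `in_range()` below are illustrative stand-ins, not the CBS code.

```c
#include <stdint.h>

/* Assumed behaviour, for illustration: largest value that fits in
 * `length` unsigned bits. Only valid for length < 64. */
#define SKETCH_MAX_UINT_BITS(length) ((UINT64_C(1) << (length)) - 1)

/* Range check of the kind a bitstream reader applies to a parsed value. */
static int in_range(uint64_t value, uint64_t range_min, uint64_t range_max)
{
    return value >= range_min && value <= range_max;
}
```

Under that assumption, an hlen-bit field can never exceed its own range_max, so the check only rejects values when the wrong length is passed.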
James Almer
2eef902d38
avcodec/bsf/dts2pts: don't zero the node buffers when allocating them
...
It's unnecessary as the entire struct is written to immediately after it's
allocated.
Restores the behavior prior to fec6a8df31 .
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-19 23:23:14 -03:00
Marvin Scholz
720ae6b3f7
avcodec/vaapi_encode_h265: fix missing slice_block_cols assignment
...
Instead of assigning to unit_opts.slice_block_cols, the slice_block_cols
value from the context was incorrectly assigned to slice_block_rows.
Regression from 12f158ca8f
Fixes CID1619479 Unused value
Reviewed-by: Fei Wang <fei.w.wang@intel.com>
2024-09-20 09:30:11 +08:00
James Almer
df609af8e4
avcodec/packet: add an LCEVC enhancement data payload side data type
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-19 10:01:26 -03:00
James Almer
5896318229
avcodec/codec_id: add an LCEVC codec id for raw LCEVC data
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-19 10:01:02 -03:00
James Almer
9cea2410a1
avcodec/h2645_sei: export raw LCEVC metadata
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-19 10:01:00 -03:00
Fei Wang
5211ad1acd
lavc/vaapi_encode: Fix potential use of uninitialized value
...
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-09-19 12:01:28 +08:00
Fei Wang
061c86a717
lavc/vaapi_encode_av1: Fix encode fail since 9db68ed0
...
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-09-19 12:01:18 +08:00
Michael Niedermayer
6df9a0292c
avcodec/vc2enc: basic sanity check on slice_max_bytes
...
Fixes: left shift of 896021632 by 3 places cannot be represented in type 'int'
Fixes: 70544/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VC2_fuzzer-6685593652756480
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-19 00:10:32 +02:00
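The overflow reported in 6df9a0292c is a signed left shift whose result does not fit in an int. The sketch below shows one generic way to test for that before shifting. It is not the check the commit adds (the log does not show it), and `shl_fits_int()` is a hypothetical helper.

```c
#include <limits.h>

/*
 * Returns 1 if (value << shift) can be represented as a non-negative int,
 * 0 otherwise. Negative inputs and oversized shift counts are rejected.
 */
static int shl_fits_int(int value, int shift)
{
    if (value < 0 || shift < 0 ||
        shift >= (int)(sizeof(int) * CHAR_BIT) - 1)
        return 0;

    /* value << shift fits exactly when value <= INT_MAX >> shift */
    return value <= (INT_MAX >> shift);
}
```

With a 32-bit int, the value from the fuzzer report (896021632 shifted by 3) fails this test, so a caller could reject the input instead of invoking undefined behaviour.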
Lynne
0aa4ac0faf
lavc: bump minor and add Changelog entry for the Vulkan H265 encoder
2024-09-17 21:12:32 +02:00
Lynne
4b4f0b68f8
lavc: add hevc_vulkan hardware encoder
...
This commit adds a Vulkan hardware HEVC encoder, with full support
of the spec - I, P, and B-frames.
2024-09-17 21:12:32 +02:00
Dave Airlie
b4283f93e1
cbs_h265: add raw filler encoding
2024-09-17 21:12:31 +02:00
Lynne
12f158ca8f
hw_base_encode_h265: split off SPS/PPS/VPS generation from VAAPI
...
This commit splits off the base unit generation from VAAPI to allow
sharing with other encoders.
2024-09-17 21:11:06 +02:00
James Almer
fec6a8df31
avcodec/bsf/dts2pts: use a RefStruct pool to allocate nodes
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-16 16:45:00 -03:00
Lynne
ceb471cfde
lavc: bump minor version and add changelog for h264_vulkan
2024-09-16 14:04:06 +02:00
Lynne
f85d94730c
lavc: add h264_vulkan hardware encoder
...
This commit adds the first Vulkan hardware encoder.
Currently, P- and B-frames are supported. This marks the
first implementation to support both.
The encoder has feature-parity with VAAPI.
2024-09-16 14:04:06 +02:00
Lynne
37243b2a08
lavc: add Vulkan video encoding base code
...
This commit adds the common Vulkan video encoding framework.
It makes full use of the asynchronous features of our new common
hardware encoding code, and of Vulkan.
The code is able to handle anything from H264 to AV1 and MJPEG.
2024-09-16 14:04:05 +02:00
Zhao Zhili
5c66a3ab51
avcodec/vvc: Fix output and unref a frame which isn't decoding yet
...
ff_vvc_output_frame is called before actual decoding starts. It is possible
for ff_vvc_output_frame to select the current frame for output. If the
current frame is a non-reference frame, it will be released by ff_vvc_unref_frame.
Fix this by always marking the current frame with
VVC_FRAME_FLAG_SHORT_REF, as is done by the HEVC decoder.
2024-09-15 16:42:14 +08:00
Zhao Zhili
3f84d1d1fb
aarch64/vvc: Add avg
...
avg_8_2x2_c: 0.2 ( 1.00x)
avg_8_2x2_neon: 0.2 ( 1.00x)
avg_8_4x4_c: 0.2 ( 1.00x)
avg_8_4x4_neon: 0.2 ( 1.00x)
avg_8_8x8_c: 0.9 ( 1.00x)
avg_8_8x8_neon: 0.2 ( 5.29x)
avg_8_16x16_c: 3.7 ( 1.00x)
avg_8_16x16_neon: 0.7 ( 5.44x)
avg_8_32x32_c: 14.9 ( 1.00x)
avg_8_32x32_neon: 1.7 ( 8.91x)
avg_8_64x64_c: 59.7 ( 1.00x)
avg_8_64x64_neon: 6.9 ( 8.62x)
avg_8_128x128_c: 254.7 ( 1.00x)
avg_8_128x128_neon: 26.9 ( 9.46x)
avg_10_2x2_c: 0.2 ( 1.00x)
avg_10_2x2_neon: 0.2 ( 1.00x)
avg_10_4x4_c: 0.2 ( 1.00x)
avg_10_4x4_neon: 0.2 ( 1.00x)
avg_10_8x8_c: 0.9 ( 1.00x)
avg_10_8x8_neon: 0.2 ( 5.29x)
avg_10_16x16_c: 3.4 ( 1.00x)
avg_10_16x16_neon: 0.4 ( 8.06x)
avg_10_32x32_c: 13.9 ( 1.00x)
avg_10_32x32_neon: 1.9 ( 7.23x)
avg_10_64x64_c: 54.2 ( 1.00x)
avg_10_64x64_neon: 8.4 ( 6.43x)
avg_10_128x128_c: 232.4 ( 1.00x)
avg_10_128x128_neon: 30.9 ( 7.52x)
avg_12_2x2_c: 0.0 ( 0.00x)
avg_12_2x2_neon: 0.2 ( 0.00x)
avg_12_4x4_c: 0.4 ( 1.00x)
avg_12_4x4_neon: 0.2 ( 2.43x)
avg_12_8x8_c: 0.7 ( 1.00x)
avg_12_8x8_neon: 0.2 ( 3.86x)
avg_12_16x16_c: 3.7 ( 1.00x)
avg_12_16x16_neon: 0.4 ( 8.65x)
avg_12_32x32_c: 13.7 ( 1.00x)
avg_12_32x32_neon: 2.2 ( 6.29x)
avg_12_64x64_c: 53.9 ( 1.00x)
avg_12_64x64_neon: 7.7 ( 7.03x)
avg_12_128x128_c: 270.9 ( 1.00x)
avg_12_128x128_neon: 30.4 ( 8.90x)
2024-09-14 16:36:34 +08:00
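For readers unfamiliar with the avg kernel benchmarked in 3f84d1d1fb, here is a plain C sketch of bi-prediction averaging. It assumes the default weighted prediction formula with shift = max(3, 15 - bit_depth) applied to two intermediate sample arrays. The names, the signature and the contiguous layout are hypothetical and do not reproduce FFmpeg's function. Like much codec code, it relies on arithmetic right shift for negative sums.

```c
#include <stdint.h>

/* Clamp v to the valid pixel range for the given bit depth. */
static int clip_pixel(int v, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : v > max ? max : v;
}

/* Average two intermediate predictions into final pixels. */
static void avg_sketch(uint16_t *dst, const int16_t *src0,
                       const int16_t *src1, int count, int bit_depth)
{
    const int shift  = 15 - bit_depth > 3 ? 15 - bit_depth : 3;
    const int offset = 1 << (shift - 1); /* rounding term */

    for (int i = 0; i < count; i++)
        dst[i] = (uint16_t)clip_pixel((src0[i] + src1[i] + offset) >> shift,
                                      bit_depth);
}
```

Each output sample depends only on the two inputs at the same position, so the loop maps directly onto SIMD lanes.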
Zhao Zhili
1be5a2374f
aarch64/vvc: Add put_epel_hv
...
On Apple M1:
put_chroma_hv_8_4x4_c: 1.7 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.2 ( 7.67x)
put_chroma_hv_8_8x8_c: 5.5 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 (11.53x)
put_chroma_hv_8_16x16_c: 18.5 ( 1.00x)
put_chroma_hv_8_16x16_neon: 1.5 (12.53x)
put_chroma_hv_8_32x32_c: 72.5 ( 1.00x)
put_chroma_hv_8_32x32_neon: 4.7 (15.34x)
put_chroma_hv_8_64x64_c: 274.0 ( 1.00x)
put_chroma_hv_8_64x64_neon: 18.5 (14.83x)
put_chroma_hv_8_128x128_c: 1058.7 ( 1.00x)
put_chroma_hv_8_128x128_neon: 75.2 (14.07x)
On Android Pixel 8 Pro:
put_chroma_hv_8_4x4_c: 1.2 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_hv_8_4x4_i8mm: 0.2 ( 5.00x)
put_chroma_hv_8_8x8_c: 4.0 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 ( 8.00x)
put_chroma_hv_8_8x8_i8mm: 0.5 ( 8.00x)
put_chroma_hv_8_16x16_c: 15.2 ( 1.00x)
put_chroma_hv_8_16x16_neon: 2.5 ( 6.10x)
put_chroma_hv_8_16x16_i8mm: 2.2 ( 6.78x)
put_chroma_hv_8_32x32_c: 61.0 ( 1.00x)
put_chroma_hv_8_32x32_neon: 9.8 ( 6.26x)
put_chroma_hv_8_32x32_i8mm: 8.5 ( 7.18x)
put_chroma_hv_8_64x64_c: 229.5 ( 1.00x)
put_chroma_hv_8_64x64_neon: 38.5 ( 5.96x)
put_chroma_hv_8_64x64_i8mm: 34.0 ( 6.75x)
put_chroma_hv_8_128x128_c: 919.8 ( 1.00x)
put_chroma_hv_8_128x128_neon: 154.5 ( 5.95x)
put_chroma_hv_8_128x128_i8mm: 140.0 ( 6.57x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
0dcf204e5d
aarch64/vvc: Add put_epel_h i8mm
...
put_chroma_h_8_4x4_c: 0.4 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm: 0.1 ( 2.67x)
put_chroma_h_8_8x8_c: 1.6 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.1 (11.00x)
put_chroma_h_8_8x8_i8mm: 0.1 (11.00x)
put_chroma_h_8_16x16_c: 6.9 ( 1.00x)
put_chroma_h_8_16x16_neon: 1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm: 0.7 (10.62x)
put_chroma_h_8_32x32_c: 27.6 ( 1.00x)
put_chroma_h_8_32x32_neon: 4.7 ( 5.95x)
put_chroma_h_8_32x32_i8mm: 4.4 ( 6.28x)
put_chroma_h_8_64x64_c: 116.2 ( 1.00x)
put_chroma_h_8_64x64_neon: 19.1 ( 6.07x)
put_chroma_h_8_64x64_i8mm: 17.1 ( 6.77x)
put_chroma_h_8_128x128_c: 466.6 ( 1.00x)
put_chroma_h_8_128x128_neon: 81.4 ( 5.73x)
put_chroma_h_8_128x128_i8mm: 71.7 ( 6.51x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
41a1885f7a
aarch64/vvc: Add put_epel_h
...
put_chroma_h_8_4x4_c: 0.2 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.2 ( 1.00x)
put_chroma_h_8_8x8_c: 0.8 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.2 ( 3.00x)
put_chroma_h_8_16x16_c: 3.8 ( 1.00x)
put_chroma_h_8_16x16_neon: 0.8 ( 5.00x)
put_chroma_h_8_32x32_c: 12.5 ( 1.00x)
put_chroma_h_8_32x32_neon: 2.2 ( 5.56x)
put_chroma_h_8_64x64_c: 47.0 ( 1.00x)
put_chroma_h_8_64x64_neon: 8.8 ( 5.37x)
put_chroma_h_8_128x128_c: 200.2 ( 1.00x)
put_chroma_h_8_128x128_neon: 31.8 ( 6.31x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
260e1b4b62
aarch64/vvc: Add sad
...
sad_8x16_c: 0.8 ( 1.00x)
sad_8x16_neon: 0.2 ( 3.00x)
sad_16x8_c: 0.5 ( 1.00x)
sad_16x8_neon: 0.2 ( 2.00x)
sad_16x16_c: 1.5 ( 1.00x)
sad_16x16_neon: 0.2 ( 6.00x)
2024-09-14 16:36:34 +08:00
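As background for the sad benchmarks in 260e1b4b62, the sketch below computes a plain sum of absolute differences over a block. The signature and the choice to visit every row are assumptions made for illustration; FFmpeg's VVC function may use different arguments and row sampling.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two width x height blocks.
 * Strides are given in elements, not bytes. */
static int sad_block(const int16_t *a, ptrdiff_t a_stride,
                     const int16_t *b, ptrdiff_t b_stride,
                     int width, int height)
{
    int sad = 0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sad += abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sad;
}
```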
Zhao Zhili
5ac6925803
aarch64/vvc: Add put_qpel_hv
...
With Apple M1 (no i8mm):
put_luma_hv_8_4x4_c: 2.2 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 3.00x)
put_luma_hv_8_8x8_c: 7.0 ( 1.00x)
put_luma_hv_8_8x8_neon: 0.8 ( 9.33x)
put_luma_hv_8_16x16_c: 22.8 ( 1.00x)
put_luma_hv_8_16x16_neon: 2.5 ( 9.10x)
put_luma_hv_8_32x32_c: 84.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 9.5 ( 8.92x)
put_luma_hv_8_64x64_c: 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon: 35.5 ( 9.38x)
put_luma_hv_8_128x128_c: 1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon: 137.8 ( 9.40x)
With Pixel 8 Pro:
put_luma_hv_8_4x4_c: 5.0 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm: 0.2 (20.00x)
put_luma_hv_8_8x8_c: 13.2 ( 1.00x)
put_luma_hv_8_8x8_neon: 1.2 (10.60x)
put_luma_hv_8_8x8_i8mm: 1.2 (10.60x)
put_luma_hv_8_16x16_c: 44.2 ( 1.00x)
put_luma_hv_8_16x16_neon: 4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm: 4.2 (10.41x)
put_luma_hv_8_32x32_c: 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm: 16.0 (10.05x)
put_luma_hv_8_64x64_c: 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon: 68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm: 62.2 ( 9.82x)
put_luma_hv_8_128x128_c: 2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon: 268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm: 245.8 ( 9.70x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
a0b52afd32
aarch64/vvc: Add put_qpel_vx
...
put_luma_v_8_4x4_c: 1.0 ( 1.00x)
put_luma_v_8_4x4_neon: 0.0 ( 0.00x)
put_luma_v_8_8x8_c: 3.5 ( 1.00x)
put_luma_v_8_8x8_neon: 0.5 ( 7.00x)
put_luma_v_8_16x16_c: 13.8 ( 1.00x)
put_luma_v_8_16x16_neon: 1.2 (11.00x)
put_luma_v_8_32x32_c: 54.2 ( 1.00x)
put_luma_v_8_32x32_neon: 5.0 (10.85x)
put_luma_v_8_64x64_c: 217.5 ( 1.00x)
put_luma_v_8_64x64_neon: 18.8 (11.60x)
put_luma_v_8_128x128_c: 886.2 ( 1.00x)
put_luma_v_8_128x128_neon: 74.0 (11.98x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
b051bc7cb8
aarch64/h26x: Remove duplicate b.eq instruction
...
b.eq is added by calc_all after each calc.
2024-09-14 16:36:34 +08:00
Zhao Zhili
11443cc9b1
avcodec/hevc: ff_hevc_(qpel/epel)_filters are signed type
2024-09-14 16:36:34 +08:00
Zhao Zhili
9f6c8eb412
aarch64/vvc: Add put_qpel_hx i8mm
...
Benchmark on Android Pixel 8 with -fno-vectorize:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm: 0.0 ( 0.00x)
put_luma_h_8_8x8_c: 1.5 ( 1.00x)
put_luma_h_8_8x8_neon: 0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm: 0.5 ( 3.00x)
put_luma_h_8_16x16_c: 6.2 ( 1.00x)
put_luma_h_8_16x16_neon: 2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm: 1.5 ( 4.17x)
put_luma_h_8_32x32_c: 25.5 ( 1.00x)
put_luma_h_8_32x32_neon: 9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm: 6.8 ( 3.78x)
put_luma_h_8_64x64_c: 99.8 ( 1.00x)
put_luma_h_8_64x64_neon: 35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm: 27.2 ( 3.66x)
put_luma_h_8_128x128_c: 422.0 ( 1.00x)
put_luma_h_8_128x128_neon: 138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
25448d1716
aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w
...
put_luma_pixels_8_4x4_c: 0.2 ( 1.00x)
put_luma_pixels_8_4x4_neon: 0.2 ( 1.00x)
put_luma_pixels_8_8x8_c: 0.7 ( 1.00x)
put_luma_pixels_8_8x8_neon: 0.2 ( 3.22x)
put_luma_pixels_8_16x16_c: 2.2 ( 1.00x)
put_luma_pixels_8_16x16_neon: 0.2 ( 9.89x)
put_luma_pixels_8_32x32_c: 8.2 ( 1.00x)
put_luma_pixels_8_32x32_neon: 1.2 ( 6.71x)
put_luma_pixels_8_64x64_c: 33.7 ( 1.00x)
put_luma_pixels_8_64x64_neon: 2.5 (13.63x)
put_luma_pixels_8_128x128_c: 145.5 ( 1.00x)
put_luma_pixels_8_128x128_neon: 10.2 (14.23x)
put_uni_pixels_luma_8_4x4_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_neon: 0.0 ( 0.00x)
put_uni_pixels_luma_8_8x8_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_8x8_neon: 0.2 ( 2.11x)
put_uni_pixels_luma_8_16x16_c: 1.2 ( 1.00x)
put_uni_pixels_luma_8_16x16_neon: 0.2 ( 5.44x)
put_uni_pixels_luma_8_32x32_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_neon: 0.5 ( 6.26x)
put_uni_pixels_luma_8_64x64_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_64x64_neon: 1.7 ( 1.72x)
put_uni_pixels_luma_8_128x128_c: 6.5 ( 1.00x)
put_uni_pixels_luma_8_128x128_neon: 6.5 ( 1.00x)
2024-09-14 16:36:34 +08:00
Zhao Zhili
20f2bf5530
aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_*
...
Just share hevc implementation.
checkasm --test=vvc_mc --benchmark:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_8x8_c: 1.0 ( 1.00x)
put_luma_h_8_8x8_neon: 0.2 ( 4.33x)
put_luma_h_8_16x16_c: 3.2 ( 1.00x)
put_luma_h_8_16x16_neon: 1.2 ( 2.63x)
put_luma_h_8_32x32_c: 13.7 ( 1.00x)
put_luma_h_8_32x32_neon: 4.0 ( 3.45x)
put_luma_h_8_64x64_c: 48.2 ( 1.00x)
put_luma_h_8_64x64_neon: 15.7 ( 3.07x)
put_luma_h_8_128x128_c: 203.5 ( 1.00x)
put_luma_h_8_128x128_neon: 62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c: 0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon: 0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c: 1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon: 0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c: 5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon: 1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c: 24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon: 4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c: 90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon: 17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c: 357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon: 67.5 ( 5.30x)
2024-09-14 16:36:34 +08:00