ffmpeg/libavcodec/hevc
Jun Zhao 75838b9c89 lavc/hevc: add aarch64 NEON for reference sample filtering
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
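
The folding described above can be checked in scalar C. The following is a minimal sketch, not the NEON code from this commit: `uhadd_u8` and `urhadd_u8` model a single 8-bit lane of the UHADD and URHADD instructions, and all function names, the 16-lane block helper, and the separate `src`/`dst` buffers are assumptions made for illustration. It also shows one plausible reading of the overlap-last tail handling, where the final 16-byte block is shifted back so that it ends exactly at the last element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar models of one 8-bit lane of AArch64 UHADD and URHADD. */
static uint8_t uhadd_u8(uint8_t a, uint8_t b)  { return (uint8_t)((a + b) >> 1); }
static uint8_t urhadd_u8(uint8_t a, uint8_t b) { return (uint8_t)((a + b + 1) >> 1); }

/* Reference [1,2,1] kernel with rounding, computed in int precision. */
static uint8_t filter3_ref(uint8_t prev, uint8_t curr, uint8_t next)
{
    return (uint8_t)((prev + 2 * curr + next + 2) >> 2);
}

/* Folded form: two halving adds, so values never leave 8 bits. */
static uint8_t filter3_folded(uint8_t prev, uint8_t curr, uint8_t next)
{
    return urhadd_u8(uhadd_u8(prev, next), curr);
}

/* Hypothetical helper: number of (prev, curr, next) triples, out of all
 * 2^24, for which the folded form differs from the reference. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int p = 0; p < 256; p++)
        for (int c = 0; c < 256; c++)
            for (int n = 0; n < 256; n++)
                bad += filter3_ref((uint8_t)p, (uint8_t)c, (uint8_t)n) !=
                       filter3_folded((uint8_t)p, (uint8_t)c, (uint8_t)n);
    return bad;
}

/* Hypothetical scalar stand-in for one 16-lane vector step:
 * reads src[-1..16], writes dst[0..15]. */
static void filter3_block16(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 16; i++)
        dst[i] = filter3_folded(src[i - 1], src[i], src[i + 1]);
}

/* Overlap-last tail: if n is not a multiple of 16, the final block starts
 * at n - 16 and recomputes a few outputs instead of doing a partial store.
 * Assumes n >= 16, non-overlapping src/dst, and readable src[-1], src[n]. */
static void filter3_row(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        filter3_block16(dst + i, src + i);
    if (i < n)
        filter3_block16(dst + n - 16, src + n - 16);
}

/* Exhaustive self-check of the folding identity. */
static void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

The identity holds because the bit dropped by the truncating first average can never change the final quotient: when prev + next is odd, prev + 2*curr + next + 2 is also odd, so subtracting 1 before the shift by 2 leaves the floor unchanged. Recomputing the overlapped outputs in the tail is only safe here because the sketch reads from `src` and writes to a distinct `dst`.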

Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
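
For orientation, the arithmetic behind the strong-smoothing path can be written as a scalar loop. This is a sketch of the bilinear interpolation that the HEVC specification defines for strong intra smoothing of a 32x32 block, not the NEON implementation from this commit. The function name and the `edge`/`dst` layout are assumptions, and the weights `(63 - i)` and `(i + 1)` are presumably what the preloaded weight tables hold.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of strong smoothing along one 64-sample edge.
 * edge[-1] is the corner sample and edge[63] the far end of the edge.
 * The two weights sum to 64, hence the +32 rounding term and the >> 6. */
static void strong_smooth_edge(uint8_t *dst, const uint8_t *edge)
{
    for (int i = 0; i < 63; i++)
        dst[i] = (uint8_t)(((63 - i) * edge[-1] + (i + 1) * edge[63] + 32) >> 6);
    dst[63] = edge[63];   /* the end sample passes through unchanged */
}

/* Hypothetical demo: a corner of 0 and an end sample of 64 give a ramp. */
static void strong_smooth_demo(void)
{
    uint8_t buf[65] = {0};
    uint8_t out[64];
    uint8_t *edge = buf + 1;   /* so that edge[-1] is valid */

    edge[63] = 64;
    strong_smooth_edge(out, edge);
    for (int i = 0; i < 64; i++)
        assert(out[i] == i + 1);
}
```

Each output needs a widening multiply-accumulate followed by a rounding narrow, which matches the umull/umlal and rshrn sequence the commit message mentions.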

checkasm --bench --runs=15 (Apple M4, average of 3 trials):
  ref_filter_3tap_8x8_8_neon:    4.1x
  ref_filter_3tap_16x16_8_neon:  3.3x
  ref_filter_3tap_32x32_8_neon:  2.5x
  ref_filter_strong_8_neon:      1.9x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-04-21 07:50:49 +00:00
cabac.c
data.c
data.h
dsp.c
dsp.h
dsp_template.c avcodec/hevc/dsp_template: Add restrict to add_residual functions 2026-04-06 11:28:49 +02:00
filter.c
hevc.h
hevcdec.c
hevcdec.h
Makefile
mvs.c
parse.c
parse.h
parser.c
pred.c lavc/hevc: extract reference sample filter into function pointers 2026-04-21 07:50:49 +00:00
pred.h lavc/hevc: extract reference sample filter into function pointers 2026-04-21 07:50:49 +00:00
pred_template.c lavc/hevc: add aarch64 NEON for reference sample filtering 2026-04-21 07:50:49 +00:00
ps.c avcodec/hevc: workaround hevc-alpha videos generated by VideoToolbox 2026-04-01 22:54:36 +08:00
ps.h
ps_enc.c
refs.c
sei.c
sei.h