Doubling the register size makes it possible to avoid two pmaddubsw
instructions. The new version is also ABI compliant (the old version
lacked an emms), and the average versions no longer rely on padding
(the old versions used pavgb with a memory operand that reads eight
bytes, although only four are needed).
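As a scalar illustration of the averaging step (a minimal sketch, not the
actual assembly): pavgb computes the rounded average (a + b + 1) >> 1 of
unsigned bytes. The helper name avg4_u8 is hypothetical; it touches exactly
the four bytes that a 4-pixel-wide row needs.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 4-pixel "avg" store.
 * Each output byte is the rounded average of the existing destination
 * byte and the newly predicted byte, which is what pavgb computes per lane.
 * Only four bytes of dst are read and written. */
static void avg4_u8(uint8_t *dst, const uint8_t *pred)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)((dst[i] + pred[i] + 1) >> 1);
}
```

A memory operand that is eight bytes wide would additionally read dst[4..7],
which is only safe if the buffer is padded.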
Old benchmarks (the last four refer to RV40):
avg_h264_chroma_mc4_8_c: 145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3: 32.3 ( 4.51x)
put_h264_chroma_mc4_8_c: 136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3: 29.0 ( 4.70x)
avg_chroma_mc4_c: 162.1 ( 1.00x)
avg_chroma_mc4_ssse3: 31.1 ( 5.22x)
put_chroma_mc4_c: 137.5 ( 1.00x)
put_chroma_mc4_ssse3: 28.6 ( 4.81x)
New benchmarks:
avg_h264_chroma_mc4_8_c: 146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3: 26.5 ( 5.53x)
put_h264_chroma_mc4_8_c: 136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3: 22.5 ( 6.09x)
avg_chroma_mc4_c: 165.5 ( 1.00x)
avg_chroma_mc4_ssse3: 27.2 ( 6.08x)
put_chroma_mc4_c: 138.1 ( 1.00x)
put_chroma_mc4_ssse3: 23.2 ( 5.96x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The wrong enum value was used to check unit_elems. While
AV_FRAME_DATA_MASTERING_DISPLAY_METADATA (11) does trigger when
UNIT_MASTERING_DISPLAY (2) is set, it also matches
UNIT_CONTENT_LIGHT_LEVEL (1), which is not intended.
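The effect can be reproduced with a small sketch, assuming the check is a
bitwise AND against unit_elems. The numeric values are the ones quoted
above; the DEMO_ names and both helper functions are hypothetical stand-ins,
not the actual code.

```c
#include <stdbool.h>

/* Values as quoted in the commit message; names are stand-ins. */
enum {
    DEMO_UNIT_CONTENT_LIGHT_LEVEL = 1,
    DEMO_UNIT_MASTERING_DISPLAY   = 2,
    /* stand-in for AV_FRAME_DATA_MASTERING_DISPLAY_METADATA */
    DEMO_FRAME_DATA_MASTERING_DISPLAY_METADATA = 11,
};

/* Buggy check: 11 is 0b1011, so it overlaps bits 1, 2 and 8. */
static bool has_mastering_display_buggy(unsigned unit_elems)
{
    return (unit_elems & DEMO_FRAME_DATA_MASTERING_DISPLAY_METADATA) != 0;
}

/* Fixed check: tests only the intended flag bit. */
static bool has_mastering_display_fixed(unsigned unit_elems)
{
    return (unit_elems & DEMO_UNIT_MASTERING_DISPLAY) != 0;
}
```

With only the content light level flag set, the buggy variant returns true
and the fixed variant returns false.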
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
They are overridden by the SSE2 versions and are no longer needed by
the nsse MMX functions, which no longer exist. Saves 240B here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Even nsse8 has to operate on eight words and therefore gains
a lot from xmm registers (and pabsw).
Old benchmarks:
nsse_0_c: 359.2 ( 1.00x)
nsse_0_mmx: 151.8 ( 2.37x)
nsse_1_c: 151.2 ( 1.00x)
nsse_1_mmx: 77.5 ( 1.95x)
New benchmarks:
nsse_0_c: 358.8 ( 1.00x)
nsse_0_ssse3: 62.2 ( 5.77x)
nsse_1_c: 151.2 ( 1.00x)
nsse_1_ssse3: 33.6 ( 4.50x)
The MMX nsse functions have been removed.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This will avoid using xmm registers that are volatile for Win64
in the next commit.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is a post-processing codec: given delta-x/y coordinates and a run length,
the r/g/b components of the 4 surrounding pixels are summed up, and the resulting
15-bit value is used as an index into a color quantization table to derive the
resulting pixel at the center.
It is only used in 10-20 frames of the Rebel Assault 2 LxxRETRY.SAN files
to slightly blur the outline of the "opening aperture" effect.
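A rough sketch of the index computation described above. The packing is an
assumption made for illustration only: components are taken to be 8 bits
wide, so each sum of four fits in 10 bits, and the top 5 bits of each sum
are combined into a 15-bit index. The function name blur_index is
hypothetical and the real decoder may pack the value differently.

```c
#include <stdint.h>

/* Hypothetical: build a 15-bit table index from the r/g/b values of the
 * four surrounding pixels (rgb[i][0] = r, [1] = g, [2] = b).
 * ASSUMPTION: 8-bit components, 5 bits kept per summed component. */
static unsigned blur_index(const uint8_t rgb[4][3])
{
    unsigned r = 0, g = 0, b = 0;

    for (int i = 0; i < 4; i++) {
        r += rgb[i][0];
        g += rgb[i][1];
        b += rgb[i][2];
    }
    /* Each sum is at most 4 * 255 = 1020, i.e. 10 bits; keep the top 5. */
    return ((r >> 5) << 10) | ((g >> 5) << 5) | (b >> 5);
}
```

The resulting value would then select the output pixel from a
32768-entry quantization table.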
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Fix several indexing errors in attack detection logic and refine transient handling in the AAC psychoacoustic model.
- Change PSY_LAME_NUM_SUBBLOCKS from 3 to 2 to ensure full coverage of all 1024 MDCT samples, with each subblock containing exactly 1024 / (8 * 2) = 64 samples, matching LAME’s empirical design.
- Introduce next_attack0_zero state flag to stabilize attack[0] prediction across frames.
- Adjust attack threshold presets.
These changes improve the handling of periodic signals such as trumpets, especially at low bitrates.
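The subblock arithmetic can be checked with a short sketch. It assumes the
subblock length is obtained by integer division of the 1024 samples over
8 short windows times the number of subblocks; the DEMO_ constants and both
helpers are hypothetical and not taken from the encoder source.

```c
/* Hypothetical constants mirroring the numbers in the commit message. */
#define DEMO_MDCT_SAMPLES  1024
#define DEMO_SHORT_WINDOWS 8

/* Samples per subblock, truncated to an integer. */
static int subblock_size(int num_subblocks)
{
    return DEMO_MDCT_SAMPLES / (DEMO_SHORT_WINDOWS * num_subblocks);
}

/* Total number of samples actually covered by all subblocks. */
static int samples_covered(int num_subblocks)
{
    return subblock_size(num_subblocks) * DEMO_SHORT_WINDOWS * num_subblocks;
}
```

Under this assumption, 2 subblocks give 64 samples each and cover all 1024
samples, whereas 3 subblocks give 42 samples each and cover only 1008.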
Makes it easier to see that width and height in DecodeContext are
actually lcevc fields.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not necessary to do it more than once, as none of the fields
that are set change after init.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in eefec06634.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AVX and SSE2 functions are identical except for the VEX encodings
used since e9abef437f and 8b8492452d.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It's used by other parts of the module that will fail to build otherwise after
the aforementioned removal.
Signed-off-by: James Almer <jamrial@gmail.com>
wingdi.h defines its own PASSTHROUGH and it is included implicitly
by the VC-1 parser (which is mpegvideo-based and therefore includes
a lot of stuff).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVCodecParser has several fields which are not really meant
to be accessed by users, but it has no public-private
demarcation line, so these fields are technically public
and therefore cannot simply be made private the way
20f9727018 did for AVCodec.*
This commit therefore deprecates these fields and
schedules them to become private. All parsers have already
been switched to FFCodecParser, which (for now) is a union
of AVCodecParser and an unnamed clone of AVCodecParser
(new fields can be added at the end of this clone).
*: This is also the reason why split has never been removed despite
not being set for several years now.
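The union layout described above could look roughly like the following
sketch. All Demo names and the reduced field set are hypothetical; this is
not the actual definition of FFCodecParser. It relies on C11 anonymous
structs.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the public struct. */
typedef struct DemoPublicParser {
    int codec_ids[7];
    int priv_data_size;
} DemoPublicParser;

/* Union of the public struct and an unnamed clone with the same leading
 * layout. New private fields are appended to the clone only. */
typedef union DemoFFParser {
    DemoPublicParser p;           /* what users get a pointer to */
    struct {
        int codec_ids[7];
        int priv_data_size;
        int extra_private_field;  /* hypothetical new field at the end */
    };
} DemoFFParser;

/* Hand out the public view of an internal parser definition. */
static const DemoPublicParser *demo_public_view(const DemoFFParser *ff)
{
    return &ff->p;
}
```

Because both members share the same initial sequence, internal code can use
the clone's fields while users keep seeing the unchanged public struct.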
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The current code relies on AV_CODEC_ID_NONE being zero, so that
unused codec ids are set to their proper value. This commit adds
a macro to set unset ids to AV_CODEC_ID_NONE.
(The actual rationale for this macro is to simplify
the transition to making the private fields that are
currently public in avcodec.h really private.)
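One way such a macro can be built is sketched below. The DEMO_ names are
hypothetical and this is not necessarily the macro added by this commit.
The sentinel is deliberately given a nonzero value here to show that the
padding does not depend on it being zero.

```c
/* Hypothetical ids; the sentinel is nonzero on purpose. */
enum DemoCodecID { DEMO_ID_NONE = -1, DEMO_ID_A = 1, DEMO_ID_B = 2 };

/* Seven copies of the sentinel, appended after the user-supplied ids. */
#define DEMO_TIMES_SEVEN(a) a, a, a, a, a, a, a
/* Keep only the first seven arguments and drop the rest. */
#define DEMO_FIRST_SEVEN2(a, b, c, d, e, f, g, ...) a, b, c, d, e, f, g
/* Extra level so that DEMO_TIMES_SEVEN is expanded before counting. */
#define DEMO_FIRST_SEVEN(...) DEMO_FIRST_SEVEN2(__VA_ARGS__)

/* Initializer for a 7-element id list; unset slots become DEMO_ID_NONE.
 * Accepts between one and seven ids. */
#define DEMO_CODEC_LIST(...) \
    { DEMO_FIRST_SEVEN(__VA_ARGS__, DEMO_TIMES_SEVEN(DEMO_ID_NONE)) }

static const int demo_ids[7] = DEMO_CODEC_LIST(DEMO_ID_A, DEMO_ID_B);
```

In this sketch demo_ids holds DEMO_ID_A and DEMO_ID_B followed by five
DEMO_ID_NONE entries, regardless of the sentinel's numeric value.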
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only contains declarations for some auxiliary parsing functions
that are not needed by parsers that only work with complete packets.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The decode API can handle outputting delayed frames without relying on the
parser splitting off the ENDOFSEQ marker.
Signed-off-by: James Almer <jamrial@gmail.com>