Commit graph

13 commits

Author SHA1 Message Date
Ramiro Polla
2d1358a84d swscale/range_convert: saturate output instead of limiting input
For bit depths <= 14, the result is saturated to 15 bits.
For bit depths > 14, the result is saturated to 19 bits.

x86_64:
chrRangeFromJpeg8_1920_c:    2126.5   2127.4  (1.00x)
chrRangeFromJpeg16_1920_c:   2331.4   2325.2  (1.00x)
chrRangeToJpeg8_1920_c:      3163.0   3166.9  (1.00x)
chrRangeToJpeg16_1920_c:     3163.7   2152.4  (1.47x)
lumRangeFromJpeg8_1920_c:    1262.2   1263.0  (1.00x)
lumRangeFromJpeg16_1920_c:   1079.5   1080.5  (1.00x)
lumRangeToJpeg8_1920_c:      1860.5   1886.8  (0.99x)
lumRangeToJpeg16_1920_c:     1910.2   1077.0  (1.77x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:   28836.2  28835.2  (1.00x)
chrRangeFromJpeg16_1920_c:  28840.1  28839.8  (1.00x)
chrRangeToJpeg8_1920_c:     44196.2  23074.7  (1.92x)
chrRangeToJpeg16_1920_c:    36527.3  17318.9  (2.11x)
lumRangeFromJpeg8_1920_c:   15388.5  15389.7  (1.00x)
lumRangeFromJpeg16_1920_c:  15389.3  15388.2  (1.00x)
lumRangeToJpeg8_1920_c:     23069.7  19227.8  (1.20x)
lumRangeToJpeg16_1920_c:    19227.8  15387.0  (1.25x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:    6334.7   6324.4  (1.00x)
chrRangeFromJpeg16_1920_c:   6336.0   6339.9  (1.00x)
chrRangeToJpeg8_1920_c:     11474.5   9656.0  (1.19x)
chrRangeToJpeg16_1920_c:     9640.5   6340.4  (1.52x)
lumRangeFromJpeg8_1920_c:    4453.2   4422.0  (1.01x)
lumRangeFromJpeg16_1920_c:   4414.2   4420.9  (1.00x)
lumRangeToJpeg8_1920_c:      6645.0   5949.1  (1.12x)
lumRangeToJpeg16_1920_c:     6005.2   4446.8  (1.35x)

NOTE: all simd optimizations for range_convert have been disabled
      except for x86, which already had the same behaviour.
      they will be re-enabled when they are fixed for each architecture.
2024-12-05 21:10:29 +01:00
Niklas Haas
2d077f9acd swscale/internal: group user-facing options together
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-21 12:49:56 +01:00
Ramiro Polla
f7ee0195df swscale/range_convert: drop redundant conditionals from arch-specific init functions
These conditions are already checked for in the main init function.
2024-10-27 13:20:56 +01:00
Ramiro Polla
7728b3357d swscale/range_convert: call arch-specific init functions from main init function
This commit also fixes the issue that the call to ff_sws_init_range_convert()
from sws_init_swscale() was not setting up the arch-specific optimizations.
2024-10-27 13:20:56 +01:00
Niklas Haas
67adb30322 swscale: rename SwsContext to SwsInternal
And preserve the public SwsContext as separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.

This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.

For consistency, the affected functions that need to maintain a distionction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.

In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-24 22:50:00 +02:00
Rémi Denis-Courmont
210877c5fd sws/riscv: depend on RVB and simplify accordingly 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
417957ec5e sws/range_convert: R-V V to/from JPEG
C908   X60
chrRangeFromJpeg_8_c:          2.7    2.5
chrRangeFromJpeg_8_rvv_i32:    1.7    1.5
chrRangeFromJpeg_24_c:         7.5    6.7
chrRangeFromJpeg_24_rvv_i32:   1.7    1.5
chrRangeFromJpeg_128_c:       55.2   34.7
chrRangeFromJpeg_128_rvv_i32:  6.5    3.0
chrRangeFromJpeg_144_c:       44.0   39.2
chrRangeFromJpeg_144_rvv_i32:  7.7    4.5
chrRangeFromJpeg_256_c:       78.2   69.5
chrRangeFromJpeg_256_rvv_i32: 12.2    6.0
chrRangeFromJpeg_512_c:      172.2  138.5
chrRangeFromJpeg_512_rvv_i32: 24.5   11.7
chrRangeToJpeg_8_c:            4.7    4.2
chrRangeToJpeg_8_rvv_i32:      2.0    1.7
chrRangeToJpeg_24_c:          13.7   12.2
chrRangeToJpeg_24_rvv_i32:     2.0    1.5
chrRangeToJpeg_128_c:         72.0   63.7
chrRangeToJpeg_128_rvv_i32:    6.7    3.2
chrRangeToJpeg_144_c:         80.7   71.7
chrRangeToJpeg_144_rvv_i32:    8.5    4.7
chrRangeToJpeg_256_c:        143.2  127.2
chrRangeToJpeg_256_rvv_i32:   13.5    6.5
chrRangeToJpeg_512_c:        285.7  253.7
chrRangeToJpeg_512_rvv_i32:   27.0   13.0
lumRangeFromJpeg_8_c:          1.7    1.5
lumRangeFromJpeg_8_rvv_i32:    1.2    1.0
lumRangeFromJpeg_24_c:         4.2    3.7
lumRangeFromJpeg_24_rvv_i32:   1.2    1.0
lumRangeFromJpeg_128_c:       21.7   19.2
lumRangeFromJpeg_128_rvv_i32:  3.7    1.7
lumRangeFromJpeg_144_c:       24.7   22.0
lumRangeFromJpeg_144_rvv_i32:  4.7    2.7
lumRangeFromJpeg_256_c:       43.7   39.0
lumRangeFromJpeg_256_rvv_i32:  7.5    3.2
lumRangeFromJpeg_512_c:       87.0   77.2
lumRangeFromJpeg_512_rvv_i32: 14.5    6.7
lumRangeToJpeg_8_c:            2.7    2.2
lumRangeToJpeg_8_rvv_i32:      1.0    1.0
lumRangeToJpeg_24_c:           7.2    6.5
lumRangeToJpeg_24_rvv_i32:     1.2    1.0
lumRangeToJpeg_128_c:         37.7   33.7
lumRangeToJpeg_128_rvv_i32:    3.7    2.0
lumRangeToJpeg_144_c:         42.5   37.7
lumRangeToJpeg_144_rvv_i32:    4.7    2.7
lumRangeToJpeg_256_c:         75.0   66.7
lumRangeToJpeg_256_rvv_i32:    7.5    3.5
lumRangeToJpeg_512_c:        149.5  133.0
lumRangeToJpeg_512_rvv_i32:   14.7    7.0
2024-06-10 22:48:52 +03:00
Rémi Denis-Courmont
7a3369398f sws/input: R-V V 32-bit RGB to halved UV
T-Head C908:
abgr_to_uv_half_8_c:            2.2
abgr_to_uv_half_8_rvv_i32:      3.5
abgr_to_uv_half_128_c:         44.0
abgr_to_uv_half_128_rvv_i32:   13.0
abgr_to_uv_half_1080_c:       245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c:       406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c:            2.2
bgra_to_uv_half_8_rvv_i32:      3.5
bgra_to_uv_half_128_c:         26.5
bgra_to_uv_half_128_rvv_i32:   13.0
bgra_to_uv_half_1080_c:       219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c:       406.7
bgra_to_uv_half_1920_rvv_i32: 188.7

SpacemiT X60:
abgr_to_uv_half_8_c:           2.2
abgr_to_uv_half_8_rvv_i32:     3.0
abgr_to_uv_half_128_c:        28.2
abgr_to_uv_half_128_rvv_i32:   5.7
abgr_to_uv_half_1080_c:      235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c:      418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c:           2.0
bgra_to_uv_half_8_rvv_i32:     3.0
bgra_to_uv_half_128_c:        23.7
bgra_to_uv_half_128_rvv_i32:   5.7
bgra_to_uv_half_1080_c:      195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c:      346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e2f069905e sws/input: R-V V 32-bit RGB to UV 2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
f5555cb106 sws/input: R-V V 32-bit RGB to Y
T-Head C908:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.2
abgr_to_y_128_c:         37.0
abgr_to_y_128_rvv_i32:    8.5
abgr_to_y_1080_c:       327.0
abgr_to_y_1080_rvv_i32:  69.5
abgr_to_y_1920_c:       552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c:            2.5
bgra_to_y_8_rvv_i32:      2.2
bgra_to_y_128_c:         37.2
bgra_to_y_128_rvv_i32:    8.5
bgra_to_y_1080_c:       310.2
bgra_to_y_1080_rvv_i32:  69.5
bgra_to_y_1920_c:       568.2
bgra_to_y_1920_rvv_i32: 122.5

SpacemiT X60:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.0
abgr_to_y_128_c:         33.0
abgr_to_y_128_rvv_i32:    3.7
abgr_to_y_1080_c:       276.0
abgr_to_y_1080_rvv_i32:  31.5
abgr_to_y_1920_c:       493.7
abgr_to_y_1920_rvv_i32:  55.5
bgra_to_y_8_c:            2.2
bgra_to_y_8_rvv_i32:      2.0
bgra_to_y_128_c:         33.0
bgra_to_y_128_rvv_i32:    3.7
bgra_to_y_1080_c:       276.0
bgra_to_y_1080_rvv_i32:  31.5
bgra_to_y_1920_c:       490.7
bgra_to_y_1920_rvv_i32:  55.5
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e0f4d185f1 sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
T-Head C908:
rgb24_to_uv_half_4_c:           2.0
rgb24_to_uv_half_4_rvv_i32:     3.5
rgb24_to_uv_half_64_c:         27.0
rgb24_to_uv_half_64_rvv_i32:   12.5
rgb24_to_uv_half_540_c:       223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c:       265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c:       414.5
rgb24_to_uv_half_960_rvv_i32: 249.5

SpacemiT X60:
rgb24_to_uv_half_4_c:           1.7
rgb24_to_uv_half_4_rvv_i32:     4.2
rgb24_to_uv_half_64_c:         24.0
rgb24_to_uv_half_64_rvv_i32:    8.7
rgb24_to_uv_half_540_c:       199.2
rgb24_to_uv_half_540_rvv_i32:  72.5
rgb24_to_uv_half_640_c:       235.7
rgb24_to_uv_half_640_rvv_i32:  85.2
rgb24_to_uv_half_960_c:       353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
3ef5867e4b sws/input: R-V V rgb24ToUV and bgr24ToUV
T-Head C908:
rgb24_to_uv_8_c:            2.7
rgb24_to_uv_8_rvv_i32:      3.2
rgb24_to_uv_128_c:         41.0
rgb24_to_uv_128_rvv_i32:   12.7
rgb24_to_uv_1080_c:       342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c:       406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c:       626.0
rgb24_to_uv_1920_rvv_i32: 186.0

SpacemiT X60:
rgb24_to_uv_8_c:            2.5
rgb24_to_uv_8_rvv_i32:      3.0
rgb24_to_uv_128_c:         36.5
rgb24_to_uv_128_rvv_i32:    5.7
rgb24_to_uv_1080_c:       304.2
rgb24_to_uv_1080_rvv_i32:  49.0
rgb24_to_uv_1280_c:       360.5
rgb24_to_uv_1280_rvv_i32:  57.5
rgb24_to_uv_1920_c:       540.7
rgb24_to_uv_1920_rvv_i32:  86.2
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
79dfdac4db sws/input: R-V V rgb24ToY & bgr24ToY
T-Head C908:
rgb24_to_y_8_c:            2.0
rgb24_to_y_8_rvv_i32:      2.7
rgb24_to_y_128_c:         26.2
rgb24_to_y_128_rvv_i32:    9.2
rgb24_to_y_1080_c:       219.5
rgb24_to_y_1080_rvv_i32:  76.2
rgb24_to_y_1280_c:       276.2
rgb24_to_y_1280_rvv_i32:  89.7
rgb24_to_y_1920_c:       389.7
rgb24_to_y_1920_rvv_i32: 134.2

SpacemiT X60:
rgb24_to_y_8_c:            1.7
rgb24_to_y_8_rvv_i32:      2.2
rgb24_to_y_128_c:         23.2
rgb24_to_y_128_rvv_i32:    4.2
rgb24_to_y_1080_c:       195.0
rgb24_to_y_1080_rvv_i32:  33.7
rgb24_to_y_1280_c:       231.0
rgb24_to_y_1280_rvv_i32:  40.0
rgb24_to_y_1920_c:       346.2
rgb24_to_y_1920_rvv_i32:  59.7
2024-06-08 18:30:43 +03:00