ladybird

Mirror of https://github.com/LadybirdBrowser/ladybird.git, synced 2025-11-01 22:00:58 +00:00

Author	SHA1	Message	Date
Simon Wanner	7f3b457e62	LibTextCodec: Add EUC-KR decoder	2024-05-31 07:56:26 +02:00
Simon Wanner	ded6512ca8	LibTextCodec: Add Shift_JIS decoder	2024-05-31 07:56:26 +02:00
Simon Wanner	06f7c393b2	LibTextCodec: Add ISO-2022-JP decoder	2024-05-31 07:56:26 +02:00
Simon Wanner	45f0ae52be	LibTextCodec: Add EUC-JP decoder	2024-05-31 07:56:26 +02:00
Simon Wanner	9943bb1d8e	LibTextCodec: Add Big5 decoder	2024-05-31 07:56:26 +02:00
Simon Wanner	2ce61fe6ea	LibTextCodec: Add GBK/GB18030 decoder. Includes changes from GB-18030-2022, which are not yet included in the Encoding Specification, but WebKit, Blink and WPT are already updated.	2024-05-31 07:56:26 +02:00
Simon Wanner	9ed52504ab	LibTextCodec: Delegate to process() in default validate() implementation	2024-05-31 07:56:26 +02:00
Simon Wanner	b79815c5a5	LibTextCodec: Add x-mac-cyrillic decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	07a9435da5	LibTextCodec: Add windows-1258 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	275b89720b	LibTextCodec: Add windows-1257 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	c76308c7e6	LibTextCodec: Add windows-1256 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	eb9ed10573	LibTextCodec: Add windows-1253 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	2d35687db0	LibTextCodec: Add windows-874 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	1b6878b6ca	LibTextCodec: Add KOI8-U decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	1fd3a6f48c	LibTextCodec: Add ISO-8859-16 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	3e882f26db	LibTextCodec: Sort checks in decoder_for mostly alphabetically. Keeps checks for common encodings (Latin1 & UTF-*) at the top.	2024-05-27 20:50:50 +02:00
Simon Wanner	56241df604	LibTextCodec: Add ISO-8859-14 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	4188e328ac	LibTextCodec: Add ISO-8859-13 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	cc640f4363	LibTextCodec: Add ISO-8859-10 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	d73220837e	LibTextCodec: Add ISO-8859-8(-I) decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	24028e353e	LibTextCodec: Add ISO-8859-7 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	01c3b8091a	LibTextCodec: Add ISO-8859-6 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	763d904ad5	LibTextCodec: Add ISO-8859-5 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	c6b17320db	LibTextCodec: Add ISO-8859-4 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	6c84edaaa2	LibTextCodec: Add ISO-8859-3 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	fc783199f1	LibTextCodec: Add IBM866 decoder	2024-05-27 20:50:50 +02:00
Simon Wanner	96b3c35358	LibTextCodec: Implement table based decoders as SingleByteDecoder. Instead of copy-pasting the implementation, let's use a single class. This "Single Byte Decoder" concept even exists in the Encoding Spec :^)	2024-05-27 20:50:50 +02:00
Michal Grich	7a6d84d036	LibTextCodec: Add Windows-1250 text decoder. This commit is adding Windows-1250 decoding based on unicode.org mapping table.	2024-04-23 16:26:16 +02:00
Andreas Kling	3c039903fb	LibTextCodec+AK: Don't validate UTF-8 strings twice. UTF8Decoder was already converting invalid data into replacement characters while converting, so we know for sure we have valid UTF-8 by the time conversion is finished. This patch adds a new StringBuilder::to_string_without_validation() and uses it to make UTF8Decoder avoid half the work it was doing.	2023-12-30 13:49:50 +01:00
Nico Weber	8f47acee6a	LibTextCodec: Add PDFDocEncoding decoder	2023-11-22 09:08:06 -07:00
Idan Horowitz	079c96376c	LibTextCodec: Support validating encoded inputs	2023-11-17 16:02:36 +01:00
Luke Wilde	eaa4048870	LibTextCodec: Add "get output encoding" from the Encoding specification	2023-06-19 06:12:26 +02:00
Timothy Flynn	00fa23237a	LibTextCodec: Change UTF-8's decoder to replace invalid code points. The UTF-8 decoder will currently crash if it is provided invalid UTF-8 input. Instead, change its behavior to match that of all other decoders to replace invalid code points with U+FFFD. This is required by the web.	2023-05-12 05:47:36 +02:00
Andreas Kling	a504ac3e2a	Everywhere: Rename equals_ignoring_case => equals_ignoring_ascii_case. Let's make it clear that these functions deal with ASCII case only.	2023-03-10 13:15:44 +01:00
Luke Wilde	e864444fe3	LibTextCodec/Latin1: Iterate over input string with u8 instead of char. Using char causes bytes equal to or over 0x80 to be treated as a negative value and produce incorrect results when implicitly casting to u32. For example, `atob` in LibWeb uses this decoder to convert non-ASCII values to UTF-8, but non-ASCII values are >= 0x80 and thus produces incorrect results in such cases: `Uint8Array.from(atob("u660"), c => c.charCodeAt(0));` This used to produce [253, 253, 253] instead of [187, 174, 180]. Required by Cloudflare's IUAM challenges.	2023-02-28 08:46:06 +00:00
Sam Atkins	2db168acc1	LibTextCodec+Everywhere: Port Decoders to new Strings	2023-02-19 17:15:47 +01:00
Sam Atkins	3c5090e172	LibTextCodec: Return Optional<Decoder&> from `bom_sniff_to_decoder()`	2023-02-19 17:15:47 +01:00
Sam Atkins	f2a9426885	LibTextCodec+Everywhere: Return Optional<Decoder&> from `decoder_for()`	2023-02-19 17:15:47 +01:00
Sam Atkins	d6075ef5b5	LibTextCodec+Everywhere: Make TextCodec::decoder_for() take a StringView. We don't need a full String/DeprecatedString inside this function, so we might as well not force users to create one.	2023-02-15 12:48:26 -05:00
Nico Weber	eac2b2382c	LibTextCodec: Add a MacRoman decoder. Allows displaying `<meta charset="x-mac-roman">` html files. (`:set fenc=macroman`, `:w` in vim to save in that encoding.)	2023-01-24 14:37:20 +00:00
Nico Weber	b14b5a4d06	LibTextCodec: Simplify Latin1Decoder::process() a tiny bit	2023-01-24 14:37:20 +00:00
Nico Weber	3423b54eb9	LibTextCodec: Make utf-16be and utf-16le codecs actually work. There were two problems: 1. They didn't handle surrogates. 2. They used signed chars, leading to e.g. 0x00e4 being treated as 0xffe4. Also add a basic test that catches both issues. There's some code duplication with Utf16CodePointIterator::operator*(), but let's get things working first.	2023-01-22 21:30:44 +00:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable. This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString. We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*). Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Karol Kosek	b006a60366	LibTextCodec: Pass code points instead of bytes on UTF-8 string process. Previously we were passing raw UTF-8 bytes as code points, which caused CSS content properties to display incorrect characters. This makes bullet separators in Wikipedia templates display correctly.	2022-03-29 01:01:32 +02:00
Hendiadyoin1	6a95df2526	LibTextCodec: Don't allocate Strings on encoding normalisation. This ripples down to LibWeb's HTML and XHR decoders, which therefore become less allocation heavy.	2022-03-21 10:48:17 +01:00
Jelle Raaijmakers	9c2a7c0e03	LibTextCodec: Add support for the UTF16-LE encoding	2022-03-08 14:51:06 +01:00
Luke Wilde	0e0f98a45e	LibTextCodec: Add x-user-defined decoder. It's a pretty simple charset: the bottom 128 bytes (0x00-0x7F) are standard ASCII, while the top 128 bytes (0x80-0xFF) are mapped to a portion of the Unicode Private Use Area, specifically 0xF780-0xF7FF. This is used by Google Maps for certain blobs.	2022-02-12 12:53:28 +01:00
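
The x-user-defined commit (0e0f98a45e) describes a byte-to-code-point mapping that is small enough to show directly. What follows is a minimal sketch of that mapping, written as a standalone helper named `decode_x_user_defined`. That name is hypothetical and is not LibTextCodec's actual `Decoder` interface. The helper returns UTF-32 code points instead of building a LibTextCodec string. It reads its input as unsigned bytes, which avoids the signed-char pitfall fixed in e864444fe3 and 3423b54eb9.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper sketching the x-user-defined mapping:
//   0x00-0x7F -> the same code point (ASCII)
//   0x80-0xFF -> U+F780-U+F7FF (Private Use Area)
// Bytes are taken as std::uint8_t so values >= 0x80 never become negative.
std::u32string decode_x_user_defined(std::vector<std::uint8_t> const& bytes)
{
    std::u32string code_points;
    code_points.reserve(bytes.size());
    for (std::uint8_t byte : bytes) {
        if (byte < 0x80)
            code_points.push_back(static_cast<char32_t>(byte));
        else
            code_points.push_back(static_cast<char32_t>(0xF780 + (byte - 0x80)));
    }
    return code_points;
}
```

For example, the input bytes `0x41 0x80 0xFF` decode to U+0041, U+F780 and U+F7FF. Every byte maps to exactly one code point, so no byte can be invalid and no U+FFFD replacement is ever needed.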

70 commits