Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and the js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
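A callback-based boundary like the one rust_tokenize() provides can be sketched as follows. Everything here is hypothetical: the callback signature, the trimmed TokenCategory enum, and toy_tokenize(), a small C++ stand-in for the Rust tokenizer that exists only so the sketch compiles on its own. It shows the general shape of such an interface. The tokenizer reports each token as a category plus a byte range, and it passes an opaque user-data pointer back so the C++ side can collect tokens into its own containers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, trimmed mirror of a Rust enum; the real variants are
// defined in token.rs and mirrored in Token.h.
enum class TokenCategory : std::uint8_t {
    Invalid,
    Whitespace,
    Comment,
    Identifier,
    Number,
    Punctuation,
};

// Hypothetical callback shape: one call per token, reporting its category and
// byte range, plus the opaque pointer the caller passed in.
using TokenCallback = void (*)(void* user_data, TokenCategory category, std::size_t offset, std::size_t length);

// Toy stand-in for the Rust tokenizer so the sketch is self-contained. It only
// knows whitespace, `//` comments, integers, identifiers and single-char punctuation.
void toy_tokenize(char const* source, std::size_t length, void* user_data, TokenCallback callback)
{
    assert(callback != nullptr);
    auto is_space = [](char c) { return c == ' ' || c == '\t' || c == '\n'; };
    auto is_ident = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
    std::size_t i = 0;
    while (i < length) {
        std::size_t const start = i;
        char const c = source[i];
        TokenCategory category;
        if (is_space(c)) {
            while (i < length && is_space(source[i]))
                ++i;
            category = TokenCategory::Whitespace;
        } else if (c == '/' && i + 1 < length && source[i + 1] == '/') {
            while (i < length && source[i] != '\n')
                ++i;
            category = TokenCategory::Comment;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < length && std::isdigit(static_cast<unsigned char>(source[i])))
                ++i;
            category = TokenCategory::Number;
        } else if (is_ident(c)) {
            while (i < length && is_ident(source[i]))
                ++i;
            category = TokenCategory::Identifier;
        } else {
            ++i;
            category = TokenCategory::Punctuation;
        }
        callback(user_data, category, start, i - start);
    }
}

struct TokenSpan {
    TokenCategory category;
    std::size_t offset;
    std::size_t length;
};

// Caller side: collect every reported token into a vector.
std::vector<TokenSpan> collect_tokens(std::string const& source)
{
    std::vector<TokenSpan> spans;
    // A captureless lambda converts to a plain function pointer, as a C ABI
    // callback requires; state reaches it through user_data, not captures.
    toy_tokenize(source.data(), source.size(), &spans,
        [](void* user_data, TokenCategory category, std::size_t offset, std::size_t length) {
            static_cast<std::vector<TokenSpan>*>(user_data)->push_back({ category, offset, length });
        });
    return spans;
}
```

Routing state through user_data instead of lambda captures keeps the callback a plain function pointer, which is the only kind of callable that can cross a C ABI boundary.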
Move the is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
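Declaring these helpers static gives them internal linkage, so they are visible only inside RegExpConstructor.cpp. The bodies below are an assumption based on the ECMAScript definitions of SyntaxCharacter, LineTerminator and WhiteSpace. The Space_Separator (Zs) code points are hard-coded as of recent Unicode versions. The real LibJS helpers may differ in detail, for example by querying Unicode data instead of listing ranges.

```cpp
#include <cassert>

// SyntaxCharacter (ECMAScript RegExp grammar): ^ $ \ . * + ? ( ) [ ] { } |
static bool is_syntax_character(char32_t code_point)
{
    switch (code_point) {
    case '^': case '$': case '\\': case '.': case '*': case '+': case '?':
    case '(': case ')': case '[': case ']': case '{': case '}': case '|':
        return true;
    default:
        return false;
    }
}

// LineTerminator: LF, CR, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR.
static bool is_line_terminator(char32_t code_point)
{
    return code_point == 0x0A || code_point == 0x0D || code_point == 0x2028 || code_point == 0x2029;
}

// WhiteSpace: TAB, VT, FF, ZWNBSP (U+FEFF), plus the Zs code points
// U+0020, U+00A0, U+1680, U+2000..U+200A, U+202F, U+205F and U+3000.
static bool is_whitespace(char32_t code_point)
{
    switch (code_point) {
    case 0x09: case 0x0B: case 0x0C: case 0xFEFF:
    case 0x20: case 0xA0: case 0x1680: case 0x202F: case 0x205F: case 0x3000:
        return true;
    default:
        return code_point >= 0x2000 && code_point <= 0x200A;
    }
}
```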
Trivia is the whitespace and comments that appear before a token.
Previously, trivia was always given a TokenCategory of Invalid, so it
was displayed as an error in the view-source page, with red wiggly
underlines. Instead, treat it as what it actually is: whitespace and
comments!
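One way to keep trivia from being reported as Invalid is to split it into whitespace and comment spans before handing it to the highlighter. This is only a sketch. The trimmed TokenCategory enum, TriviaSpan and classify_trivia() are hypothetical, only `//` and `/* */` comments are recognized, and the actual patch may receive these categories directly from the Rust tokenizer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, trimmed category enum for this sketch.
enum class TokenCategory { Invalid, Whitespace, Comment };

struct TriviaSpan {
    TokenCategory category;
    std::size_t offset;
    std::size_t length;
};

// Splits a token's leading trivia into whitespace and comment spans.
// Assumes `trivia` holds only whitespace, `//` comments and `/* */` comments.
std::vector<TriviaSpan> classify_trivia(std::string_view trivia)
{
    std::vector<TriviaSpan> spans;
    std::size_t i = 0;
    while (i < trivia.size()) {
        std::size_t const start = i;
        TokenCategory category;
        if (trivia.substr(i, 2) == "//") {
            // Single-line comment: runs up to, but not including, the '\n'.
            std::size_t const end = trivia.find('\n', i);
            i = end == std::string_view::npos ? trivia.size() : end;
            category = TokenCategory::Comment;
        } else if (trivia.substr(i, 2) == "/*") {
            // Block comment: runs through the closing "*/", or to the end if unterminated.
            std::size_t const end = trivia.find("*/", i + 2);
            i = end == std::string_view::npos ? trivia.size() : end + 2;
            category = TokenCategory::Comment;
        } else {
            // Everything up to the next comment opener is whitespace.
            while (i < trivia.size() && trivia.substr(i, 2) != "//" && trivia.substr(i, 2) != "/*")
                ++i;
            category = TokenCategory::Whitespace;
        }
        spans.push_back({ category, start, i - start });
    }
    return spans;
}
```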
The previous code for splitting a SourceDocument into lines was too
naive: it only recognized `\n`, but the source text can contain other
newline characters and sequences (such as `\r\n` or a lone `\r`), and
the HTML/CSS/JS syntax highlighters take those into account when
determining which line a token is on. This disagreement caused
incorrect highlighting, or even crashes, whenever the source did not
use `\n` exclusively for its newlines.
To make everyone agree on what a line is, this patch first normalizes
the source, replacing every newline sequence with `\n`. Having to copy
the source like this is unfortunate, but viewing the source is rare
enough that this should not cause any noticeable performance problems.
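The normalization step can be sketched as a single pass that copies the source while rewriting newline sequences. This sketch assumes the sequences to normalize are `\r\n` and a lone `\r`, the same pair that HTML input stream preprocessing folds into `\n`. The patch does not list the exact set, and characters such as U+2028 are left alone here because rewriting them could change how a JS string literal tokenizes. normalize_newlines() and count_lines() are hypothetical names, and std::string stands in for AK::String.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of `source` in which every "\r\n" pair and every lone '\r'
// has been replaced by a single '\n'. Existing '\n' characters are kept as-is.
std::string normalize_newlines(std::string const& source)
{
    std::string result;
    result.reserve(source.size()); // the output is never longer than the input
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\r') {
            result += '\n';
            // Skip the '\n' of a CRLF pair so the pair yields only one newline.
            if (i + 1 < source.size() && source[i + 1] == '\n')
                ++i;
        } else {
            result += source[i];
        }
    }
    return result;
}

// With normalized input, counting lines only needs to look at '\n'.
std::size_t count_lines(std::string const& normalized)
{
    std::size_t lines = 1;
    for (char c : normalized) {
        if (c == '\n')
            ++lines;
    }
    return lines;
}
```

Once every newline is `\n`, the line-splitting code and the highlighters can agree on line numbers simply by counting `\n` characters.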
Since the callers already have a String, and a String is what we need,
this also changes the function parameters to take the source as a
String instead of converting it to a StringView and back.
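The parameter change follows a common pattern: when the callee needs to own the text, take the owning string type directly so a caller that already has one can pass it through. The sketch below uses std::string and std::string_view as stand-ins for AK::String and StringView, and both document classes are hypothetical. With std::string the saving is a copy avoided by moving; for the AK types, the saving described above is the StringView-to-String round trip.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Before (hypothetical): the caller's string is narrowed to a view, so the
// callee must build a fresh owning string from it, which always copies.
class DocumentTakingView {
public:
    explicit DocumentTakingView(std::string_view source)
        : m_source(source)
    {
    }
    std::string const& source() const { return m_source; }

private:
    std::string m_source;
};

// After (hypothetical): take the owning string by value and move it into the
// member. A caller passing std::move(its_string) avoids the copy entirely.
class DocumentTakingString {
public:
    explicit DocumentTakingString(std::string source)
        : m_source(std::move(source))
    {
    }
    std::string const& source() const { return m_source; }

private:
    std::string m_source;
};
```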
Fixes https://github.com/LadybirdBrowser/ladybird/issues/3169