The Rust FFI requires UTF-16 source data, so ASCII-stored source code
must be widened to UTF-16. Previously, this conversion was done into a
temporary buffer on every call to compile_function, meaning the entire
source file was converted for each lazily-compiled function. For large
modules with many functions, this caused heavy spinning.
Move the conversion into SourceCode::utf16_data() which lazily converts
and caches the result once per source file. Subsequent compilations of
functions from the same file reuse the cached data.
We were spending way too much time converting unrealized source ranges
into line/column pairs on real web content.
This improves JS parsing speed on x.com by 1.13x
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.
The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.
One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
The source code position cache was moved from a line based approach
to a "chunk"-based approach to improve performance on large, minified
JavaScript files with few lines, but this has had an adverse effect
on _multi-line_ source files.
Reintroduce some of the old behaviour by caching lines again, with
some added sanity limits to avoid caching empty/overly small lines.
Source code positions in files with few lines will still be cached
less often, since minified JavaScript files can be assumed to be
unusually large, and since stack traces for minified JavaScript
are less useful as well.
On WPT tests with large JavaScript dependencies like
`css/css-masking/animations/clip-interpolation.html` this reduces the
amount of time spent in `SourceCode::range_from_offsets` by as much as
99.98%, for the small small price of 80KB extra memory usage.