Store source map locations as bytecode offset, line, and column.
Runtime consumers only emit the start line and column, so source end
positions and source text offsets do not need to be carried through
Executable source maps, bytecode cache serialization, or the Rust FFI.
Keep SourceCode's internal position cache able to track source text
offsets so callers can still translate source offsets to line and
column pairs when needed. Hash dump-bytecode IDs from the name, first
source position, and bytecode size instead of source slices that need
end offsets.
Bump the bytecode cache format version for the slimmer serialized
source map entry shape.
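
As a rough illustration of the slimmer entry shape, here is a minimal
sketch. The names `SourceMapEntry` and `find_entry` are hypothetical and
not taken from the actual code; it only assumes that entries are kept
sorted by bytecode offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical entry shape: only what runtime consumers emit.
// No end position and no source text offsets are stored.
struct SourceMapEntry {
    uint32_t bytecode_offset;
    uint32_t line;
    uint32_t column;
};

// Returns the last entry whose bytecode_offset is <= offset, or nullptr
// if there is none. Assumes `map` is sorted by bytecode_offset.
inline SourceMapEntry const* find_entry(std::vector<SourceMapEntry> const& map, uint32_t offset)
{
    auto it = std::upper_bound(map.begin(), map.end(), offset,
        [](uint32_t value, SourceMapEntry const& entry) {
            return value < entry.bytecode_offset;
        });
    if (it == map.begin())
        return nullptr;
    return &*(it - 1);
}
```

A lookup for an instruction in the middle of a range resolves to the
start line and column of the entry that covers it.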
The Rust FFI requires UTF-16 source data, so ASCII-stored source code
must be widened to UTF-16. Previously, this conversion was done into a
temporary buffer on every call to compile_function, meaning the entire
source file was converted for each lazily-compiled function. For large
modules with many functions, this wasted a lot of CPU time.
Move the conversion into SourceCode::utf16_data(), which lazily converts
and caches the result once per source file. Subsequent compilations of
functions from the same file reuse the cached data.
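
The convert-once-and-cache pattern described above could look roughly
like the sketch below. `SourceText` and `conversion_count()` are
hypothetical names used for illustration only; the real
SourceCode::utf16_data() may differ, and this sketch is not thread-safe.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a source file stored as ASCII.
class SourceText {
public:
    explicit SourceText(std::string ascii)
        : m_ascii(std::move(ascii))
    {
    }

    // Widens to UTF-16 on first use, then returns the cached buffer.
    std::u16string const& utf16_data() const
    {
        if (!m_utf16.has_value()) {
            // ASCII widens one byte to exactly one UTF-16 code unit.
            m_utf16 = std::u16string(m_ascii.begin(), m_ascii.end());
            ++m_conversion_count;
        }
        return *m_utf16;
    }

    // Only here to make the caching observable in this sketch.
    int conversion_count() const { return m_conversion_count; }

private:
    std::string m_ascii;
    mutable std::optional<std::u16string> m_utf16;
    mutable int m_conversion_count { 0 };
};
```

Every caller after the first gets a reference to the same buffer, so
compiling many functions from one file costs a single conversion.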
We were spending way too much time converting unrealized source ranges
into line/column pairs on real web content.
This improves JS parsing speed on x.com by 1.13x.
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.
The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.
One immediate logical benefit here is that we do not need to know
offhand how many UTF-8 bytes some special code points occupy. Each of
them is a single UTF-16 code unit, so instead of advancing the lexer
by 3 positions in some cases, we can just always advance by 1.
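
To illustrate the point about code unit widths, here is a small sketch.
It assumes that U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR)
are among the special code points meant above; both take 3 bytes in
UTF-8 but one code unit in UTF-16. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// With UTF-16 input, each of these is a single code unit comparison.
constexpr bool is_line_terminator(char16_t code_unit)
{
    return code_unit == u'\n' || code_unit == u'\r'
        || code_unit == 0x2028 || code_unit == 0x2029;
}

// Counts line terminator code units, always advancing by exactly 1.
// Note that "\r\n" counts as two here; this is only a sketch.
inline std::size_t count_line_terminators(std::u16string_view source)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (is_line_terminator(source[i]))
            ++count;
    }
    return count;
}
```

The equivalent loop over UTF-8 bytes would have to match the sequence
E2 80 A8 and skip 3 bytes to get past a single U+2028.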
The source code position cache was moved from a line-based approach
to a "chunk"-based approach to improve performance on large, minified
JavaScript files with few lines, but this has had an adverse effect
on _multi-line_ source files.
Reintroduce some of the old behaviour by caching lines again, with
some added sanity limits to avoid caching empty/overly small lines.
Source code positions in files with few lines will still be cached
less often, since minified JavaScript files can be assumed to be
unusually large, and since stack traces for minified JavaScript
are less useful as well.
On WPT tests with large JavaScript dependencies like
`css/css-masking/animations/clip-interpolation.html` this reduces the
amount of time spent in `SourceCode::range_from_offsets` by as much as
99.98%, for the small price of 80 KB extra memory usage.
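
The line caching idea can be sketched as follows. This is not the
actual SourceCode implementation: `PositionCache`, `position_at`, and
the minimum line length of 4 are hypothetical, the chunk-based part is
left out, and only '\n' is treated as a line terminator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

struct Position {
    std::size_t line;   // 1-based
    std::size_t column; // 1-based
};

class PositionCache {
public:
    // Hypothetical sanity limit: skip line starts that are too close
    // to the previously cached one (empty or very short lines).
    static constexpr std::size_t minimum_cached_line_length = 4;

    // The view must outlive the cache.
    explicit PositionCache(std::u16string_view source)
        : m_source(source)
    {
        m_entries.push_back({ 0, 1 });
        std::size_t line = 1;
        std::size_t last_cached = 0;
        for (std::size_t i = 0; i < source.size(); ++i) {
            if (source[i] != u'\n')
                continue;
            ++line;
            std::size_t start = i + 1;
            if (start - last_cached >= minimum_cached_line_length) {
                m_entries.push_back({ start, line });
                last_cached = start;
            }
        }
    }

    Position position_at(std::size_t offset) const
    {
        // Find the last cached line start at or before `offset`.
        auto it = std::upper_bound(m_entries.begin(), m_entries.end(), offset,
            [](std::size_t value, Entry const& entry) { return value < entry.offset; });
        --it; // Always valid: the first entry has offset 0.

        // Scan forward over the (short) uncached remainder.
        std::size_t line = it->line;
        std::size_t line_start = it->offset;
        for (std::size_t i = it->offset; i < offset && i < m_source.size(); ++i) {
            if (m_source[i] == u'\n') {
                ++line;
                line_start = i + 1;
            }
        }
        return { line, offset - line_start + 1 };
    }

private:
    struct Entry {
        std::size_t offset;
        std::size_t line;
    };

    std::u16string_view m_source;
    std::vector<Entry> m_entries;
};
```

In a multi-line file most lookups land on or near a cached line start,
so the forward scan stays short, while skipping tiny lines keeps the
number of cached entries bounded.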