Commit graph

235 commits

Author SHA1 Message Date
Andreas Kling
8bf1d749a1 LibJS: Suppress global identifier optimization for dynamic functions
Functions created via new Function() cannot assume that unresolved
identifiers refer to global variables, since they may be called in
an arbitrary scope. Pass a flag through the scope collector analysis
to suppress the global identifier optimization in this case.
2026-02-24 09:39:42 +01:00
Andreas Kling
f84d54971b LibJS: Restore ancestor scope flags after failed arrow parsing
When the parser speculatively tries to parse an arrow function
expression, encountering `this` inside a default parameter value
like `(a = this)` propagates uses_this flags to ancestor function
scopes via set_uses_this(). If the arrow attempt fails (no =>
follows), these flags were left behind, incorrectly marking
ancestor scopes as using `this`.

Fix this by saving and restoring the uses_this and
uses_this_from_environment flags on all ancestor function scopes
around speculative arrow function parsing.
2026-02-24 09:39:42 +01:00
Andreas Kling
8db93d0f2c LibJS: Register using declarations in scope collector for for-loops
The parser was not calling add_declaration() for using declarations in
for-loop initializers, causing the scope collector to miss them. This
meant identifiers declared with `using` in for-loops were incorrectly
resolved as globals instead of local variables.
2026-02-24 09:39:42 +01:00
Andreas Kling
6f60a6ba72 LibJS: Defer scope registration of object property identifiers
When parsing object expressions like {x: 42}, the parser was eagerly
registering the identifier "x" in the scope collector even though it's
only used as a property key and not a variable reference.

Fix this by deferring the registration until we know we're dealing with
a shorthand property ({x}), where the identifier is actually a variable
reference that needs to be captured by the scope collector.
2026-02-24 09:39:42 +01:00
Andreas Kling
d3a295a1e1 LibJS: Fix export default of parenthesized named class expression
For `export default (class Name { })`, two things were wrong:

The parser extracted the class expression's name as the export's
local binding name instead of `*default*`. Per the spec, this is
`export default AssignmentExpression ;` whose BoundNames is
`*default*`, not the class name.

The bytecode generator had a special case for ClassExpression that
skipped emitting InitializeLexicalBinding for named classes.

These two bugs compensated for each other (no crash, but wrong
behavior). Fix both: always use `*default*` as the local binding
name for expression exports, and always emit InitializeLexicalBinding
for the `*default*` binding.
2026-02-21 19:27:03 +01:00
Andreas Kling
190b127981 LibJS: Consume semicolons after import statements
Both return paths from parse_import_statement() were missing a call to
consume_or_insert_semicolon(), causing explicit semicolons to be left
unconsumed and parsed as spurious EmptyStatements.
2026-02-19 12:02:50 +01:00
Andreas Kling
0bd893d64f LibJS: Fix source range of TaggedTemplateLiteral in class extends
Use extends_start instead of rule_start so the TaggedTemplateLiteral
gets the source position of the extends expression, not the class
declaration.
2026-02-19 12:02:50 +01:00
Andreas Kling
0332af5c6d LibJS: Check lookahead before consuming async in static class method
When parsing class elements, after consuming the `static` keyword, the
parser would unconditionally consume `async` if it appeared next. This
meant that for `static async()`, where `async` is the method name (not
a modifier), the `async` token was consumed too early and its source
position was lost.

Fix this by applying the same lookahead check used for top-level async
detection: only consume `async` as a modifier if the following token is
not `(`, `;`, `}`, or preceded by a line terminator. This lets `async`
be parsed as a property key with its correct source position.
2026-02-19 02:45:37 +01:00
Andreas Kling
51639fd555 LibJS: Move labels_in_scope out of ParserState
Label scoping is already managed manually at function and arrow
function boundaries via move + ScopeGuard, and speculative parse
paths never modify labels before their potential failure points.

Move the HashMap to a direct Parser member so it is no longer
copied on every save_state()/load_state() cycle.
2026-02-10 02:05:20 +01:00
Andreas Kling
fa8f9a6b2c LibJS: Remove invalid_property_range HashMap from ParserState
Replace the HashMap<size_t, Position> used to communicate invalid
object literal property ranges from parse_object_expression() to
parse_expression() with a field on PrimaryExpressionParseResult.

The range now flows through the parser's return values instead of
being stashed in persistent ParserState, eliminating a HashMap that
was never cleared and got bulk-copied on every save_state() call
during speculative parsing.
2026-02-10 02:05:20 +01:00
Andreas Kling
46d49a378e LibJS: Remove const_cast from create_identifier_and_register
Create the Identifier as non-const so it can be passed directly to
register_identifier without a const_cast.
2026-02-10 02:05:20 +01:00
Andreas Kling
a8a1aba3ba LibJS: Replace ScopePusher with ScopeCollector
Replace the ScopePusher RAII class (which performed scope analysis
in its destructor chain during parsing) with a two-phase approach:

1. ScopeCollector builds a tree of ScopeRecord nodes during parsing
   via RAII ScopeHandle objects. It records declarations, identifier
   references, and flags, but does not resolve anything.

2. After parsing completes, ScopeCollector::analyze() walks the tree
   bottom-up and performs all resolution: propagate eval/with
   poisoning, resolve identifiers to locals/globals/arguments, hoist
   functions (Annex B.3.3), and build FunctionScopeData.

Key design decisions:
- ScopeRecord::ast_node is a RefPtr<ScopeNode> to prevent
  use-after-free when synthesize_binding_pattern re-parses an
  expression as a binding pattern (the original parse's scope records
  survive with stale AST node pointers).
- Parser::scope_collector() returns the override collector if set
  (for synthesize_binding_pattern's nested parser), ensuring all
  scope operations route to the outer parser's scope tree.
- FunctionNode::local_variables_names() delegates to its body's
  ScopeNode rather than copying at parse time, since analysis runs
  after parsing.
2026-02-10 02:05:20 +01:00
Andreas Kling
cf1413e4ee LibJS: Extract ScopePusher into its own file
Move ScopePusher from Parser.cpp into ScopePusher.h and
ScopePusher.cpp. No behavioral changes, pure code move.
2026-02-10 02:05:20 +01:00
Andreas Kling
40430d6087 LibJS: Detect direct eval calls in parse_call_expression()
The parser previously detected direct eval() calls at the end of
parse_expression(), by checking if the final expression was a
CallExpression with "eval" as the callee. This missed cases where
eval() appeared as a subexpression, e.g. `eval(code) | 0`, since
the final expression would be a BinaryExpression, not a
CallExpression.

Move the detection into parse_call_expression() where the
CallExpression is actually created. This ensures we always set the
contains_direct_call_to_eval flag regardless of surrounding
operators, so local variables are correctly placed in the
declarative environment where eval'd code can find them.
2026-02-07 18:05:41 +01:00
Luke Wilde
e5f5329f9d LibJS: Consume identifier when matching an identifier name in bindings
Otherwise we'll fail to recognise that the identifier token is invalid
(e.g. a keyword such as null) in a binding pattern and that it may not
be a binding pattern after all, but an object expression.

Fixes some scripts on Discord failing to parse.
2026-02-06 16:09:33 +01:00
Andreas Kling
5238841da2 LibJS: Mark named function expression identifiers at individual level
Previously, when parsing a named function expression like
`Oops = function Oops() { Oops }`, the parser set a group-level flag
`might_be_variable_in_lexical_scope_in_named_function_assignment` that
propagated to the parent scope. This incorrectly prevented ALL `Oops`
identifiers from being marked as global, including those outside the
function expression.

Fix this by marking identifiers individually using
`set_is_inside_scope_with_eval()` only for identifiers inside the
function scope. This allows identifiers outside the function expression
to correctly use GetGlobal/SetGlobal while identifiers inside still
use GetBinding (since they may refer to the function's name binding).
2026-01-27 10:58:39 +01:00
Andreas Kling
871d93355b LibJS: Stop propagating is_inside_scope_with_eval across functions
Previously, when a nested function contained eval(), the parser would
mark all identifiers in parent functions as "inside scope with eval".
This prevented those identifiers from being marked as global, forcing
them to use GetBinding instead of GetGlobal.

However, eval() can only inject variables into its containing function's
scope, not into parent function scopes. So a parent function's reference
to a global like `Number` should still be able to use GetGlobal even if
a nested function contains eval().

This change adds a new flag `m_eval_in_current_function` that propagates
through block scopes within the same function but stops at function
boundaries. This flag is used for marking identifiers, while the
existing `m_screwed_by_eval_in_scope_chain` continues to propagate
across functions for local variable deoptimization (since eval can
access closure variables).

Before: `new Number(42)` in outer() with eval in inner() -> GetBinding
After:  `new Number(42)` in outer() with eval in inner() -> GetGlobal
2026-01-27 10:58:39 +01:00
Andreas Kling
88d715fc68 LibJS: Eliminate HashMap operations in SFID by caching parser data
Cache necessary data during parsing to eliminate HashMap operations
in SharedFunctionInstanceData construction.

Before: 2 HashMap copies + N HashMap insertions with hash computations
After: Direct vector iteration with no hashing

Build FunctionScopeData for function scopes in the parser containing:
- functions_to_initialize: deduplicated var-scoped function decls
- vars_to_initialize: var decls with is_parameter/is_function_name
- var_names: HashTable for AnnexB extension checks
- Pre-computed counts for environment size calculation
- Flags for "arguments" handling

Add ScopeNode::ensure_function_scope_data() to compute the data
on-demand for edge cases that don't go through normal parser flow
(synthetic class constructors, static initializers, module wrappers).

Use this cached data directly in SFID with zero HashMap operations.
2026-01-25 23:08:36 +01:00
Jelle Raaijmakers
ae20ecf857 AK+Everywhere: Add Vector::contains(predicate) and use it
No functional changes.
2026-01-08 15:27:30 +00:00
Luke Wilde
d766e41c94 LibJS: Store tagged template literal raw strings as StringLiterals 2026-01-06 23:25:36 +01:00
Andreas Kling
d6fbde43f8 LibJS: Track eval() scope membership per-identifier
The previous fix prevented eval() in sibling function scopes from
affecting each other, but it still had a limitation: when identifiers
from multiple scopes were merged into the same identifier group at
Program scope, the presence of eval() anywhere would taint all
identifiers in the group.

This change tracks per-identifier whether it was inside a scope with
eval() in the scope chain. When a scope closes, if it contains eval()
or has eval() in its parent chain, each identifier in that scope is
marked with `is_inside_scope_with_eval`. At Program scope finalization,
only identifiers that are NOT marked can be optimized to global lookups.

This allows code like:
```js
var x = undefined;  // Can be optimized (program scope)
(function() {
    function walk() { undefined; }  // Cannot be optimized
    eval('');
})();
```

Before: Neither `undefined` could be optimized
After: The program-scope `undefined` is optimized, while the one inside
       the function with eval() correctly uses dynamic lookup.
2026-01-06 00:11:28 +01:00
Andreas Kling
1f5e209032 LibJS: Fix overly conservative eval() deopt for undeclared identifiers
The parser was preventing identifiers like `undefined` from being
optimized to global constants when eval() was present in any function
scope, even when eval() could not affect the identifier's binding.

This change makes the parser smarter by preventing eval() in one
function scope from affecting global identifier optimization in sibling
function scopes. The key insight is that eval() in function A cannot
shadow globals in function B.
2026-01-04 13:13:26 +01:00
Andreas Kling
63eccc5640 LibJS: Don't make extra copies of every JS function's source code
Instead, let functions have a view into the AST's SourceCode object's
underlying string data. The source string is kept alive by the AST, so
it's fine to have views into it as long as the AST exists.

Reduces memory footprint on my x.com home feed by 65 MiB.
2025-12-21 10:06:04 -06:00
Andreas Kling
ece0b72e3c LibJS: Don't set [[HomeObject]] for non-method object properties
This fixes an issue where we'd incorrectly retain objects via the
[[HomeObject]] slot. This common pattern was affected:

    Object.defineProperty(o, "foo", {
        get: function() { return 123; }
    });

Above, the object literal would get assigned to the [[HomeObject]]
slot even though "get" is not a "method" per the spec.

This frees about 30,000 objects on my x.com home feed.
2025-12-17 12:50:17 -06:00
Andreas Kling
fa44fd58d8 LibJS: Remove ParserState::lookahead_lexer
The lookahead lexer used by next_token() no longer needs to be kept
alive, since tokens created by Parser::next_token() now have any string
views guaranteed safe by the fact that they point into the one true
SourceCode provided by whoever set up the lexer.
2025-11-09 12:14:03 +01:00
Andreas Kling
0dacc94edd LibJS: Have JS::Lexer take a JS::SourceCode as input
This moves the responsibility of setting up a SourceCode object to the
users of JS::Lexer.

This means Lexer and Parser are free to use string views into the
SourceCode internally while working.

It also means Lexer no longer has to think about anything other than
UTF-16 (or ASCII) inputs. So the unit test for parsing various invalid
UTF-8 sequences is deleted here.
2025-11-09 12:14:03 +01:00
Andreas Kling
fdd9413e71 LibJS: Avoid some unnecessary ref count churn in parser 2025-11-09 12:14:03 +01:00
Andreas Kling
841fe0b51c LibJS: Don't store current token in both Lexer and Parser
Just give Parser a way to access the one stored in Lexer.
2025-11-09 12:14:03 +01:00
Andreas Kling
d3e8fbd9cd LibJS: Don't create unique FunctionParameters for every empty param set 2025-11-09 12:14:03 +01:00
Andreas Kling
72aa90312a LibJS: Make JS::Token::message an enum instead of a StringView
Just to make JS::Token a little smaller.
2025-11-09 12:14:03 +01:00
Andreas Kling
fb05063dde LibJS: Let bytecode instructions know whether they are in strict mode
This commits puts the strict mode flag in the header of every bytecode
instruction. This allows us to check for strict mode without looking at
the currently running execution context.
2025-10-29 21:20:10 +01:00
Andreas Kling
5a7b0a07cb LibJS: Mark global function declarations as globals
This allows us to use the GetGlobal and SetGlobal bytecode instructions
for them, enabling cached accesses.

2.62x speed-up on this Fibonacci program:

    function fib(n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }
    for (let i = 0; i < 50_000; ++i)
        fib(10);
2025-10-13 17:15:44 +02:00
Timothy Flynn
62d85dd90a LibJS: Port RegExp flags and patterns to UTF-16 2025-08-13 09:56:13 -04:00
Timothy Flynn
b955c9b2a9 LibJS: Port the Identifier AST (and related) nodes to UTF-16
This eliminates quite a lot of UTF-8 / UTF-16 churn.
2025-08-13 09:56:13 -04:00
Timothy Flynn
00182a2405 LibJS: Port the JS lexer and parser to UTF-16
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.

The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.

One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
2025-08-13 09:56:13 -04:00
Timothy Flynn
eb74781a2d LibJS: Keep the lookahead lexer alive after parsing its next token
Currently, the lexer holds a ByteString, which is always heap-allocated.
When we create a copy of the lexer for the lookahead token, that token
will outlive the lexer copy. The token holds a couple of string views
into the lexer's source string. This is fine for now, because the source
string will be kept alive by the original lexer.

But if the lexer were to hold a String or Utf16String, short strings
will be stored on the stack due to SSO. Thus the token will hold views
into released stack data. We need to keep the lookahead lexer alive to
prevent UAF on views into its source string.
2025-08-13 09:56:13 -04:00
ayeteadoe
2e2484257d LibJS: Enable EXPLICIT_SYMBOL_EXPORT and annotate minimum symbol set 2025-07-22 11:51:29 -04:00
ayeteadoe
539a675802 LibJS: Revert Enable EXPLICIT_SYMBOL_EXPORT
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
2025-07-22 11:51:29 -04:00
Timothy Flynn
66006d3812 AK+LibJS: Extract some UTF-16 helpers for use in an outside class
An upcoming Utf16String will need access to these helpers. Let's make
them publicly available.
2025-07-03 09:51:56 -04:00
ayeteadoe
c14173f651 LibJS: Enable EXPLICIT_SYMBOL_EXPORT 2025-06-30 10:50:36 -06:00
Luke Wilde
f12b6b258f LibJS: Don't use presence of function params to identify function scope
Instead, we can just use the scope type to determine if a scope is a
function scope.

This fixes using `this` for parameter default values in arrow functions
crashing. This happened by `uses_this_from_environment` was not set in
`set_uses_this`, as it didn't think it was in a function scope whilst
parsing parameters.

Fixes closing modal dialogs causing a crash on https://www.ikea.com/

No test262 diff.

Reverts the functional part of 08cfd5f, because it was a workaround for
this issue.
2025-06-17 20:48:45 +02:00
Viktor Szépe
19f88f96dc Everywhere: Fix typos - act III 2025-06-16 14:20:48 +01:00
Aliaksandr Kalenik
dcfc515cd0 LibJS: Fix arrow function parsing bug
In the following example:
```js
const f = (i) => ({
    obj: { a: { x: i }, b: { x: i } },
    g: () => {},
});
```

The body of function `f` is initially parsed as an arrow function. As a
result, what is actually an object expression is interpreted as a formal
parameter with a binding pattern. Since duplicate identifiers are not
allowed in this context (`i` in the example), the parser generates an
error, causing the entire script to fail parsing.

This change ignores the "Duplicate parameter names in bindings" error
during arrow function parameter parsing, allowing the parser to continue
and recognize the object expression of the outer arrow function with an
implicit return.

Fixes error on https://chat.openai.com/
2025-05-26 12:44:21 +03:00
Aliaksandr Kalenik
db480b1f0c LibJS: Preserve information about local variables declaration kind
This is required for upcoming change where we want to emit ThrowIfTDZ
for assignment expressions only for lexical declarations.
2025-05-06 12:06:23 +02:00
Andreas Kling
bf1b754e91 LibJS: Optimize reading known-to-be-initialized var bindings
`var` bindings are never in the temporal dead zone (TDZ), and so we
know accessing them will not throw.

We now take advantage of this by having a specialized environment
binding value getter that doesn't check for exceptional cases.

1.08x speedup on JetStream.
2025-05-04 02:31:18 +02:00
Timothy Flynn
3867a192a1 LibJS: Update spec steps / links for the import-assertions proposal
This proposal has reached stage 4 and been merged into the main ECMA-262
spec. See:

4e3450e
2025-04-29 07:33:08 -04:00
Aliaksandr Kalenik
2d732b2251 LibJS: Skip allocating locals for arguments that allowed to be local
This allows us to get rid of instructions that move arguments to locals
and allocate smaller JS::Value vector in ExecutionContext by reusing
slots that were already allocated for arguments.

With this change for following function:
```js
function f(x, y) {
    return x + y;
}
```

we now produce following bytecode:
```
[   0]    0: Add dst:reg6, lhs:arg0, rhs:arg1
[  10]       Return value:reg6
```

instead of:
```
[   0]    0: GetArgument 0, dst:x~1
[  10]       GetArgument 1, dst:y~0
[  20]       Add dst:reg6, lhs:x~1, rhs:y~0
[  30]       Return value:reg6
```
2025-04-26 11:02:29 +02:00
Aliaksandr Kalenik
81a3bfd492 LibJS: Allow using locals if arguments is used in strict mode
Previously we blocked using locals for function arguments whenever
`arguments` was mentioned in function body, however, this is not
necessary in strict mode, where mutations to the arguments object are
not reflected in the function arguments and vice versa.
2025-04-25 21:08:24 +02:00
Aliaksandr Kalenik
7932091e02 LibJS: Allow using local variable for catch parameters
Local variables are faster to access and if all catch parameters are
locals we can skip lexical environment allocation.
2025-04-22 21:57:25 +02:00
Aliaksandr Kalenik
0f14c70252 LibJS: Use Identifier to represent CatchClause parameter names
By doing that we consistently use Identifier node for identifiers and
also enable mechanism that registers identifiers in a corresponding
ScopePusher for catch parameters, which is necessary for work in the
upcoming changes.
2025-04-22 21:57:25 +02:00