LibJS: Add bytecode validator scaffolding driven from Bytecode.def
The plan is to start caching compiled JS bytecode on disk. Before
loading anything from a cache we need confidence that the bytes are
structurally well-formed, since a corrupted or tampered-with cache
file could otherwise hand the interpreter an out-of-bounds jump or a
constant-pool index that points past the end of the table.
This commit lays down the scaffolding for that validator. The walker
lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so
that it can share the existing Bytecode.def-driven layout machinery
with the encoder. C++ calls into it through cbindgen-generated
bindings, the same way the rest of the Rust pipeline is wired up.
For now, the validator only does Pass 1: walk the byte stream,
verify each instruction is 8-byte aligned, the opcode byte is in
range, and the reported length keeps us inside the buffer. The
length lookup is generated from Bytecode.def so fixed-length and
variable-length instructions stay in sync with the rest of the
codegen automatically. Per-field bounds checks (operands, labels,
table indices, cache indices) and structural extras (basic block
offsets, exception handlers, source map) come in follow-up commits.
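The Pass 1 walk described above can be sketched in C++ as follows. This is a hypothetical stand-in for the Rust walker, not its code. The opcode table, the 1-byte opcode at offset 0, and storing a variable-length instruction's total size as a little-endian u32 at offset 4 are all assumptions made for illustration; the real lengths are generated from Bytecode.def. The buffer-base alignment check (BufferNotAligned) is left out because it needs the buffer's address rather than its contents.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical Pass 1 error categories, named after the validator's own.
enum class Pass1Error : uint8_t {
    Ok,
    InstructionMisaligned,
    UnknownOpcode,
    TruncatedInstruction,
    InvalidLength,
};

struct Pass1Result {
    Pass1Error kind;
    size_t offset; // Offset of the offending instruction, or the buffer size on success.
};

// Hypothetical length table standing in for the one generated from
// Bytecode.def: nonzero = fixed length in bytes, zero = variable length,
// with the total size stored as a little-endian u32 at byte offset 4.
constexpr uint32_t fixed_length_table[] = { 8, 16, 0 };
constexpr size_t opcode_count = sizeof(fixed_length_table) / sizeof(fixed_length_table[0]);
constexpr size_t instruction_alignment = 8;
constexpr size_t instruction_header_size = 8;

constexpr uint32_t read_u32_le(uint8_t const* p)
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

constexpr Pass1Result validate_pass1(uint8_t const* data, size_t size)
{
    size_t offset = 0;
    while (offset < size) {
        // Every instruction must start on an 8-byte boundary.
        if (offset % instruction_alignment != 0)
            return { Pass1Error::InstructionMisaligned, offset };

        size_t const opcode = data[offset];
        if (opcode >= opcode_count)
            return { Pass1Error::UnknownOpcode, offset };

        uint32_t length = fixed_length_table[opcode];
        if (length == 0) {
            // The header must be present before its length field can be read.
            if (size - offset < instruction_header_size)
                return { Pass1Error::TruncatedInstruction, offset };
            length = read_u32_le(data + offset + 4);
        }

        // A length shorter than the header would stall or desync the walk.
        if (length < instruction_header_size)
            return { Pass1Error::InvalidLength, offset };

        // Compare against the remaining bytes so the addition cannot overflow.
        if (length > size - offset)
            return { Pass1Error::TruncatedInstruction, offset };

        offset += length;
    }
    return { Pass1Error::Ok, offset };
}
```

The walk checks each length before it reads anything beyond the current header. A corrupted length can therefore stop the walk early, but it cannot push the walk past the end of the buffer.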
The validator runs after every successful compilation in debug and
sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we
get an extra sanity check on every executable the encoder produces
without paying for it in release builds. Failure trips a
VERIFY_NOT_REACHED with the offset, opcode, and error category
logged via dbgln().
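The structural extras mentioned above include exception-handler ranges. Below is a minimal sketch of such a check. It makes several hypothetical assumptions: the handler start, end and target are byte offsets; `end` is exclusive and may equal the bytecode size; and the instruction start offsets collected during the Pass 1 walk are available. The names are illustrative and are not the validator's API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result categories, named after the validator's handler errors.
enum class HandlerError : uint8_t { Ok, StartInvalid, EndInvalid, HandlerInvalid, RangeInvalid };

// Mirrors the shape of the offsets the C++ side projects out of each handler.
struct HandlerOffsets {
    uint32_t start;
    uint32_t end;
    uint32_t handler;
};

// Linear scan for clarity; `starts` holds every instruction start offset.
constexpr bool is_instruction_start(uint32_t const* starts, size_t count, uint32_t offset)
{
    for (size_t i = 0; i < count; ++i) {
        if (starts[i] == offset)
            return true;
    }
    return false;
}

constexpr HandlerError check_handler(HandlerOffsets h, uint32_t const* starts, size_t count, uint32_t bytecode_size)
{
    if (!is_instruction_start(starts, count, h.start))
        return HandlerError::StartInvalid;
    // Assumption: `end` is exclusive, so one past the last instruction is allowed.
    if (h.end != bytecode_size && !is_instruction_start(starts, count, h.end))
        return HandlerError::EndInvalid;
    if (!is_instruction_start(starts, count, h.handler))
        return HandlerError::HandlerInvalid;
    // A protected range must cover at least one instruction.
    if (h.start >= h.end)
        return HandlerError::RangeInvalid;
    return HandlerError::Ok;
}
```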
2026-05-02 09:49:32 +02:00

/*
 * Copyright (c) 2026-present, the Ladybird developers.
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */

#include <AK/Debug.h>
#include <AK/Format.h>
#include <AK/NumericLimits.h>
#include <AK/StringView.h>
#include <LibJS/Bytecode/Executable.h>
2026-05-02 10:53:15 +02:00
#include <LibJS/Bytecode/Instruction.h>
#include <LibJS/Bytecode/PutKind.h>
#include <LibJS/Bytecode/Validator.h>
#include <LibJS/Runtime/Completion.h>
#include <LibJS/Runtime/Iterator.h>
#include <LibJS/RustFFI.h>

namespace JS::Bytecode {

static StringView validation_error_kind_to_string(JS::FFI::ValidationErrorKind kind)
{
    switch (kind) {
    case JS::FFI::ValidationErrorKind::Ok:
        return "Ok"sv;
    case JS::FFI::ValidationErrorKind::BufferNotAligned:
        return "BufferNotAligned"sv;
    case JS::FFI::ValidationErrorKind::InstructionMisaligned:
        return "InstructionMisaligned"sv;
    case JS::FFI::ValidationErrorKind::UnknownOpcode:
        return "UnknownOpcode"sv;
    case JS::FFI::ValidationErrorKind::TruncatedInstruction:
        return "TruncatedInstruction"sv;
    case JS::FFI::ValidationErrorKind::InvalidLength:
        return "InvalidLength"sv;
LibJS: Add per-field bytecode validation generated from Bytecode.def
Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies that every reference points
somewhere valid. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bounds-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.
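Two of the Pass 2 field checks listed above are table-index range checks and label-target checks. The sketch below is a hypothetical illustration and not the generated Rust code. It assumes that the Pass 1 walk produced a sorted array of instruction start offsets, and it runs a binary search over that array to decide whether a label lands on an instruction boundary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical subset of the validator's error categories.
enum class FieldError : uint8_t { Ok, LabelNotAtInstructionBoundary, IdentifierIndexOutOfRange };

// Binary search over instruction start offsets, which must be sorted ascending.
constexpr bool is_instruction_boundary(uint32_t const* starts, size_t count, uint32_t target)
{
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t const mid = lo + (hi - lo) / 2;
        if (starts[mid] == target)
            return true;
        if (starts[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

// A table index is valid when it is strictly below the table size.
constexpr FieldError check_index(uint32_t index, uint32_t table_size, FieldError on_failure)
{
    return index < table_size ? FieldError::Ok : on_failure;
}

// A label must name the first byte of some instruction.
constexpr FieldError check_label(uint32_t const* starts, size_t count, uint32_t target)
{
    return is_instruction_boundary(starts, count, target) ? FieldError::Ok : FieldError::LabelNotAtInstructionBoundary;
}
```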
The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the matching hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking a count that walks off the end of the instruction.
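The variable-length cross-check can be sketched as follows. The sketch assumes a hypothetical layout: opcode at byte 0, m_length as a little-endian u32 at byte 4, a u32 count at byte 8, then `count` u32 operand indices. It presumes that Pass 1 has already confirmed all m_length bytes lie inside the buffer. The size computation is widened to 64 bits so that an attacker-chosen count cannot wrap it.

```cpp
#include <cstddef>
#include <cstdint>

enum class TrailingError : uint8_t { Ok, InvalidLength, OperandOutOfRange };

constexpr uint32_t read_u32_le(uint8_t const* p)
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Hypothetical field offsets within a variable-length instruction.
constexpr size_t length_offset = 4;
constexpr size_t count_offset = 8;
constexpr size_t trailing_offset = 12;

// Precondition (from Pass 1): all m_length bytes of `insn` are readable.
constexpr TrailingError validate_trailing_operands(uint8_t const* insn, uint32_t operand_bound)
{
    uint32_t const length = read_u32_le(insn + length_offset);
    if (length < trailing_offset)
        return TrailingError::InvalidLength;

    uint32_t const count = read_u32_le(insn + count_offset);
    // Check that the array fits inside m_length before touching any element.
    uint64_t const needed = uint64_t(trailing_offset) + uint64_t(count) * 4;
    if (needed > length)
        return TrailingError::InvalidLength;

    for (uint32_t i = 0; i < count; ++i) {
        if (read_u32_le(insn + trailing_offset + size_t(i) * 4) >= operand_bound)
            return TrailingError::OperandOutOfRange;
    }
    return TrailingError::Ok;
}
```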
Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.
The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.
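Because operands are encoded as a flat index, the range check reduces to a single comparison against the combined table size. The sketch below is hypothetical: the `OperandBounds` struct and its `arguments_tracked` flag are invented for illustration. It also shows why an untracked argument count leaves the check without any upper bound.

```cpp
#include <cstdint>

// Hypothetical mirror of the sizes the C++ side passes to the validator.
struct OperandBounds {
    uint32_t registers;
    uint32_t locals;
    uint32_t constants;
    bool arguments_tracked; // Whether an argument-count upper bound is known.
    uint32_t arguments;     // Only meaningful when arguments_tracked is true.
};

// Operands index the runtime [registers | locals | constants | arguments]
// array directly, so one comparison against the total size is enough.
constexpr bool operand_in_range(uint32_t flat_index, OperandBounds b)
{
    // Without an argument bound, any index past the fixed tables could be
    // an argument slot, so every index has to be accepted.
    if (!b.arguments_tracked)
        return true;
    // Sum in 64 bits so large table sizes cannot wrap around.
    uint64_t const total = uint64_t(b.registers) + b.locals + b.constants + b.arguments;
    return flat_index < total;
}
```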
Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.
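The fixup-dependent rule can be sketched as follows. The sketch assumes, hypothetically, that a cache field is a 64-bit slot that holds a cache index until the fixup pass overwrites it with a pointer. Plain u32 index fields, such as the NewFunction and NewClass ones, would go through the same range check, but with no fixup condition.

```cpp
#include <cstdint>

enum class IndexCheck : uint8_t { Ok, OutOfRange };

// Hypothetical check for a cache field that holds an index before fixup
// and a real pointer afterwards.
constexpr IndexCheck check_cache_slot(uint64_t slot, uint32_t cache_count, bool before_cache_fixup)
{
    // After fixup the slot is a live pointer; there is no index to check.
    if (!before_cache_fixup)
        return IndexCheck::Ok;
    return slot < cache_count ? IndexCheck::Ok : IndexCheck::OutOfRange;
}

// Plain u32 index fields (e.g. shared-function-data or class-blueprint
// indices) are always checked, whether or not fixup has run.
constexpr IndexCheck check_plain_index(uint32_t index, uint32_t count)
{
    return index < count ? IndexCheck::Ok : IndexCheck::OutOfRange;
}
```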
The error category enum is renumbered to drop the per-operand-kind
codes, since the encoded bytecode no longer distinguishes operand kinds.
2026-05-02 10:00:32 +02:00
    case JS::FFI::ValidationErrorKind::OperandOutOfRange:
        return "OperandOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::OperandInvalid:
        return "OperandInvalid"sv;
    case JS::FFI::ValidationErrorKind::LabelNotAtInstructionBoundary:
        return "LabelNotAtInstructionBoundary"sv;
    case JS::FFI::ValidationErrorKind::IdentifierIndexOutOfRange:
        return "IdentifierIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::StringIndexOutOfRange:
        return "StringIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::PropertyKeyIndexOutOfRange:
        return "PropertyKeyIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::RegexIndexOutOfRange:
        return "RegexIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::PropertyLookupCacheIndexOutOfRange:
        return "PropertyLookupCacheIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::GlobalVariableCacheIndexOutOfRange:
        return "GlobalVariableCacheIndexOutOfRange"sv;
2026-05-19 11:22:33 +02:00
    case JS::FFI::ValidationErrorKind::EnvironmentCoordinateCacheIndexOutOfRange:
        return "EnvironmentCoordinateCacheIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::TemplateObjectCacheIndexOutOfRange:
        return "TemplateObjectCacheIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::ObjectShapeCacheIndexOutOfRange:
        return "ObjectShapeCacheIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::ObjectPropertyIteratorCacheIndexOutOfRange:
        return "ObjectPropertyIteratorCacheIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::SharedFunctionDataIndexOutOfRange:
        return "SharedFunctionDataIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::ClassBlueprintIndexOutOfRange:
        return "ClassBlueprintIndexOutOfRange"sv;
    case JS::FFI::ValidationErrorKind::EnumOutOfRange:
        return "EnumOutOfRange"sv;
2026-05-02 10:04:12 +02:00
    case JS::FFI::ValidationErrorKind::BasicBlockOffsetInvalid:
        return "BasicBlockOffsetInvalid"sv;
    case JS::FFI::ValidationErrorKind::ExceptionHandlerStartInvalid:
        return "ExceptionHandlerStartInvalid"sv;
    case JS::FFI::ValidationErrorKind::ExceptionHandlerEndInvalid:
        return "ExceptionHandlerEndInvalid"sv;
    case JS::FFI::ValidationErrorKind::ExceptionHandlerHandlerInvalid:
        return "ExceptionHandlerHandlerInvalid"sv;
    case JS::FFI::ValidationErrorKind::ExceptionHandlerRangeInvalid:
        return "ExceptionHandlerRangeInvalid"sv;
    case JS::FFI::ValidationErrorKind::SourceMapOffsetInvalid:
        return "SourceMapOffsetInvalid"sv;
    }
    VERIFY_NOT_REACHED();
}

// Variant counts for the C++ enums referenced by Bytecode.def fields. The
// static_asserts pin the last variant so adding a new one without bumping
// the count here breaks the build instead of silently outdating the
// validator.
static constexpr u32 completion_type_variant_count = to_underlying(Completion::Type::Throw) + 1;
static_assert(completion_type_variant_count == 6);
static constexpr u32 iterator_hint_variant_count = to_underlying(IteratorHint::Async) + 1;
static_assert(iterator_hint_variant_count == 2);
static constexpr u32 environment_mode_variant_count = to_underlying(Op::EnvironmentMode::Var) + 1;
static_assert(environment_mode_variant_count == 2);
static constexpr u32 put_kind_variant_count = to_underlying(PutKind::Own) + 1;
static_assert(put_kind_variant_count == 5);
static constexpr u32 arguments_kind_variant_count = to_underlying(Op::ArgumentsKind::Unmapped) + 1;
static_assert(arguments_kind_variant_count == 2);

2026-05-18 13:26:41 +02:00
ErrorOr<void> validate_bytecode(Executable const& executable, ReadonlySpan<u32> basic_block_offsets)
{
    JS::FFI::FFIValidatorBounds bounds {
        .number_of_registers = executable.number_of_registers,
        .number_of_locals = static_cast<u32>(executable.local_variable_names.size()),
        .number_of_constants = static_cast<u32>(executable.constants.size()),
2026-05-02 10:49:32 +02:00
        .number_of_arguments = executable.number_of_arguments,
        .identifier_table_size = static_cast<u32>(executable.identifier_table->identifiers().size()),
        .string_table_size = static_cast<u32>(executable.string_table->size()),
        .property_key_table_size = static_cast<u32>(executable.property_key_table->property_keys().size()),
        // The regex table is not consulted at runtime; m_regex_index fields
        // are skipped during validation.
        .regex_table_size = 0,
        .property_lookup_cache_count = static_cast<u32>(executable.property_lookup_caches.size()),
        .global_variable_cache_count = static_cast<u32>(executable.global_variable_caches.size()),
        .environment_coordinate_cache_count = static_cast<u32>(executable.environment_coordinate_caches.size()),
        .template_object_cache_count = static_cast<u32>(executable.template_object_caches.size()),
        .object_shape_cache_count = static_cast<u32>(executable.object_shape_caches.size()),
        .object_property_iterator_cache_count = static_cast<u32>(executable.object_property_iterator_caches.size()),
        .class_blueprint_count = static_cast<u32>(executable.class_blueprints.size()),
        .shared_function_data_count = static_cast<u32>(executable.shared_function_data.size()),
        .completion_type_variant_count = completion_type_variant_count,
        .iterator_hint_variant_count = iterator_hint_variant_count,
        .environment_mode_variant_count = environment_mode_variant_count,
        .put_kind_variant_count = put_kind_variant_count,
        .arguments_kind_variant_count = arguments_kind_variant_count,
    };

    // Project Executable's exception handlers down to plain offsets; the
    // structural metadata's source-position parts aren't validated here.
    Vector<JS::FFI::FFIExceptionHandlerOffsets> handler_offsets;
    handler_offsets.ensure_capacity(executable.exception_handlers.size());
    for (auto const& h : executable.exception_handlers) {
        handler_offsets.append({
            .start = static_cast<u32>(h.start_offset),
            .end = static_cast<u32>(h.end_offset),
            .handler = static_cast<u32>(h.handler_offset),
        });
    }

    Vector<u32> source_map_offsets;
    source_map_offsets.ensure_capacity(executable.source_map.size());
    for (auto const& entry : executable.source_map)
        source_map_offsets.append(entry.bytecode_offset);

    JS::FFI::FFIValidatorExtras extras {
        .basic_block_offsets = basic_block_offsets.data(),
        .basic_block_count = basic_block_offsets.size(),
        .exception_handlers = handler_offsets.data(),
        .exception_handler_count = handler_offsets.size(),
        .source_map_offsets = source_map_offsets.data(),
        .source_map_count = source_map_offsets.size(),
    };

    JS::FFI::FFIValidationError error {};
    auto ok = rust_validate_bytecode(
        executable.bytecode.data(),
        executable.bytecode.size(),
        &bounds,
        &extras,
        &error);
    if (ok)
        return {};

    auto kind = validation_error_kind_to_string(error.kind);
    dbgln("Bytecode validation failed at offset {} (opcode {}): {}",
        error.offset, error.opcode, kind);
    return AK::Error::from_string_view(kind);
}

}