feat(react_compiler): vendor the React Compiler on a native oxc AST (no Babel AST) #23683
Draft
Boshen wants to merge 79 commits into
Merging this PR will improve performance by 24%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | react_compiler[App.tsx] | 33.4 ms | 39.6 ms | -15.76% |
| ❌ | Simulation | linter[App.tsx] | 108.1 ms | 126 ms | -14.15% |
| ❌ | Simulation | linter[kitchen-sink.tsx] | 219 ms | 231 ms | -5.2% |
| ⚡ | Simulation | react_compiler[RadixUIAdoptionSection.jsx] | 8.4 ms | 3.9 ms | ×2.1 |
| ⚡ | Simulation | linter[RadixUIAdoptionSection.jsx] | 9 ms | 4.5 ms | +99.87% |
Comparing vendor-react-compiler (c18d084) with main (10b96c6)
Footnotes
1. 9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
5f2ee1c to 87b175e
…ompiler

Collapse the upstream multi-crate `react_compiler` workspace into modules of `oxc_react_compiler` and replace cross-crate dependencies with normal in-crate module paths.

- Move the 12 `react_compiler*` crates (114 files) into `src/<name>/`, with each `lib.rs` becoming `mod.rs`.
- Rewrite `crate::` self-references and `react_compiler_x::` cross-references to `crate::react_compiler_x::`.
- Drop the `forked_react_compiler*` crates.io dependencies; add the `serde`, `serde-transcode`, and `hmac-sha256` deps they relied on, plus the `serde_json` `unbounded_depth` feature.
- Relax clippy on the vendored modules in `lib.rs` so the source stays byte-identical to upstream; the hand-written conversion code stays linted.
Since the oxc integration converts oxc AST -> react_compiler AST -> oxc AST and never touches the Babel JSON format, the serde/JSON layer was dead weight. Drop it entirely (serde, serde_json, serde-transcode deps and all code).

- RawNode no longer wraps a `serde_json::RawValue`; it carries pre-extracted typed metadata (identifiers with span/loc, `contains_hook_or_jsx`, the type tag, source span and a coarse type classification).
- convert_ast extracts that metadata straight from the oxc AST (via `oxc_ast_visit`) instead of building Babel JSON.
- The loc index drops the full-AST `serde_json::to_value(&body)` walk; the AST walker gains a `visit_raw_node` hook to reach type-annotation/class-body identifiers.
- convert_ast_reverse re-emits TS types by re-parsing their source span and applying identifier renames (e.g. `typeof x` -> `typeof x_0`) as text edits, replacing the hand-written JSON -> oxc type converters.
- program.rs / build_hir read the typed metadata instead of walking JSON.
- All serde derives, `#[serde]` attributes and manual impls are removed.

Net -1100 lines; all crate tests pass and clippy/doc are clean.
The HIR / reactive-function debug printers only run when `PluginOptions.debug` is set (the `debugLogIRs` path), yet they were always compiled in — ~113 KiB of the release binary. Put them behind a non-default `debug` feature. The three printer modules (`react_compiler_hir::print`, `react_compiler::debug_print`, `react_compiler_reactive_scopes::print_reactive_function`) are now `#[cfg]`-gated, with tiny stubs for the feature-off case that keep the exact signatures the pipeline's `if debug_enabled` blocks call — so the ~40 call sites are untouched and only no-op when the feature is off. CI builds with `--all-features`, so the real printers are still compiled, tested and linted. Stripped release binary: -113 KiB (3.73 -> 3.62 MiB).
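The feature-gating pattern described in this commit could look roughly like the sketch below. Only the `debug` feature name comes from the commit message; the module name `print`, the function `print_hir`, and the `run_pass` call site are hypothetical stand-ins, and the real printers take HIR types rather than strings.

```rust
// Hypothetical names throughout; only the `debug` feature name is from the commit text.

/// Real printer, compiled only when the non-default `debug` feature is enabled.
#[cfg(feature = "debug")]
mod print {
    pub fn print_hir(instructions: &[String]) -> String {
        instructions
            .iter()
            .enumerate()
            .map(|(i, instr)| format!("[{i}] {instr}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Stub with the identical signature, so call sites compile unchanged
/// and simply produce nothing when the feature is off.
#[cfg(not(feature = "debug"))]
mod print {
    pub fn print_hir(_instructions: &[String]) -> String {
        String::new()
    }
}

/// A call site that reads the same in both configurations.
pub fn run_pass(instructions: &[String], debug_enabled: bool) -> Option<String> {
    if debug_enabled {
        return Some(print::print_hir(instructions));
    }
    None
}
```

Keeping the stub signature identical is what lets the existing `if debug_enabled` blocks stay untouched: the choice between printer and no-op is made at compile time, not at each call site.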
The vendored React Compiler core uses legitimate short identifiers (`pn`, `oce`, `ome`, `froms`, ...) that `typos` flags as misspellings. Exclude the `src/react_compiler*` modules (kept close to upstream); the hand-written conversion code stays spell-checked.
… not yet compiling)

Stage 1a of removing the Babel-shaped AST: the lowering module now consumes the oxc AST directly instead of the converted Babel AST.

- FunctionNode re-pointed to oxc Function/ArrowFunctionExpression
- new source_loc::LineOffsets (oxc span -> HIR SourceLocation, byte-identical to convert_ast's table), threaded through HirBuilder
- build_hir.rs reduced to the orchestration skeleton on oxc (lower/lower_inner/lower_block_statement*); the big expression/statement matches are catch-all stubs, with arms ported incrementally next
- identifier_loc_index + find_context_identifiers stubbed to empty pending their oxc walks

Does NOT compile yet: the discovery layer (program.rs/pipeline.rs/fixture_utils/validate_source_locations) still builds FunctionNode from Babel nodes (24 cargo errors). WIP checkpoint on the vendor-react-compiler branch.
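For readers unfamiliar with the span-to-location mapping mentioned in this commit, here is a minimal sketch of a line-offset table. It is not the PR's `LineOffsets`: the `LineCol` type, the 0-based numbering, and byte-based columns are assumptions made for illustration, and the real table may count lines and columns differently.

```rust
/// Hypothetical stand-in for a source location (0-based line, byte-based column).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct LineCol {
    pub line: usize,
    pub column: usize,
}

/// Precomputed table of the byte offset at which each line starts.
pub struct LineOffsets {
    line_starts: Vec<usize>,
}

impl LineOffsets {
    /// Scan the source once and record where every line begins.
    pub fn new(source: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, byte) in source.bytes().enumerate() {
            if byte == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// Map a byte offset (for example a span start) to a line and column
    /// with a binary search over the line starts.
    pub fn locate(&self, offset: usize) -> LineCol {
        // Number of line starts <= offset, minus one, is the line index.
        let line = self.line_starts.partition_point(|&start| start <= offset) - 1;
        LineCol { line, column: offset - self.line_starts[line] }
    }
}
```

Building the table once per file and sharing it keeps each lookup logarithmic in the number of lines, which is presumably why it is threaded through the builder rather than recomputed.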
…tage 1a Milestone 0)

The crate now compiles and runs end-to-end with lowering driven by the oxc AST. Every expression/statement arm currently bails (Primitive undefined / no-op); arms are filled in next, with the differential driving correctness.

- span-bridge: lib.rs builds a node_id -> oxc FunctionNode map from oxc_semantic and threads it through compile_program -> process_fn -> try_compile_function, so the still-Babel function discovery hands the oxc node to lower(). FunctionNode is now Copy; CompileSource stores original_kind instead of a Babel fn_node.
- compile_fn builds LineOffsets from the source and passes it to lower(); compile_outlined_fn stubbed (outlining re-lowers a synthesized Babel fn; unreachable while bailing).
- fixture_utils::extract_function + validate_source_locations collection stubbed (dead / off-by-default; re-port to oxc later).

Differential vs baseline over 1796 upstream fixtures: same=1438 diff=358 (the 358 are real-component fixtures awaiting arm fills). Back-end still emits the Babel AST via convert_ast_reverse (Stage 2).
…a arm-fill)

lower_expression now handles Identifier, Null/Boolean/Numeric/String literals, BinaryExpression, UnaryExpression (delete TODO), and LogicalExpression on the oxc AST; re-added lower_identifier (AST-agnostic) + oxc binary/unary operator converters. Remaining arms still bail via the catch-all.

Differential unchanged at same=1438 diff=358 (no regression). The metric stays flat until member/call/jsx/return/var-decl reach the critical mass a full component needs.
…ments (arm-fill)

lower_expression: Static/Computed/PrivateField member access (lower_member_expression over oxc's 3 member kinds), CallExpression (method vs regular via lower_arguments). lower_statement: empty/debugger/expression/return/throw/block (was a no-op stub).

Differential: same=1438 -> 1450 (+12 fixtures), diff 358 -> 346.
Added lower_binding_assignment (BindingIdentifier -> StoreLocal/StoreContext; destructuring/default deferred) + re-added lower_identifier_for_assignment, and the VariableDeclaration statement arm (with-init). Differential same=1450 -> 1454.
…quence/template/new/await/etc.) to oxc
…capture analysis)
… (arm-fill residual)
The de-Babel skeleton left Expression::RegExpLiteral falling through to the catch-all (Primitive(Undefined)), so a memoized regex like `str.replace(/:/g, '')` emitted `str.replace(undefined, '')` — silently wrong. Port the lowering arm to emit InstructionValue::RegExpLiteral (pattern text + flags via to_inline_string). Also exclude tasks/react_compiler_compare from the cargo workspace glob.
ox_binding_pattern_to_assignment_target only handled BindingIdentifier and
raised an invariant for Object/Array patterns, so any component whose codegen
needed a destructuring assignment (e.g. `({a: t1, ...rest} = t0)` from a
destructured param with a default + rest) hit the empty-body fallback shim and
compiled to `() => {}`. Port the recursive BindingPattern -> AssignmentTarget
conversion (object/array targets, rest, defaults, shorthand identifiers).
The Delete unary arm bailed to Primitive(Undefined), so `delete obj.x` / `delete obj[k]` were dropped. Port it to emit PropertyDelete (static member) and ComputedDelete (computed member); non-member delete targets record a syntax error as before.
The ImportExpression arm synthesized a 6-char (`import` keyword) span for the "Handle Import expressions" Todo, but Babel's Import node carried the loc of the whole `import(...)` expression. Pass the full imp.span so the diagnostic label matches the original Rust.
The parser sets pife=true on parenthesized function/arrow expressions so codegen
preserves the source parens. Non-recompiled code kept that flag through clone_in,
so oxc emitted callee parens (`(async function(){})()`) the original Babel path
never produced. Clear pife on the spliced program via a VisitMut pass.
The ClassDeclaration statement arm recorded the unsupported error but did not allocate the temporary the original lowered (as UnsupportedNode), so IdentifierId numbering drifted and tripped an InferMutationAliasingEffects invariant downstream. Emit the same Primitive::Undefined placeholder used at the other former-UnsupportedNode sites to keep numbering aligned.
The original pipeline decoded JSX text entities on parse and re-encoded them on codegen. The de-Babeled path only ran that round-trip for recompiled JSX; passed-through (non-recompiled) JSX text was left raw, so e.g. `>e;` was not re-escaped to `&gte;`. Run the same decode->encode over every JSXText in the spliced program so passed-through text matches the original Rust compiler.
The old Babel scope analysis did not record pure type-position references that live in a variable-declarator type annotation (`const v: T`), so they never fed the hoisting analysis. The de-Babeled path reads oxc_semantic directly, which DOES record them, causing the hoisting scan to treat a type parameter `T` as a hoistable "Unknown" binding (referenced before declared, since type params are not statements) and bail with "Unsupported declaration type for hoisting" — passing the whole generic function through verbatim instead of recompiling it. Mirror the old path: when building the scope reference map, skip pure-type references (`is_type() && !is_value()`) whose structural host is a VariableDeclarator. Parameter/return annotations and `as`/`satisfies` casts are kept (the old path recorded those too), so only the declarator-annotation case is excluded. Resolves 13 ecosystem mismatches with no fixture or ecosystem regressions.
Extends the declarator-annotation fix: the old Babel scope analysis also did not record pure type-position references that appear as type arguments on a call or new expression (`obj.get<T>()`, `new Foo<T>()`). Like declarator annotations, these must not feed the hoisting scan, else a type parameter is mis-hoisted as an "Unknown" binding and the generic function bails. Resolves 13 more ecosystem mismatches (71 -> 58), no fixture or ecosystem regressions; all type-position probes still match the baseline.
Extends the type-reference scope fix to JSX element type arguments (`<Table<T> .../>`), another position the old Babel scope analysis did not record. Resolves the final type-parameter hoisting mismatches (58 -> 47), completing the 37-file generic-function hoisting class with no fixture or ecosystem regressions.
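Taken together, the three type-reference commits above amount to a filter on which references feed the hoisting scan. The sketch below models that rule with toy types; `Host`, `Reference`, and `feeds_hoisting` are hypothetical names, and the real code queries `oxc_semantic` reference flags and AST parents rather than a precomputed `host` field.

```rust
/// Hypothetical classification of the syntactic position that holds a reference.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Host {
    VariableDeclaratorAnnotation,
    CallTypeArguments,
    NewTypeArguments,
    JsxTypeArguments,
    ParameterAnnotation,
    ReturnAnnotation,
    Cast,
    Value,
}

/// Toy reference record; `is_type` / `is_value` mirror the flags named in the commit text.
pub struct Reference {
    pub is_type: bool,
    pub is_value: bool,
    pub host: Host,
}

/// Returns false for pure type-position references in the positions the old
/// scope analysis did not record, so they never reach the hoisting scan.
pub fn feeds_hoisting(reference: &Reference) -> bool {
    let pure_type = reference.is_type && !reference.is_value;
    let unrecorded_position = matches!(
        reference.host,
        Host::VariableDeclaratorAnnotation
            | Host::CallTypeArguments
            | Host::NewTypeArguments
            | Host::JsxTypeArguments
    );
    !(pure_type && unrecorded_position)
}
```

Parameter annotations, return annotations, and casts deliberately stay in the map, matching the commit's note that the old path recorded those.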
`(obj.x as T)++` (and `satisfies`/`!`/`<T>` casts) bailed with "UpdateExpression with unsupported argument type" because the lowering only handled bare member/identifier targets. TS casts are transparent for an update target, so unwrap them to the inner member expression and lower as a normal member update (matching the original, which stripped the casts). Resolves 1 ecosystem mismatch; no fixture or other regressions.
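As an illustration of treating casts as transparent, the sketch below peels wrappers off a toy expression tree until it reaches the underlying target. The `Expr` enum and `unwrap_casts` are hypothetical; treating parentheses as transparent too is an assumption, and the real lowering works on oxc's expression types.

```rust
/// Toy expression tree; the variants only approximate the oxc node kinds.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Ident(String),
    Member { object: Box<Expr>, property: String },
    TsAs(Box<Expr>),
    TsSatisfies(Box<Expr>),
    TsNonNull(Box<Expr>),
    TsTypeAssertion(Box<Expr>),
    Paren(Box<Expr>),
}

/// Strip any number of nested casts (and, by assumption, parentheses) and
/// return the innermost expression, which is then lowered as the update target.
pub fn unwrap_casts(mut expr: &Expr) -> &Expr {
    loop {
        match expr {
            Expr::TsAs(inner)
            | Expr::TsSatisfies(inner)
            | Expr::TsNonNull(inner)
            | Expr::TsTypeAssertion(inner)
            | Expr::Paren(inner) => expr = &**inner,
            _ => return expr,
        }
    }
}
```

With this shape, `(obj.x as T)++` and `obj.x++` reach the member-update path with the same inner node.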
Run just fmt over the react_compiler de-Babel work and the compare tool so the branch passes the CI format check. Formatting only, no behavior change.
typos-cli flagged the short locals `ba`/`ot` in the compare tool as misspellings; rename to descriptive names (baseLines/oxcLines, baseTemps/ oxcTemps) so the CI typos check passes. No behavior change.
…orary

The compound-assignment (`x += y`) identifier branch read the LHS via bare `lower_identifier` (the binding Place) instead of through a LoadLocal/LoadContext temporary like every other identifier read (and like the original Babel-path lowering did via `lower_expression_to_temporary`). Without the load instruction, ConstantPropagation could not substitute a known value into the compound op (`x = "" + y` became `x = x + y`), and the missing temporary shifted IdentifierId numbering away from the baseline. Emit the load temporary to match.

Resolves 12 ecosystem mismatches (46 -> 34), no fixture or other regressions.
…ntainer
Outlined FunctionDeclarations were inserted only when the original function was a
direct `program.body` statement; nested originals (e.g. a component declared
inside a `describe(() => { ... })` callback) fell back to appending at module top
level, so the outlined helpers landed in the wrong place. Replace the
program-body-only search with a VisitMut over every statement list (function
bodies, blocks, and arrow/function argument bodies) so the outlined declarations
are inserted right after the original wherever it is nested, mirroring Babel's
path-based `insertAfter`. Resolves 7 ecosystem mismatches (34 -> 27), no fixture
or other regressions.
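The nested insertion described in this commit can be sketched on a toy statement tree as below. This is not the PR's `VisitMut` pass: `Stmt`, `insert_after`, and matching the original by name are all hypothetical simplifications of a walk over every statement list.

```rust
/// Toy statement tree; real code walks oxc statement lists with a mutable visitor.
#[derive(Debug, PartialEq)]
pub enum Stmt {
    Function { name: String, body: Vec<Stmt> },
    Block(Vec<Stmt>),
    Other(String),
}

/// Find the function named `target` in any statement list, however deeply nested,
/// and insert the outlined declarations immediately after it.
/// Returns true (and drains `outlined`) if the target was found.
pub fn insert_after(stmts: &mut Vec<Stmt>, target: &str, outlined: &mut Vec<Stmt>) -> bool {
    let position = stmts
        .iter()
        .position(|stmt| matches!(stmt, Stmt::Function { name, .. } if name == target));
    if let Some(index) = position {
        // Insert in order, right after the original declaration.
        for (offset, stmt) in outlined.drain(..).enumerate() {
            stmts.insert(index + 1 + offset, stmt);
        }
        return true;
    }
    // Not in this list: recurse into every nested statement list.
    for stmt in stmts.iter_mut() {
        let nested = match stmt {
            Stmt::Function { body, .. } => body,
            Stmt::Block(body) => body,
            Stmt::Other(_) => continue,
        };
        if insert_after(nested, target, outlined) {
            return true;
        }
    }
    false
}
```

A caller would fall back to appending at module top level only when this returns false.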
oxc parses `a?.b!` as `ChainExpression(TSNonNull(member))` (the assertion nested inside the chain), which codegen prints as `a?.b!`. The original Babel round-trip produced `TSNonNull(Paren(Chain(member)))`, printed as `(a?.b)!`. Add a splice-time visitor that rewrites the former into the latter so passed-through non-null assertions over optional chains match the baseline. Resolves 6 ecosystem mismatches (27 -> 21), no fixture or other regressions.
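The shape of that rewrite, shown on a toy tree: `Expr` and `hoist_non_null` are hypothetical, and the sketch only handles the node it is given, whereas the splice-time visitor in the PR would apply the rewrite at every expression in the program.

```rust
/// Toy expression tree mirroring only the node kinds named in the commit.
#[derive(Debug, PartialEq, Clone)]
pub enum Expr {
    Ident(String),
    OptionalMember { object: Box<Expr>, property: String },
    Chain(Box<Expr>),
    TsNonNull(Box<Expr>),
    Paren(Box<Expr>),
}

/// Rewrite `Chain(TsNonNull(member))` into `TsNonNull(Paren(Chain(member)))`,
/// leaving every other expression unchanged.
pub fn hoist_non_null(expr: Expr) -> Expr {
    match expr {
        Expr::Chain(inner) => match *inner {
            Expr::TsNonNull(member) => {
                Expr::TsNonNull(Box::new(Expr::Paren(Box::new(Expr::Chain(member)))))
            }
            other => Expr::Chain(Box::new(other)),
        },
        other => other,
    }
}
```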
`expression_type_name` reported every `ChainExpression` as "OptionalMemberExpression", but Babel distinguished an optional call (`a?.b()` -> OptionalCallExpression) from an optional member access. Inspect the chain head element so the "cannot be safely reordered" Todo matches the baseline wording. Resolves 1 ecosystem mismatch (21 -> 20), no fixture or other regressions.
The de-Babel refactor deleted `set_raw_type_renames` (rename recording) and stopped populating `RawNode.idents` (passed `Vec::new()`), but kept `RawNode`, `env.renames`, `env.reference_node_ids`, and the type re-parse (`ox_reparse_ts_type`). So re-parsed `as`/`typeof` types kept pre-rename names while the value binding was renamed (e.g. `typeof field` instead of `typeof field_3`). Restore both halves: collect the type's identifier references into `RawNode.idents` during lowering (`collect_type_idents`), and in `ox_reparse_ts_type` apply renames as right-to-left text edits for idents that are real references (`reference_node_ids`) with a matching nearest-enclosing binding rename — porting the old `set_raw_type_renames` + `convert_type_from_raw` logic. `raw.idents` is consumed only here, so no other pass is affected. Resolves 2 ecosystem mismatches (20 -> 18), no fixture or other regressions.
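Applying renames right to left matters because each replacement can change the text length; working from the end keeps the offsets of earlier edits valid. A minimal sketch, assuming non-overlapping edits whose offsets fall on character boundaries (`TextEdit` and `apply_edits` are hypothetical names, not the PR's API):

```rust
/// One rename: replace `source[start..end]` with `replacement`.
pub struct TextEdit {
    pub start: usize,
    pub end: usize,
    pub replacement: String,
}

/// Apply non-overlapping edits from the highest offset to the lowest,
/// so a length change never shifts the span of an edit still to be applied.
pub fn apply_edits(source: &str, mut edits: Vec<TextEdit>) -> String {
    edits.sort_by(|a, b| b.start.cmp(&a.start));
    let mut out = source.to_string();
    for edit in &edits {
        out.replace_range(edit.start..edit.end, &edit.replacement);
    }
    out
}
```

The edited text is then what gets re-parsed as a type, so `typeof field` becomes `typeof field_3` before the parser ever sees it.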
The react_compiler_compare differential tool is a standalone dev CLI (like tasks/coverage) that intentionally uses `console` for output; exclude it from oxlint's no-console/curly rules so CI's lint job passes.
…s in codegen

Thread a lifetime `'a` through the HIR and reactive layers so `InstructionValue::TypeCastExpression` stores the borrowed oxc `&'a TSType<'a>` node directly instead of source-span metadata (`RawNode`). Lowering now stashes the AST node as-is, and codegen `clone_in`s it into the output allocator with no parser in the common case. The text-edit + reparse path is kept only as a fallback for the rare case where an identifier inside the type was renamed by a binding rename.

This removes the per-TS-type parser invocation that codegen previously performed on every type-cast, the main source of the codegen-side cost. Output is byte-identical: differential over the 1795 upstream fixtures is unchanged (the single remaining diff is the pre-existing `ts-enum-inline.tsx` case, unrelated to type casts).
c1e7d2a to e5650b7
…d output

Inline TS `enum` declarations inside a component/hook body have runtime semantics, but the de-Babeled lowering dropped them (the original captured them via the now-removed `UnsupportedNode`), so the compiled output lost the `enum` and its references became undeclared globals. Add a value-less `InstructionValue::PassthroughStatement` variant holding the borrowed oxc statement node: lowering emits it for `TSEnumDeclaration` at its original position, and codegen clones it into the output allocator verbatim. Modeled on the existing `Debugger` instruction (no operands, no value, retained through dead-code elimination).

This brings the upstream fixture suite to byte-identical parity with the pre-de-Babel build: differential `same=1796 diff=0` (was 1795/1), and fixes one inline-enum ecosystem file (18 -> 17).
`transform` cloned the entire source to an owned `String` (`options.source_code = source_text.to_string()`) BEFORE the early-bail checks, so every file with no React-like functions paid a full-source copy for nothing. On large no-component files (e.g. `binder.ts`) this dominated: ~7x slower locally (408ns -> 3.2us wall-clock) and ~35x in CodSpeed's instruction-counting mode (9.8us -> 349us). Move the clone after the bail checks — `source_code` is only read when the file is actually compiled. Behavior is unchanged (differential 1796/0).
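The reordering is small but easy to picture. In the sketch below, `Options`, `might_contain_react_code`, and this `transform` signature are hypothetical; in particular the bail heuristic is a placeholder and not the real discovery check.

```rust
/// Hypothetical options struct holding an owned copy of the source.
#[derive(Default)]
pub struct Options {
    pub source_code: String,
}

/// Placeholder bail heuristic for the sketch; the real check is more involved.
fn might_contain_react_code(source_text: &str) -> bool {
    source_text.contains("use") || source_text.contains('<')
}

/// Returns true if the file was handed to the compile pipeline.
pub fn transform(source_text: &str, options: &mut Options) -> bool {
    // Cheap early-bail check first: files with nothing to compile allocate nothing.
    if !might_contain_react_code(source_text) {
        return false;
    }
    // Only files that are actually compiled pay for the owned copy.
    options.source_code = source_text.to_string();
    true
}
```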
…ning the whole program

`transform`/`compile_program`/`ox_splice_program` now mutate the caller-owned `&mut Program` in place rather than deep-cloning the entire program (`oxc_program.clone_in`) just to replace a few compiled functions, mirroring how `oxc_transformer` mutates in place. `transform` returns a `changed: bool`; callers (napi transform, `oxc_transformer`) use their own now-mutated program. `lint` keeps a shared `&Program` (lint never emits, so it analyzes a throwaway clone) so the `oxc_linter` rule is unaffected.

All front-end analysis (semantic, scope info, discovery, lowering) finishes and the replacement set is fully materialized before the borrow flips to `&mut`, so there is no read-after-write hazard. `CompileResult`/`TransformResult` drop the `Option<Program>` (and their now-unused `'a`) in favor of `changed`.

Behavior is byte-identical (differential same=1796 diff=0, 37 tests). Removes a whole-program deep copy on every compiled file; the gain is in allocation / instruction count (wall-clock is dominated by the compile pipeline, so it is unchanged on low-compile-ratio files).
…scendant scans

`find_block_scope_by_bindings` (and two sibling descendant queries) computed a scope's descendant set with a fixpoint that rescanned every scope on each pass: O(scopes²) per call, run once per block statement. Precompute a `children` adjacency once in `convert_scope_info` and walk descendants in O(descendants). The resulting `FxHashSet` is identical, so scope selection is unchanged (differential same=1796 diff=0).
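A minimal sketch of the adjacency approach, assuming scopes are identified by index and each scope records its parent. `ScopeTree` is a hypothetical type, `std::collections::HashSet` stands in for `FxHashSet`, and whether the real descendant set includes the starting scope is not stated above (this sketch excludes it).

```rust
use std::collections::HashSet;

/// Scope tree with a precomputed child list per scope.
pub struct ScopeTree {
    children: Vec<Vec<usize>>,
}

impl ScopeTree {
    /// Build the adjacency once. `parents[i]` is the parent of scope `i`, or None for the root.
    pub fn new(parents: &[Option<usize>]) -> Self {
        let mut children = vec![Vec::new(); parents.len()];
        for (scope, parent) in parents.iter().enumerate() {
            if let Some(p) = parent {
                children[*p].push(scope);
            }
        }
        Self { children }
    }

    /// Collect all descendants of `root` by walking child links only,
    /// so the cost is proportional to the size of the subtree.
    pub fn descendants(&self, root: usize) -> HashSet<usize> {
        let mut seen = HashSet::new();
        let mut stack = vec![root];
        while let Some(scope) = stack.pop() {
            for &child in &self.children[scope] {
                if seen.insert(child) {
                    stack.push(child);
                }
            }
        }
        seen
    }
}
```

Compared with a fixpoint that rescans every scope until nothing changes, this touches each descendant once per query.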
…ation

`validate_ts_this_parameters_in_function_range` scanned every scope in the program once per discovered function, O(functions × all-scopes), only to call a check that no-ops unless a scope declares a `this` binding. Precompute the (usually empty) set of `this`-binding scopes once and iterate that, in the same order, so error reporting is unchanged. ~7% faster on App.tsx; differential same=1796 diff=0.
`build_declaration_node_ids` scanned every reference in the program to build a program-wide set, but it ran inside `find_context_identifiers` — once per discovered function — rebuilding the identical set N times (O(functions × all-refs)). Build it once in `convert_scope_info` and store it on `ScopeInfo`. ~10% faster on App.tsx; differential same=1796 diff=0.
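The hoisting of that computation can be pictured as follows. Everything here is a toy model: the `Reference` fields, the `ScopeInfo` layout, and what `non_declaration_refs` does with the set are assumptions for illustration, not the behavior of `find_context_identifiers`.

```rust
use std::collections::HashSet;

/// Toy reference record (hypothetical fields).
pub struct Reference {
    pub node_id: u32,
    pub is_declaration: bool,
}

/// Program-wide scope data, with the declaration set computed once at construction.
pub struct ScopeInfo {
    pub references: Vec<Reference>,
    pub declaration_node_ids: HashSet<u32>,
}

impl ScopeInfo {
    /// One pass over all references, done a single time per program.
    pub fn new(references: Vec<Reference>) -> Self {
        let declaration_node_ids = references
            .iter()
            .filter(|r| r.is_declaration)
            .map(|r| r.node_id)
            .collect();
        Self { references, declaration_node_ids }
    }
}

/// Per-function query that reuses the shared set instead of rebuilding it.
/// The filtering rule is illustrative only.
pub fn non_declaration_refs(info: &ScopeInfo, function_refs: &[u32]) -> Vec<u32> {
    function_refs
        .iter()
        .copied()
        .filter(|id| !info.declaration_node_ids.contains(id))
        .collect()
}
```

With N discovered functions, the set is built once rather than N times, which is the O(functions × all-refs) cost the commit removes.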
Summary
Vendors the React Compiler into `oxc_react_compiler` and rebuilds it to speak oxc natively: lowering reads the oxc AST + `oxc_semantic` directly and codegen emits the oxc AST directly, with no vendored Babel-shaped AST in between. Previously every compile did a double AST round-trip (oxc AST → Babel-shaped AST → HIR → Babel AST → oxc AST); this removes that entirely (less code, less memory, faster, no conversion layers).

Also exposes the compiler through the `oxc-transform` napi binding (`reactCompilerSync`) and adds a differential tool for validating output fidelity.

What's included
- … `oxc_semantic`; codegen to the oxc AST; delete the Babel-shaped AST, `convert_ast`, `convert_ast_reverse`, and `convert_scope`'s Babel bridge. The HIR (the analysis/transform middle) is untouched: this is a front-end + back-end rewrite around an unchanged core.
- … `reactCompilerSync(filename, sourceText, options?)` in `napi/transform`.
- … `tasks/react_compiler_compare` compares output against the upstream React Compiler / the pre-de-Babel build.
- … `debug` feature so release builds stay lean.
- … `TSType` node directly (a borrowed `&'a TSType`), so codegen `clone_in`s it into the output arena with no parser in the common case, instead of re-parsing every TS type from its source span. The text-edit + re-parse path remains only as a fallback for the rare case where a binding rename rewrites an identifier inside the type.

Fidelity
Gated at every step by a differential harness over the 1,796 upstream `babel-plugin-react-compiler` fixtures (byte-identical: `same=1795`, the 1 remaining is an intentional ts-enum-inline difference) plus ~1,800 real-world ecosystem files.

Driving the de-Babeled output toward the pre-Babel-removal build over the ecosystem corpus took it from ~2,156 → 18 differing files. The biggest cluster, a 37-file generic-function (type-parameter) hoisting class, is fully resolved. Of the 18 residual diffs:
- … `enum` re-emit (OriginalNode-as-statement) and one component-discovery heuristic.

Local checks green: `cargo test -p oxc_react_compiler` (37), `cargo clippy` (dev-no-debug-assertions), `just fmt`, `typos`.

Draft while the full `just ready` / CI conformance run completes and the remaining residual diffs are triaged.

AI usage disclosure
Per the project's AI policy: AI tooling (Claude Code) was used extensively on this branch — the de-Babel port and the ecosystem-fidelity fixes. All changes are gated by the differential + unit tests and have been reviewed by the author.
🤖 Generated with Claude Code