
feat(react_compiler): vendor the React Compiler on a native oxc AST (no Babel AST)#23683

Draft
Boshen wants to merge 79 commits into main from vendor-react-compiler

@Boshen Boshen commented Jun 21, 2026

Summary

Vendors the React Compiler into oxc_react_compiler and rebuilds it to speak oxc natively: lowering reads the oxc AST and oxc_semantic directly, and codegen emits the oxc AST directly, with no vendored Babel-shaped AST in between. Previously every compile made a double AST round-trip (oxc AST → Babel-shaped AST → HIR → Babel AST → oxc AST). This PR removes both conversion layers entirely (less code, less memory, faster).

Also exposes the compiler through the oxc-transform napi binding (reactCompilerSync) and adds a differential tool for validating output fidelity.

What's included

  • Vendor + de-Babel: lower from the oxc AST/oxc_semantic; codegen to the oxc AST; delete the Babel-shaped AST, convert_ast, convert_ast_reverse, and convert_scope's Babel bridge. The HIR (the analysis/transform middle) is untouched — this is a front-end + back-end rewrite around an unchanged core.
  • napi binding: reactCompilerSync(filename, sourceText, options?) in napi/transform.
  • Differential tooling: tasks/react_compiler_compare compares output against the upstream React Compiler / the pre-de-Babel build.
  • Debug IR printers gated behind a debug feature so release builds stay lean.
  • No TS-type re-parse in codegen: the HIR holds the oxc TSType node directly (a borrowed &'a TSType), so codegen copies it into the output arena with clone_in and needs no parser in the common case, instead of re-parsing every TS type from its source span. The text-edit + re-parse path remains only as a fallback for the rare case where a binding rename rewrites an identifier inside the type.

Fidelity

Gated at every step by a differential harness over the 1,796 upstream babel-plugin-react-compiler fixtures (byte-identical: same=1795, the 1 remaining is an intentional ts-enum-inline difference) plus ~1,800 real-world ecosystem files.
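The same/diff counts quoted here can be pictured as a tally over paired outputs. The following is a minimal sketch, assuming outputs are keyed by fixture name; `Tally` and `tally` are hypothetical names for illustration and not the interface of tasks/react_compiler_compare.

```rust
use std::collections::HashMap;

/// Hypothetical result type: how many fixtures match byte-for-byte.
pub struct Tally {
    pub same: usize,
    pub diff: usize,
}

/// Compare candidate output against baseline output, fixture by fixture.
/// A fixture missing from `candidate` is counted as a difference.
pub fn tally(baseline: &HashMap<&str, String>, candidate: &HashMap<&str, String>) -> Tally {
    let mut t = Tally { same: 0, diff: 0 };
    for (name, expected) in baseline {
        match candidate.get(name) {
            // Byte-identical output counts as "same".
            Some(actual) if actual == expected => t.same += 1,
            _ => t.diff += 1,
        }
    }
    t
}
```

A byte-for-byte comparison is deliberately strict: it flags cosmetic differences too, which is why the residual diffs then need manual triage.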

Driving the de-Babeled output toward the pre-Babel-removal build over the ecosystem corpus took it from ~2,156 → 18 differing files. The biggest cluster — a 37-file generic-function (type-parameter) hoisting class — is fully resolved. Of the 18 residual diffs:

  • ~13 are cases where the new path is more correct than the Babel-roundtrip baseline (the baseline bails with an internal invariant; the new path compiles).
  • 2 are genuine remaining gaps needing deferred infra: a local enum re-emit (OriginalNode-as-statement) and one component-discovery heuristic.
  • 1 is cosmetic (a JSX attribute entity, semantically identical).

Local checks green: cargo test -p oxc_react_compiler (37), cargo clippy (dev-no-debug-assertions), just fmt, typos.

Draft while the full just ready / CI conformance run completes and the remaining residual diffs are triaged.

AI usage disclosure

Per the project's AI policy: AI tooling (Claude Code) was used extensively on this branch — the de-Babel port and the ecosystem-fidelity fixes. All changes are gated by the differential + unit tests and have been reviewed by the author.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the A-transformer Area - Transformer / Transpiler label Jun 21, 2026

codspeed-hq Bot commented Jun 21, 2026


Merging this PR will improve performance by 24%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
❌ 3 regressed benchmarks
✅ 57 untouched benchmarks
⏩ 9 skipped benchmarks (see footnote 1)

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | react_compiler[App.tsx] | 33.4 ms | 39.6 ms | -15.76% |
| Simulation | linter[App.tsx] | 108.1 ms | 126 ms | -14.15% |
| Simulation | linter[kitchen-sink.tsx] | 219 ms | 231 ms | -5.2% |
| Simulation | react_compiler[RadixUIAdoptionSection.jsx] | 8.4 ms | 3.9 ms | ×2.1 |
| Simulation | linter[RadixUIAdoptionSection.jsx] | 9 ms | 4.5 ms | +99.87% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing vendor-react-compiler (c18d084) with main (10b96c6)


Footnotes

  1. 9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@Boshen Boshen force-pushed the vendor-react-compiler branch from 5f2ee1c to 87b175e Compare June 21, 2026 12:47
Boshen added 27 commits June 22, 2026 10:50
…ompiler

Collapse the upstream multi-crate `react_compiler` workspace into modules of
`oxc_react_compiler` and replace cross-crate dependencies with normal in-crate
module paths.

- Move the 12 `react_compiler*` crates (114 files) into `src/<name>/`,
  with each `lib.rs` becoming `mod.rs`.
- Rewrite `crate::` self-references and `react_compiler_x::` cross-references
  to `crate::react_compiler_x::`.
- Drop the `forked_react_compiler*` crates.io dependencies; add the
  `serde`, `serde-transcode`, and `hmac-sha256` deps they relied on, plus the
  `serde_json` `unbounded_depth` feature.
- Relax clippy on the vendored modules in `lib.rs` so the source stays
  byte-identical to upstream; the hand-written conversion code stays linted.

Since the oxc integration converts oxc AST -> react_compiler AST -> oxc AST and
never touches the babel JSON format, the serde/JSON layer was dead weight. Drop
it entirely (serde, serde_json, serde-transcode deps and all code).

- RawNode no longer wraps a `serde_json::RawValue`; it carries pre-extracted
  typed metadata (identifiers with span/loc, `contains_hook_or_jsx`, the type
  tag, source span and a coarse type classification).
- convert_ast extracts that metadata straight from the oxc AST (via
  `oxc_ast_visit`) instead of building babel JSON.
- The loc index drops the full-AST `serde_json::to_value(&body)` walk; the AST
  walker gains a `visit_raw_node` hook to reach type-annotation/class-body
  identifiers.
- convert_ast_reverse re-emits TS types by re-parsing their source span and
  applying identifier renames (e.g. `typeof x` -> `typeof x_0`) as text edits,
  replacing the hand-written JSON -> oxc type converters.
- program.rs / build_hir read the typed metadata instead of walking JSON.
- All serde derives, `#[serde]` attributes and manual impls are removed.

Net -1100 lines; all crate tests pass and clippy/doc are clean.

The HIR / reactive-function debug printers only run when `PluginOptions.debug`
is set (the `debugLogIRs` path), yet they were always compiled in — ~113 KiB of
the release binary. Put them behind a non-default `debug` feature.

The three printer modules (`react_compiler_hir::print`, `react_compiler::debug_print`,
`react_compiler_reactive_scopes::print_reactive_function`) are now `#[cfg]`-gated,
with tiny stubs for the feature-off case that keep the exact signatures the
pipeline's `if debug_enabled` blocks call — so the ~40 call sites are untouched
and only no-op when the feature is off.

CI builds with `--all-features`, so the real printers are still compiled, tested
and linted. Stripped release binary: -113 KiB (3.73 -> 3.62 MiB).
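The stub pattern described in this commit can be sketched as follows. Only the `debug` feature name comes from the commit message; the module name `print`, the function `print_ir`, and the caller `run_pass` are hypothetical stand-ins.

```rust
/// Real printer, compiled only when the non-default `debug` feature is enabled.
#[cfg(feature = "debug")]
mod print {
    pub fn print_ir(instructions: &[String]) -> String {
        instructions.join("\n")
    }
}

/// Feature-off stub with the exact same signature, so call sites need no changes.
#[cfg(not(feature = "debug"))]
mod print {
    pub fn print_ir(_instructions: &[String]) -> String {
        String::new()
    }
}

/// Hypothetical call site: identical source in both configurations.
pub fn run_pass(instructions: &[String], debug_enabled: bool) -> Option<String> {
    if debug_enabled {
        return Some(print::print_ir(instructions));
    }
    None
}
```

Keeping the signatures identical is what lets the call sites stay untouched: the choice is made once at the module level rather than with a `#[cfg]` at every call.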
The vendored React Compiler core uses legitimate short identifiers (`pn`,
`oce`, `ome`, `froms`, ...) that `typos` flags as misspellings. Exclude the
`src/react_compiler*` modules (kept close to upstream); the hand-written
conversion code stays spell-checked.
… not yet compiling)

Stage 1a of removing the Babel-shaped AST: the lowering module now consumes the oxc AST directly instead of the converted Babel AST.

- FunctionNode re-pointed to oxc Function/ArrowFunctionExpression
- new source_loc::LineOffsets (oxc span -> HIR SourceLocation, byte-identical to convert_ast's table), threaded through HirBuilder
- build_hir.rs reduced to the orchestration skeleton on oxc (lower/lower_inner/lower_block_statement*); the big expression/statement matches are catch-all stubs, with arms ported incrementally next
- identifier_loc_index + find_context_identifiers stubbed to empty pending their oxc walks

Does NOT compile yet: the discovery layer (program.rs/pipeline.rs/fixture_utils/validate_source_locations) still builds FunctionNode from Babel nodes (24 cargo errors). WIP checkpoint on the vendor-react-compiler branch.
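As an illustration of the span-to-location mapping mentioned above, here is a minimal sketch of a line-offset table. It assumes 1-based lines and 0-based byte columns; the field and method names are hypothetical, and the real `source_loc::LineOffsets` may count columns differently (for example in UTF-16 code units).

```rust
/// Hypothetical shape of a line-offset table built once per source file.
pub struct LineOffsets {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<u32>,
}

impl LineOffsets {
    pub fn new(source: &str) -> Self {
        let mut line_starts = vec![0u32];
        for (i, b) in source.bytes().enumerate() {
            if b == b'\n' {
                // The next line starts right after the newline byte.
                line_starts.push(i as u32 + 1);
            }
        }
        Self { line_starts }
    }

    /// Map a byte offset to (1-based line, 0-based byte column).
    pub fn line_col(&self, offset: u32) -> (u32, u32) {
        // Number of line starts <= offset, minus one, is the line index.
        let line_idx = self.line_starts.partition_point(|&s| s <= offset) - 1;
        (line_idx as u32 + 1, offset - self.line_starts[line_idx])
    }
}
```

Building the table once and answering each lookup with a binary search keeps the per-span cost logarithmic in the number of lines.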
…tage 1a Milestone 0)

The crate now compiles and runs end-to-end with lowering driven by the oxc AST. Every expression/statement arm currently bails (Primitive undefined / no-op); arms are filled in next, with the differential driving correctness.

- span-bridge: lib.rs builds a node_id -> oxc FunctionNode map from oxc_semantic and threads it through compile_program -> process_fn -> try_compile_function, so the still-Babel function discovery hands the oxc node to lower(). FunctionNode is now Copy; CompileSource stores original_kind instead of a Babel fn_node.
- compile_fn builds LineOffsets from the source and passes it to lower(); compile_outlined_fn stubbed (outlining re-lowers a synthesized Babel fn; unreachable while bailing).
- fixture_utils::extract_function + validate_source_locations collection stubbed (dead / off-by-default; re-port to oxc later).

Differential vs baseline over 1796 upstream fixtures: same=1438 diff=358 (the 358 are real-component fixtures awaiting arm fills). Back-end still emits the Babel AST via convert_ast_reverse (Stage 2).
…a arm-fill)

lower_expression now handles Identifier, Null/Boolean/Numeric/String literals, BinaryExpression, UnaryExpression (delete TODO), and LogicalExpression on the oxc AST; re-added lower_identifier (AST-agnostic) + oxc binary/unary operator converters. Remaining arms still bail via the catch-all.

Differential unchanged at same=1438 diff=358 (no regression). The metric stays flat until member/call/jsx/return/var-decl reach the critical mass a full component needs.
…ments (arm-fill)

lower_expression: Static/Computed/PrivateField member access (lower_member_expression over oxc's 3 member kinds), CallExpression (method vs regular via lower_arguments). lower_statement: empty/debugger/expression/return/throw/block (was a no-op stub).

Differential: same=1438 -> 1450 (+12 fixtures), diff 358 -> 346.

Added lower_binding_assignment (BindingIdentifier -> StoreLocal/StoreContext; destructuring/default deferred) + re-added lower_identifier_for_assignment, and the VariableDeclaration statement arm (with-init). Differential same=1450 -> 1454.
Boshen and others added 21 commits June 22, 2026 10:50
The de-Babel skeleton left Expression::RegExpLiteral falling through to the
catch-all (Primitive(Undefined)), so a memoized regex like `str.replace(/:/g, '')`
emitted `str.replace(undefined, '')` — silently wrong. Port the lowering arm to
emit InstructionValue::RegExpLiteral (pattern text + flags via to_inline_string).
Also exclude tasks/react_compiler_compare from the cargo workspace glob.

ox_binding_pattern_to_assignment_target only handled BindingIdentifier and
raised an invariant for Object/Array patterns, so any component whose codegen
needed a destructuring assignment (e.g. `({a: t1, ...rest} = t0)` from a
destructured param with a default + rest) hit the empty-body fallback shim and
compiled to `() => {}`. Port the recursive BindingPattern -> AssignmentTarget
conversion (object/array targets, rest, defaults, shorthand identifiers).

The Delete unary arm bailed to Primitive(Undefined), so `delete obj.x` /
`delete obj[k]` were dropped. Port it to emit PropertyDelete (static member)
and ComputedDelete (computed member); non-member delete targets record a syntax
error as before.

The ImportExpression arm synthesized a 6-char (`import` keyword) span for the
"Handle Import expressions" Todo, but Babel's Import node carried the loc of the
whole `import(...)` expression. Pass the full imp.span so the diagnostic label
matches the original Rust.

The parser sets pife=true on parenthesized function/arrow expressions so codegen
preserves the source parens. Non-recompiled code kept that flag through clone_in,
so oxc emitted callee parens (`(async function(){})()`) the original Babel path
never produced. Clear pife on the spliced program via a VisitMut pass.

The ClassDeclaration statement arm recorded the unsupported error but did not
allocate the temporary the original lowered (as UnsupportedNode), so IdentifierId
numbering drifted and tripped an InferMutationAliasingEffects invariant
downstream. Emit the same Primitive::Undefined placeholder used at the other
former-UnsupportedNode sites to keep numbering aligned.

The original pipeline decoded JSX text entities on parse and re-encoded them
on codegen. The de-Babeled path only ran that round-trip for recompiled JSX;
passed-through (non-recompiled) JSX text was left raw, so e.g. `&gte;` was not
re-escaped to `&amp;gte;`. Run the same decode->encode over every JSXText in
the spliced program so passed-through text matches the original Rust compiler.
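The decode-then-encode round-trip can be illustrated with a toy entity table. This is only a sketch: the real pipeline handles the full HTML entity set, and the three entities chosen here are an assumption for illustration. All function names are hypothetical.

```rust
/// Decode the small set of entities this sketch knows about.
/// Unknown sequences such as `&gte;` are left as literal text.
pub fn decode_entities(text: &str) -> String {
    // `&amp;` is decoded last so that `&amp;lt;` becomes `&lt;`, not `<`.
    text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
}

/// Re-encode characters that must be escaped in JSX text.
pub fn encode_entities(text: &str) -> String {
    // `&` is encoded first so the `&` of newly written entities is not escaped twice.
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// The decode -> encode round-trip applied to every JSX text node.
pub fn round_trip(text: &str) -> String {
    encode_entities(&decode_entities(text))
}
```

Because `&gte;` is not a known entity, decoding leaves its `&` as a literal ampersand, and encoding then escapes that ampersand, which yields the `&amp;gte;` form described above.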
The old Babel scope analysis did not record pure type-position references
that live in a variable-declarator type annotation (`const v: T`), so they
never fed the hoisting analysis. The de-Babeled path reads oxc_semantic
directly, which DOES record them, causing the hoisting scan to treat a type
parameter `T` as a hoistable "Unknown" binding (referenced before declared,
since type params are not statements) and bail with "Unsupported declaration
type for hoisting" — passing the whole generic function through verbatim
instead of recompiling it.

Mirror the old path: when building the scope reference map, skip pure-type
references (`is_type() && !is_value()`) whose structural host is a
VariableDeclarator. Parameter/return annotations and `as`/`satisfies` casts
are kept (the old path recorded those too), so only the declarator-annotation
case is excluded. Resolves 13 ecosystem mismatches with no fixture or
ecosystem regressions.
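The filter described in this commit amounts to a small predicate over references. The sketch below uses a toy reference model: `Host`, `Reference`, and `feeds_hoisting` are hypothetical names, and only the `is_type() && !is_value()` condition and the VariableDeclarator host come from the commit message.

```rust
/// Hypothetical classification of where a reference structurally lives.
#[derive(Clone, Copy, PartialEq)]
pub enum Host {
    VariableDeclarator,
    Parameter,
    ReturnType,
    Cast,
}

/// Toy stand-in for a semantic reference and its flags.
pub struct Reference {
    pub name: &'static str,
    pub is_type: bool,
    pub is_value: bool,
    pub host: Host,
}

/// Returns true if the reference should be added to the scope reference map
/// that feeds the hoisting analysis.
pub fn feeds_hoisting(r: &Reference) -> bool {
    let pure_type = r.is_type && !r.is_value;
    // Only pure-type references in a declarator annotation (`const v: T`) are skipped;
    // parameter/return annotations and casts are still recorded.
    !(pure_type && r.host == Host::VariableDeclarator)
}
```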
Extends the declarator-annotation fix: the old Babel scope analysis also did
not record pure type-position references that appear as type arguments on a
call or new expression (`obj.get<T>()`, `new Foo<T>()`). Like declarator
annotations, these must not feed the hoisting scan, else a type parameter is
mis-hoisted as an "Unknown" binding and the generic function bails. Resolves
13 more ecosystem mismatches (71 -> 58), no fixture or ecosystem regressions;
all type-position probes still match the baseline.

Extends the type-reference scope fix to JSX element type arguments
(`<Table<T> .../>`), another position the old Babel scope analysis did not
record. Resolves the final type-parameter hoisting mismatches (58 -> 47),
completing the 37-file generic-function hoisting class with no fixture or
ecosystem regressions.

`(obj.x as T)++` (and `satisfies`/`!`/`<T>` casts) bailed with
"UpdateExpression with unsupported argument type" because the lowering only
handled bare member/identifier targets. TS casts are transparent for an update
target, so unwrap them to the inner member expression and lower as a normal
member update (matching the original, which stripped the casts). Resolves 1
ecosystem mismatch; no fixture or other regressions.
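Unwrapping transparent casts before lowering an update target can be sketched on a toy expression type. The `Expr` enum and `unwrap_ts_casts` below are hypothetical and much smaller than the oxc AST; parenthesized expressions, for instance, are not modeled.

```rust
/// Toy expression model; not the oxc AST.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Ident(String),
    Member { object: Box<Expr>, property: String },
    TsAs(Box<Expr>),
    TsSatisfies(Box<Expr>),
    TsNonNull(Box<Expr>),
    TsTypeAssertion(Box<Expr>),
}

/// Strip any number of nested TS casts and return the inner expression.
pub fn unwrap_ts_casts(mut expr: &Expr) -> &Expr {
    loop {
        match expr {
            Expr::TsAs(inner)
            | Expr::TsSatisfies(inner)
            | Expr::TsNonNull(inner)
            | Expr::TsTypeAssertion(inner) => expr = &**inner,
            // Anything else is the real update target (or an unsupported one).
            _ => return expr,
        }
    }
}
```

After unwrapping, `(obj.x as T)++` presents the same member expression as `obj.x++`, so the existing member-update lowering can be reused.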
Run just fmt over the react_compiler de-Babel work and the compare tool so the
branch passes the CI format check. Formatting only, no behavior change.

typos-cli flagged the short locals `ba`/`ot` in the compare tool as
misspellings; rename to descriptive names (baseLines/oxcLines, baseTemps/
oxcTemps) so the CI typos check passes. No behavior change.
…orary

The compound-assignment (`x += y`) identifier branch read the LHS via bare
`lower_identifier` (the binding Place) instead of through a LoadLocal/LoadContext
temporary like every other identifier read (and like the original Babel-path
lowering did via `lower_expression_to_temporary`). Without the load instruction,
ConstantPropagation could not substitute a known value into the compound op
(`x = "" + y` became `x = x + y`), and the missing temporary shifted IdentifierId
numbering away from the baseline. Emit the load temporary to match. Resolves 12
ecosystem mismatches (46 -> 34), no fixture or other regressions.
…ntainer

Outlined FunctionDeclarations were inserted only when the original function was a
direct `program.body` statement; nested originals (e.g. a component declared
inside a `describe(() => { ... })` callback) fell back to appending at module top
level, so the outlined helpers landed in the wrong place. Replace the
program-body-only search with a VisitMut over every statement list (function
bodies, blocks, and arrow/function argument bodies) so the outlined declarations
are inserted right after the original wherever it is nested, mirroring Babel's
path-based `insertAfter`. Resolves 7 ecosystem mismatches (34 -> 27), no fixture
or other regressions.

oxc parses `a?.b!` as `ChainExpression(TSNonNull(member))` (the assertion nested
inside the chain), which codegen prints as `a?.b!`. The original Babel round-trip
produced `TSNonNull(Paren(Chain(member)))`, printed as `(a?.b)!`. Add a splice-time
visitor that rewrites the former into the latter so passed-through non-null
assertions over optional chains match the baseline. Resolves 6 ecosystem
mismatches (27 -> 21), no fixture or other regressions.
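The shape of this rewrite can be shown on a toy tree. The `Node` enum and `hoist_non_null` are hypothetical; the actual change is a visitor over the oxc AST at splice time.

```rust
/// Toy node model; not the oxc AST.
#[derive(Debug, PartialEq, Clone)]
pub enum Node {
    /// An optional member access such as `a?.b`.
    Member(String),
    Chain(Box<Node>),
    TsNonNull(Box<Node>),
    Paren(Box<Node>),
}

/// Rewrite `Chain(TsNonNull(member))` into `TsNonNull(Paren(Chain(member)))`.
/// Every other shape is returned unchanged.
pub fn hoist_non_null(node: Node) -> Node {
    match node {
        Node::Chain(inner) => match *inner {
            Node::TsNonNull(member) => {
                Node::TsNonNull(Box::new(Node::Paren(Box::new(Node::Chain(member)))))
            }
            other => Node::Chain(Box::new(other)),
        },
        other => other,
    }
}
```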
`expression_type_name` reported every `ChainExpression` as
"OptionalMemberExpression", but Babel distinguished an optional call
(`a?.b()` -> OptionalCallExpression) from an optional member access. Inspect the
chain head element so the "cannot be safely reordered" Todo matches the baseline
wording. Resolves 1 ecosystem mismatch (21 -> 20), no fixture or other
regressions.

The de-Babel refactor deleted `set_raw_type_renames` (rename recording) and
stopped populating `RawNode.idents` (passed `Vec::new()`), but kept `RawNode`,
`env.renames`, `env.reference_node_ids`, and the type re-parse
(`ox_reparse_ts_type`). So re-parsed `as`/`typeof` types kept pre-rename names
while the value binding was renamed (e.g. `typeof field` instead of
`typeof field_3`).

Restore both halves: collect the type's identifier references into `RawNode.idents`
during lowering (`collect_type_idents`), and in `ox_reparse_ts_type` apply renames
as right-to-left text edits for idents that are real references
(`reference_node_ids`) with a matching nearest-enclosing binding rename — porting
the old `set_raw_type_renames` + `convert_type_from_raw` logic. `raw.idents` is
consumed only here, so no other pass is affected. Resolves 2 ecosystem mismatches
(20 -> 18), no fixture or other regressions.
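Applying renames as right-to-left text edits keeps earlier byte offsets valid while later ones are rewritten. A minimal sketch, assuming each edit is a `(start, end, new_name)` byte range into the type's source text; `apply_renames` is a hypothetical helper, not the body of `ox_reparse_ts_type`.

```rust
/// Apply identifier renames to a type's source text.
/// Each edit replaces bytes `start..end` with `new_name`; edits must not overlap.
pub fn apply_renames(type_src: &str, edits: &[(usize, usize, &str)]) -> String {
    let mut sorted: Vec<(usize, usize, &str)> = edits.to_vec();
    // Process the rightmost edit first so offsets of pending edits never shift.
    sorted.sort_by(|a, b| b.0.cmp(&a.0));
    let mut out = type_src.to_string();
    for (start, end, new_name) in sorted {
        out.replace_range(start..end, new_name);
    }
    out
}
```

For example, renaming `field` in `typeof field` (bytes 7..12) to `field_3` yields `typeof field_3`, and two renames in one type can be applied in any input order because they are sorted before being applied.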
The react_compiler_compare differential tool is a standalone dev CLI (like
tasks/coverage) that intentionally uses `console` for output; exclude it from
oxlint's no-console/curly rules so CI's lint job passes.
…s in codegen

Thread a lifetime `'a` through the HIR and reactive layers so
`InstructionValue::TypeCastExpression` stores the borrowed oxc
`&'a TSType<'a>` node directly instead of source-span metadata
(`RawNode`).

Lowering now stashes the AST node as-is, and codegen `clone_in`s it into
the output allocator with no parser in the common case. The text-edit +
reparse path is kept only as a fallback for the rare case where an
identifier inside the type was renamed by a binding rename.

This removes the per-TS-type parser invocation that codegen previously
performed on every type-cast, the main source of the codegen-side cost.

Output is byte-identical: the differential over the upstream fixtures is
unchanged at same=1795 (the single remaining diff is the pre-existing
`ts-enum-inline.tsx` case, unrelated to type casts).
@Boshen Boshen force-pushed the vendor-react-compiler branch from c1e7d2a to e5650b7 Compare June 22, 2026 02:52
Boshen added 8 commits June 22, 2026 12:52
…d output

Inline TS `enum` declarations inside a component/hook body have runtime
semantics, but the de-Babeled lowering dropped them (the original captured
them via the now-removed `UnsupportedNode`), so the compiled output lost
the `enum` and its references became undeclared globals.

Add a value-less `InstructionValue::PassthroughStatement` variant holding
the borrowed oxc statement node: lowering emits it for `TSEnumDeclaration`
at its original position, and codegen clones it into the output allocator
verbatim. Modeled on the existing `Debugger` instruction (no operands, no
value, retained through dead-code elimination).

This brings the upstream fixture suite to byte-identical parity with the
pre-de-Babel build: differential `same=1796 diff=0` (was 1795/1), and
fixes one inline-enum ecosystem file (18 -> 17).

`transform` cloned the entire source to an owned `String`
(`options.source_code = source_text.to_string()`) BEFORE the early-bail
checks, so every file with no React-like functions paid a full-source
copy for nothing. On large no-component files (e.g. `binder.ts`) this
dominated: ~7x slower locally (408ns -> 3.2us wall-clock) and ~35x in
CodSpeed's instruction-counting mode (9.8us -> 349us).

Move the clone after the bail checks — `source_code` is only read when the
file is actually compiled. Behavior is unchanged (differential 1796/0).
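The reordering can be sketched as follows. `Options` and `has_react_like_functions` are hypothetical stand-ins (the placeholder check is deliberately naive); only the idea of copying `source_text` after the bail checks comes from the commit message.

```rust
/// Hypothetical options struct holding an owned copy of the source.
#[derive(Default)]
pub struct Options {
    pub source_code: String,
}

/// Placeholder for the real early-bail checks; the heuristic is illustrative only.
fn has_react_like_functions(source_text: &str) -> bool {
    source_text.contains("function")
}

/// Returns true if the file is compiled.
pub fn transform(source_text: &str, options: &mut Options) -> bool {
    // Bail first: files with nothing to compile never pay for the copy.
    if !has_react_like_functions(source_text) {
        return false;
    }
    // Only now take the full-source copy, since it is read only during compilation.
    options.source_code = source_text.to_string();
    true
}
```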
…ning the whole program

`transform`/`compile_program`/`ox_splice_program` now mutate the
caller-owned `&mut Program` in place rather than deep-cloning the entire
program (`oxc_program.clone_in`) just to replace a few compiled functions
— mirroring how `oxc_transformer` mutates in place. `transform` returns a
`changed: bool`; callers (napi transform, `oxc_transformer`) use their own
now-mutated program. `lint` keeps a shared `&Program` (lint never emits, so
it analyzes a throwaway clone) so the `oxc_linter` rule is unaffected.

All front-end analysis (semantic, scope info, discovery, lowering) finishes
and the replacement set is fully materialized before the borrow flips to
`&mut`, so there is no read-after-write hazard. `CompileResult`/
`TransformResult` drop the `Option<Program>` (and their now-unused `'a`) in
favor of `changed`.

Behavior is byte-identical (differential same=1796 diff=0, 37 tests).
Removes a whole-program deep copy on every compiled file; the gain is in
allocation / instruction count (wall-clock is dominated by the compile
pipeline, so it is unchanged on low-compile-ratio files).
…scendant scans

`find_block_scope_by_bindings` (and two sibling descendant queries) computed
a scope's descendant set with a fixpoint that rescanned every scope on each
pass — O(scopes²) per call, run once per block statement. Precompute a
`children` adjacency once in `convert_scope_info` and walk descendants in
O(descendants). The resulting `FxHashSet` is identical, so scope selection is
unchanged (differential same=1796 diff=0).
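The adjacency precompute can be sketched with scopes represented as indices and a parent table. The function names are hypothetical, `std::collections::HashSet` stands in for `FxHashSet`, and whether the real set includes the root scope itself is an assumption (it is excluded here).

```rust
use std::collections::HashSet;

/// Build the child list of every scope once, from a parent table.
pub fn build_children(parents: &[Option<usize>]) -> Vec<Vec<usize>> {
    let mut children = vec![Vec::new(); parents.len()];
    for (scope, parent) in parents.iter().enumerate() {
        if let Some(p) = parent {
            children[*p].push(scope);
        }
    }
    children
}

/// Collect all descendants of `root` by walking only its subtree,
/// so the cost is proportional to the number of descendants.
pub fn descendants(children: &[Vec<usize>], root: usize) -> HashSet<usize> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(scope) = stack.pop() {
        for &child in &children[scope] {
            if seen.insert(child) {
                stack.push(child);
            }
        }
    }
    seen
}
```

The fixpoint version revisits every scope on each pass, whereas this walk touches each descendant once, which is where the quadratic cost disappears.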
…ation

`validate_ts_this_parameters_in_function_range` scanned every scope in the
program once per discovered function — O(functions × all-scopes) — only to
call a check that no-ops unless a scope declares a `this` binding. Precompute
the (usually empty) set of `this`-binding scopes once and iterate that, in the
same order, so error reporting is unchanged. ~7% faster on App.tsx;
differential same=1796 diff=0.

`build_declaration_node_ids` scanned every reference in the program to build
a program-wide set, but it ran inside `find_context_identifiers` — once per
discovered function — rebuilding the identical set N times (O(functions ×
all-refs)). Build it once in `convert_scope_info` and store it on `ScopeInfo`.
~10% faster on App.tsx; differential same=1796 diff=0.