yarr: keep index in sync with matchStart in non-BMP backtrack trampoline (ARM64) #30396

Draft
robobun wants to merge 2 commits into main from farm/571dd7af/yarr-non-bmp-trampoline-index-sync

@robobun robobun commented May 8, 2026

Collaborator

What

bun -e 'console.log(/\B|x{1,2}?/u.exec("a\u{10ffff}b")[0].length)'

On ARM64 this prints 4294967295 ((unsigned)-1). On x86_64 it prints 0.
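The symptom breaks a simple invariant: a match can never be longer than the input it came from. A minimal sketch of a check for that invariant follows; the helper name `checkMatchBounds` is hypothetical and not taken from the PR's fixture.

```javascript
// Hypothetical helper: verifies that exec() never reports a match
// longer than the input it was run against.
function checkMatchBounds(re, input) {
  const m = re.exec(input);
  if (m === null) return { ok: true, length: null };
  const length = m[0].length;
  return { ok: length <= input.length, length };
}

const result = checkMatchBounds(/\B|x{1,2}?/u, "a\u{10ffff}b");
console.log(result.ok ? "within bounds" : `out of bounds: ${result.length}`);
```

On an affected ARM64 build this would be expected to report the out-of-bounds length; on an unaffected engine it reports that the match is within bounds.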

Why

The Yarr JIT's non-BMP first-character optimization (ENABLE_YARR_JIT_UNICODE_CAN_INCREMENT_INDEX_FOR_NON_BMP, ARM64-only, see Source/WTF/wtf/PlatformEnable.h) records firstCharacterAdditionalReadSize = 1 whenever tryReadUnicodeChar decodes a surrogate pair, so that the body-alternative retry loop can step past the whole code point instead of landing in the middle of it.

In the BodyAlternativeEnd backtrack trampoline, the branch taken when lastAlternative.minimumSize > firstAlternative.minimumSize (YarrJIT.cpp ~4414) sets matchStart = index + firstCharacterAdditionalReadSize but jumps to beginOp->m_reentry without applying the same adjustment to m_regs.index. The sibling else branch (and the input-check-failure path) do add it.

On reentry, index != matchStart + firstAlternative.minimumSize. If the first alternative is a zero-width assertion such as \B, it can succeed at the stale index and the epilogue writes output[0] = matchStart, output[1] = index with output[1] < output[0]. createRegExpMatchesArray then builds the whole-match substring with length (unsigned)(end - start) = 4294967295.
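The wraparound itself can be reproduced with plain arithmetic. This sketch models the 32-bit unsigned subtraction in JavaScript with `>>> 0`; the function name `unsignedLength` is an illustrative assumption, not a JavaScriptCore identifier.

```javascript
// Model of a 32-bit unsigned subtraction: `>>> 0` reinterprets the
// result as an unsigned 32-bit integer, like a cast to `unsigned` in C.
function unsignedLength(start, end) {
  return (end - start) >>> 0;
}

console.log(unsignedLength(2, 2)); // 0
console.log(unsignedLength(3, 2)); // 4294967295
```

With start = 3 and end = 2, the difference of -1 becomes 4294967295, which is the length printed on ARM64.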

Trace for "a\u{10ffff}b" (code units [0x61, 0xDBFF, 0xDFFF, 0x62])

  1. \B at 0 fails; alt 2 reentry index=1; fixed x reads 'a', fails → trampoline matchStart=1, index=1.
  2. \B at 1 fails; alt 2 reentry index=2; fixed x reads input[1]=0xDBFF, decodes the pair, sets firstCharacterAdditionalReadSize (fCARS) = 1, U+10FFFF ≠ x → trampoline.
  3. matchStart = index(2) + fCARS(1) = 3, index left at 2, jump to alt 1 reentry.
  4. \B at 2: prev U+10FFFF (non-word), curr lone trail 0xDFFF (non-word) → \B succeeds. Return (start=3, end=2).
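Two facts the trace relies on can be checked directly: the code-unit layout of the input, and that neither U+10FFFF nor a lone trail surrogate counts as a word character. The sketch below uses only standard string and RegExp behaviour; the `isWord` helper is an illustrative name.

```javascript
const input = "a\u{10ffff}b";

// UTF-16 code units of the input, as uppercase hex.
const units = Array.from({ length: input.length }, (_, i) =>
  input.charCodeAt(i).toString(16).toUpperCase()
);
console.log(units.join(" ")); // 61 DBFF DFFF 62

// \w under the u flag (without i) is ASCII-only: [A-Za-z0-9_].
const isWord = (s) => /^\w$/u.test(s);
console.log(isWord("\u{10ffff}")); // false
console.log(isWord("\udfff"));     // false
console.log(isWord("a"));          // true
```

Because both sides of index 2 are non-word, \B has no boundary to reject there, which is why step 4 succeeds.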

Fix

In oven-sh/WebKit#221: mirror the sibling branch — add32(firstCharacterAdditionalReadSize, index) before the (possible) sub32, and gate the jump on checkInput() since index may now be length + 1 when delta == 1. The change is entirely inside #if ENABLE(YARR_JIT_UNICODE_CAN_INCREMENT_INDEX_FOR_NON_BMP) so x86_64 codegen is unchanged.
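To see what the fix changes, here is a deliberately simplified model of the two register updates, replayed on step 3 of the trace. It is not the JIT code: the `trampoline` function and its parameters are hypothetical, and the sub32 adjustment and the checkInput() gate are left out.

```javascript
// Simplified model (not the JIT code): the trampoline's updates to the
// matchStart and index registers before re-entering the first alternative.
function trampoline({ index, additionalReadSize }, fixed) {
  const matchStart = index + additionalReadSize;
  // Before the fix, index was left untouched in this branch.
  const newIndex = fixed ? index + additionalReadSize : index;
  return { matchStart, index: newIndex };
}

// Step 3 of the trace: index = 2, firstCharacterAdditionalReadSize = 1.
const state = { index: 2, additionalReadSize: 1 };

// A zero-width first alternative that succeeds reports
// (start, end) = (matchStart, index).
console.log(trampoline(state, false)); // matchStart 3, index 2: end < start
console.log(trampoline(state, true));  // matchStart 3, index 3: start == end
```

In the fixed case the first alternative is re-entered with index equal to matchStart plus its minimum size of 0, so a zero-width success can no longer report an end before the start.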

This PR

Note

ARM64 test lanes will be red on this PR until the WEBKIT_VERSION bump; x86_64 lanes are green because the optimization isn't enabled there. Leaving as draft until the bump is pushed.

Verification (x86_64, optimization disabled → sanity only)

$ bun bd test test/js/bun/jsc-stress/jsc-stress.test.ts -t yarr
(pass) JSC JIT Stress Tests > JS (Baseline/DFG/FTL) > yarr-non-bmp-backtrack-trampoline-index-sync.js

$ BUN_JSC_dumpRegExpDisassembly=1 bun bd -e '/\B|x{1,2}?/u.exec("a\u{10ffff}b")'
# backtrack trampoline unchanged on x86_64:
#   movl %esi, (%rcx)
#   jmp  <alt0 reentry>

On ARM64, /\B|x{1,2}?/u.exec("a\u{10ffff}b")[0].length returned 4294967295
((unsigned)-1) because the Yarr JIT non-BMP first-character optimization
advanced matchStart by firstCharacterAdditionalReadSize without advancing
m_regs.index in the last-alt > first-alt backtrack branch, so a zero-width
first alternative could return with output[1] < output[0].

Fix in oven-sh/WebKit#221. This adds the regression test; WEBKIT_VERSION
bump follows once that lands.
@github-actions github-actions Bot added the claude label May 8, 2026
robobun commented May 8, 2026

Collaborator Author
Updated 6:10 AM PT - May 8th, 2026

@robobun, your commit 708a6f1 has 1 failures in Build #52801 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30396

That installs a local version of the PR into your bun-30396 executable, so you can run:

bun-30396 --bun

robobun commented May 8, 2026

Collaborator Author

Status

Reproduced (CI, build 52790): every aarch64 shard that ran jsc-stress.test.ts — darwin-26, darwin-14, debian-13, ubuntu-25.04, alpine-3.23, windows-11 — hit

error: \B|x{1,2}? on "a\u{10ffff}b": match[0].length=4294967295 exceeds input.length=4

which is the exact (unsigned)-1 underflow from the report. x86_64 lanes green (optimization is CPU(ARM64)-only per PlatformEnable.h).

Root cause: YarrJIT.cpp BodyAlternativeEnd backtrack trampoline, lastAlt.minSize > firstAlt.minSize branch — matchStart is advanced by firstCharacterAdditionalReadSize but m_regs.index is not (the sibling else branch does both). \B then succeeds at the stale index with output[0] > output[1].

Fix: oven-sh/WebKit#221.

This branch: regression fixture added; currently soft-passes on ARM64 by probing for the exact underflow once (so unrelated jsc-stress regressions aren't masked while we wait). Once WebKit#221 merges and autobuild-<sha> is published, I'll bump WEBKIT_VERSION in scripts/build/deps/webkit.ts, drop the probe, and mark ready for review.
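The fixture itself is not shown in this thread, so the following is only a sketch of how such a one-time probe could look. The names `UNDERFLOW`, `probeKnownUnderflow` and `knownBugPresent` are assumptions; `process.arch` is the standard Node/Bun property.

```javascript
// Hypothetical probe, not the PR's fixture: detects the specific
// underflow symptom so a known failure can be reported separately.
const UNDERFLOW = 0xffffffff; // 4294967295, i.e. (unsigned)-1

function probeKnownUnderflow() {
  const m = /\B|x{1,2}?/u.exec("a\u{10ffff}b");
  return m !== null && m[0].length === UNDERFLOW;
}

// Only treat the failure as expected on ARM64, where the optimization is on.
const knownBugPresent = process.arch === "arm64" && probeKnownUnderflow();
console.log(knownBugPresent ? "soft-pass: known underflow" : "run full assertions");
```

Matching on the exact value rather than on any failure keeps the soft-pass narrow, so a different wrong result on ARM64 would still fail the test.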


CI (build 52801): the yarr fixture is absent from the annotations (soft-pass working on all aarch64 shards). Remaining red is test/js/bun/test/parallel/test-http-should-emit-close-when-connection-is-aborted.ts - timeout on windows-2019-x64{,-baseline} — unrelated fleet-wide Windows flake, also hitting builds 52792 / 52799 / 52800 (other branches) in the same window. Not touched by this PR.

… bump

The fix lives in JavaScriptCore (oven-sh/WebKit#221), vendored separately;
until WEBKIT_VERSION is bumped to include it the underflow is still present
on ARM64. Probe for the exact symptom once and soft-pass so CI on this
draft stays green and the expected failure does not mask unrelated
jsc-stress regressions. Every aarch64 shard on build 52790 reproduced
match[0].length === 4294967295, confirming the analysis.
@robobun robobun force-pushed the farm/571dd7af/yarr-non-bmp-trampoline-index-sync branch from 5046144 to 708a6f1 Compare May 8, 2026 11:18