yarr: keep index in sync with matchStart in non-BMP backtrack trampoline (ARM64) #30396
robobun wants to merge 2 commits into
On ARM64, /\B|x{1,2}?/u.exec("a\u{10ffff}b")[0].length returned 4294967295
((unsigned)-1) because the Yarr JIT non-BMP first-character optimization
advanced matchStart by firstCharacterAdditionalReadSize without advancing
m_regs.index in the last-alt > first-alt backtrack branch, so a zero-width
first alternative could return with output[1] < output[0].
Fix in oven-sh/WebKit#221. This adds the regression test; WEBKIT_VERSION
bump follows once that lands.
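The reported value is what 32-bit unsigned arithmetic yields when a match's end precedes its start. A minimal sketch, assuming a result with start = 3 and end = 2 (the pair the trace in this description ends on); `unsignedLength` is a hypothetical helper used only for illustration, not a JavaScriptCore function:

```javascript
// Hypothetical helper: models (unsigned)(end - start) on 32-bit values.
function unsignedLength(start, end) {
  return (end - start) >>> 0; // ">>> 0" reinterprets the difference as a uint32
}

// UTF-16 code units of the input string used in the description.
const input = "a\u{10ffff}b";
const codeUnits = Array.from({ length: input.length }, (_, i) => input.charCodeAt(i));

// Code units: 0x61, 0xDBFF, 0xDFFF, 0x62
console.log(codeUnits.map((u) => "0x" + u.toString(16).toUpperCase()).join(", "));

console.log(unsignedLength(3, 2)); // 4294967295: end is one less than start
console.log(unsignedLength(3, 3)); // 0: a well-formed zero-width match
```

The sketch only illustrates the arithmetic; it does not exercise the JIT path itself.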
Updated 6:10 AM PT - May 8th, 2026
❌ @robobun, your commit 708a6f1 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 30396
That installs a local version of the PR into your bun-30396 --bun
Status
Reproduced (CI, build 52790): every aarch64 shard that ran which is the exact
Root cause:
Fix: oven-sh/WebKit#221.
This branch: regression fixture added; currently soft-passes on ARM64 by probing for the exact underflow once (so unrelated jsc-stress regressions aren't masked while we wait).
Once WebKit#221 merges and
CI (build 52801): the yarr fixture is absent from the annotations (soft-pass working on all aarch64 shards).
Remaining red is
… bump
The fix lives in JavaScriptCore (oven-sh/WebKit#221), vendored separately; until WEBKIT_VERSION is bumped to include it the underflow is still present on ARM64. Probe for the exact symptom once and soft-pass so CI on this draft stays green and the expected failure does not mask unrelated jsc-stress regressions. Every aarch64 shard on build 52790 reproduced match[0].length === 4294967295, confirming the analysis.
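The "probe once and soft-pass" approach described in this commit message could look roughly like the following. This is a sketch under stated assumptions: `isUnderflowedMatch` and `engineHasUnderflow` are hypothetical names, and the fixture's actual code may be structured differently. The pattern, input, and the 4294967295 symptom are taken from the description.

```javascript
// The exact symptom reported on ARM64: (unsigned)-1 as the whole-match length.
const UNDERFLOWED_LENGTH = 4294967295;

// Hypothetical predicate: true only for the specific underflow symptom,
// so any other malformed result still counts as a real failure.
function isUnderflowedMatch(match) {
  return match !== null && match[0].length === UNDERFLOWED_LENGTH;
}

// Hypothetical one-time probe: run the known reproducer once and cache the
// answer, so later checks can soft-pass without re-running it.
let cachedProbe;
function engineHasUnderflow() {
  if (cachedProbe === undefined) {
    cachedProbe = isUnderflowedMatch(/\B|x{1,2}?/u.exec("a\u{10ffff}b"));
  }
  return cachedProbe;
}

if (engineHasUnderflow()) {
  console.log("known Yarr underflow present; soft-passing");
}
```

Matching on the exact length rather than on "any failure" is what keeps unrelated regressions visible while the known bug is still present.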
What
On ARM64 this prints 4294967295 ((unsigned)-1). On x86_64 it prints 0.
Why
The Yarr JIT's non-BMP first-character optimization (ENABLE_YARR_JIT_UNICODE_CAN_INCREMENT_INDEX_FOR_NON_BMP, ARM64-only; see Source/WTF/wtf/PlatformEnable.h) records firstCharacterAdditionalReadSize = 1 whenever tryReadUnicodeChar decodes a surrogate pair, so that the body-alternative retry loop can step past the whole code point instead of landing in the middle of it.
In the BodyAlternativeEnd backtrack trampoline, the branch taken when lastAlternative.minimumSize > firstAlternative.minimumSize (YarrJIT.cpp ~4414) sets matchStart = index + firstCharacterAdditionalReadSize but jumps to beginOp->m_reentry without applying the same adjustment to m_regs.index. The sibling else branch (and the input-check-failure path) do add it.
On reentry, index != matchStart + firstAlternative.minimumSize. If the first alternative is a zero-width assertion such as \B, it can succeed at the stale index and the epilogue writes output[0] = matchStart, output[1] = index with output[1] < output[0]. createRegExpMatchesArray then builds the whole-match substring with length (unsigned)(end - start) = 4294967295.
Trace for "a\u{10ffff}b" (code units [0x61, 0xDBFF, 0xDFFF, 0x62]), where fCARS abbreviates firstCharacterAdditionalReadSize:
1. \B at 0 fails; alt 2 reentry index=1; fixed x reads 'a', fails → trampoline matchStart=1, index=1.
2. \B at 1 fails; alt 2 reentry index=2; fixed x reads input[1]=0xDBFF, decodes the pair, fCARS=1, U+10FFFF ≠ x → trampoline.
3. matchStart = index(2) + fCARS(1) = 3, index left at 2, jump to alt 1 reentry.
4. \B at 2: prev U+10FFFF (non-word), curr lone trail 0xDFFF (non-word) → \B succeeds. Return (start=3, end=2).
Fix
In oven-sh/WebKit#221: mirror the sibling branch, i.e. add32(firstCharacterAdditionalReadSize, index) before the (possible) sub32, and gate the jump on checkInput() since index may now be length + 1 when delta == 1. The change is entirely inside #if ENABLE(YARR_JIT_UNICODE_CAN_INCREMENT_INDEX_FOR_NON_BMP) so x86_64 codegen is unchanged.
This PR
- test/js/bun/jsc-stress/fixtures/yarr-non-bmp-backtrack-trampoline-index-sync.js exercising the pattern shapes that hit this trampoline branch, asserting the whole-match substring is well-formed (length ≤ input length, index + length ≤ input length).
- WEBKIT_VERSION bump to follow once WebKit#221 ("[YARR] Advance index by firstCharacterAdditionalReadSize in last-alt > first-alt backtrack trampoline") merges and an autobuild is available.
Note
ARM64 test lanes will be red on this PR until the WEBKIT_VERSION bump; x86_64 lanes are green because the optimization isn't enabled there. Leaving as draft until the bump is pushed.
Verification (x86_64, optimization disabled → sanity only)