/ mark.js Public

abartov commented Jan 11, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 11, 2026 21:38
Copilot AI left a comment

Pull request overview

This pull request adds support for Hebrew and Arabic diacritics to Mark.js, enabling the library to match text with diacritical marks when searching for base characters. The implementation adds regex patterns to handle Hebrew nikud (U+0591-U+05C7) and Arabic harakat (U+064B-U+065F, U+0670, U+06D6-U+06ED) in the createDiacriticsRegExp method.
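As a rough illustration of the ranges listed above, the sketch below builds a single character class from them and checks a few code points against it. The constant and function names are hypothetical, and the actual patterns added to createDiacriticsRegExp in the PR may be written differently.

```javascript
// Character class assembled from the ranges quoted in the overview.
// All names here are illustrative and are not taken from the PR.
const HEBREW_MARKS = '\\u0591-\\u05C7';
const ARABIC_MARKS = '\\u064B-\\u065F\\u0670\\u06D6-\\u06ED';
const MARK_CLASS = new RegExp(`^[${HEBREW_MARKS}${ARABIC_MARKS}]$`);

// Returns true when the single character falls inside one of the ranges.
function isListedMark(ch) {
  return MARK_CLASS.test(ch);
}

console.log(isListedMark('\u05B8')); // true:  Hebrew qamats, inside U+0591-U+05C7
console.log(isListedMark('\u064E')); // true:  Arabic fatha, inside U+064B-U+065F
console.log(isListedMark('\u0670')); // true:  Arabic superscript alef
console.log(isListedMark('\u05E9')); // false: Hebrew letter shin is a base letter
```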

Changes:

  • Added Hebrew and Arabic diacritics support with Unicode character ranges in the regex creator
  • Created comprehensive test suites for both Hebrew and Arabic diacritics matching
  • Updated copyright year from 2018 to 2025 across all distribution files
  • Fixed a bug in the wrapMatches method to prevent incorrect separateGroups calls

Reviewed changes

Copilot reviewed 5 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/lib/regexpcreator.js | Added Hebrew and Arabic diacritics regex patterns to createDiacriticsRegExp method |
| test/specs/basic/diacritics-hebrew.js | Test suite for Hebrew diacritics matching functionality |
| test/specs/basic/diacritics-arabic.js | Test suite for Arabic diacritics matching functionality |
| test/fixtures/basic/diacritics-hebrew.html | HTML fixture with Hebrew text containing diacritics |
| test/fixtures/basic/diacritics-arabic.html | HTML fixture with Arabic text containing diacritics |
| dist/*.js | Updated distribution files with new functionality and copyright year |

Owner

julkue commented Jan 11, 2026

Hi @abartov
Thanks for your PR!
Would you mind elaborating on your changes a little bit?
For example, what is it solving exactly and why can't the existing array be extended?

Author

abartov commented Jan 11, 2026

Oh, perhaps the existing array can be extended. I wasn't sure. The goal is to allow diacriticized Hebrew or Arabic text (i.e. text with additional combining Unicode characters denoting vowels) to be matched and highlighted by Mark.js. Without this change, searching for a string like שלום (the Hebrew word 'shalom') won't highlight שָׁלוֹם (the same Hebrew word 'shalom' with full diacritics).
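The behaviour described here can be reproduced without Mark.js. The sketch below is not the PR's code: `buildPattern` is a hypothetical helper that assumes the Hebrew range from the PR overview (U+0591-U+05C7) and lets any number of marks follow each letter of the search term.

```javascript
// Escapes are used so the exact code points are visible.
const plain = '\u05E9\u05DC\u05D5\u05DD';                     // shalom, letters only
const pointed = '\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD'; // shalom with nikud

// A literal search fails because the marks sit between the letters.
console.log(pointed.includes(plain)); // false

// Hypothetical helper: allow zero or more Hebrew marks after each character.
function buildPattern(term) {
  const marks = '[\\u0591-\\u05C7]*';
  return new RegExp(
    Array.from(term)
      // Escape regex metacharacters so the term is treated literally.
      .map(ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + marks)
      .join('')
  );
}

const match = buildPattern(plain).exec(pointed);
console.log(match !== null && match[0] === pointed); // true
```

The same pattern still matches the unpointed spelling, since the mark class is allowed to match zero times.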

Owner

julkue commented Jan 12, 2026

Could you please check if the array can be extended, instead of adding additional RegExp replacements? Thank you.

Author

abartov commented Jan 17, 2026

I've pondered this a bit, but I'm not seeing how it can be done by extending the array: the array lists variant characters for letter-and-accent combinations, but Hebrew and Arabic diacritics are combining Unicode characters, not separate characters for every letter-and-diacritic combination. Every one of the letters of the alphabet can be combined with any one of the diacritic characters, and in some cases, can even be combined with two or three such characters (hence the * in the pattern my patch creates).
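To make the distinction concrete, here is a simplified sketch of the two approaches. The variant string and the helper name are illustrative assumptions, not Mark.js's actual array or the PR's code. A variant list works when each accented letter is a single precomposed character; a combining mark adds extra code points after the base letter, so the pattern needs a quantified class instead.

```javascript
// Variant-list approach: each accented form is one character,
// so one character class per letter covers every variant.
const variants = 'a\u00E0\u00E1\u00E2\u00E4'; // a, à, á, â, ä (illustrative subset)
function viaVariantList(term) {
  return new RegExp(term.replace(/a/g, `[${variants}]`));
}
console.log(viaVariantList('mal').test('m\u00E1l')); // true (precomposed á)

// Combining approach: one Hebrew letter plus two marks is three code points.
const shinWithMarks = '\u05E9\u05B8\u05C1'; // shin + qamats + shin dot
console.log(shinWithMarks.length); // 3

// A character class matches exactly one character, so it cannot absorb
// the trailing marks; a quantified mark class after the base letter can.
const oneChar = /^[\u05E9]$/;
const withMarks = /^\u05E9[\u0591-\u05C7]*$/;
console.log(oneChar.test(shinWithMarks));   // false
console.log(withMarks.test(shinWithMarks)); // true
```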

Author

abartov commented Feb 10, 2026

I should add I have been using Mark.js with my patch in production at https://benyehuda.org, and it works very well.
