Add Hebrew and Arabic diacritics support to Mark.js #495
base: master
Co-authored-by: abartov <[email protected]>
…diacritics Add Hebrew and Arabic diacritics support to Mark.js
Pull request overview
This pull request adds support for Hebrew and Arabic diacritics to Mark.js, enabling the library to match text with diacritical marks when searching for base characters. The implementation adds regex patterns to handle Hebrew nikud (U+0591-U+05C7) and Arabic harakat (U+064B-U+065F, U+0670, U+06D6-U+06ED) in the createDiacriticsRegExp method.
Changes:
- Added Hebrew and Arabic diacritics support with Unicode character ranges in the regex creator
- Created comprehensive test suites for both Hebrew and Arabic diacritics matching
- Updated copyright year from 2018 to 2025 across all distribution files
- Fixed a bug in the `wrapMatches` method to prevent incorrect `separateGroups` calls
Reviewed changes
Copilot reviewed 5 out of 13 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/lib/regexpcreator.js | Added Hebrew and Arabic diacritics regex patterns to createDiacriticsRegExp method |
| test/specs/basic/diacritics-hebrew.js | Test suite for Hebrew diacritics matching functionality |
| test/specs/basic/diacritics-arabic.js | Test suite for Arabic diacritics matching functionality |
| test/fixtures/basic/diacritics-hebrew.html | HTML fixture with Hebrew text containing diacritics |
| test/fixtures/basic/diacritics-arabic.html | HTML fixture with Arabic text containing diacritics |
| dist/*.js | Updated distribution files with new functionality and copyright year |
Hi @abartov
Oh, perhaps the existing array can be extended. I wasn't sure. The goal is to allow diacriticized Hebrew or Arabic text (i.e. text with additional combining Unicode characters denoting vowels) to be matched and highlighted by Mark.js. Without this change, searching for a string like שלום (the Hebrew word 'shalom') won't highlight שָׁלוֹם (the same Hebrew word 'shalom' with full diacritics).
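To make the mismatch concrete, here is a minimal sketch in plain JavaScript. It is not code from this pull request; the variable names are hypothetical, and the pointed spelling is written out with explicit escapes (one possible encoding of the word) so the code points are unambiguous.

```javascript
// Illustrative only: shows why a plain substring search misses pointed text.
// "shalom" without diacritics: shin, lamed, vav, final mem.
const plain = "\u05E9\u05DC\u05D5\u05DD";
// "shalom" with diacritics: shin + shin dot + qamats, lamed, vav + holam, final mem.
const pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD";

// Each diacritic is its own combining code point sitting between the letters.
const codePoints = Array.from(
  pointed,
  ch => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// The letters of the plain word are no longer adjacent, so this is false.
const containsPlain = pointed.includes(plain);

// Removing the combining marks (range taken from the overview above)
// recovers the plain spelling.
const stripped = pointed.replace(/[\u0591-\u05C7]/g, "");

console.log(codePoints.join(" "));
console.log(containsPlain, stripped === plain);
```

Stripping marks works for comparison but not for highlighting, because the highlighter has to match against the original, pointed text in the DOM.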
Could you please check if the array can be extended, instead of adding additional RegExp replacements? Thank you.
I've pondered this a bit, but I'm not seeing how it can be done by extending the array: the array lists variant characters for letter-and-accent combinations, but Hebrew and Arabic diacritics are combining Unicode characters, not separate precomposed characters for every letter-and-diacritic combination. Every letter of the alphabet can be combined with any one of the diacritic characters, and in some cases with two or three of them at once (hence the * in the pattern my patch creates).
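The idea described here can be sketched roughly as follows. This is a simplified, standalone illustration, not the patch's actual code: the helper names `escapeRegExp` and `buildDiacriticTolerantPattern` are hypothetical, and the sketch appends the optional-marks class after every character rather than only after Hebrew or Arabic letters. The ranges are the ones quoted in the pull request overview.

```javascript
// Combining-mark ranges as listed in the pull request overview.
const HEBREW_MARKS = "\\u0591-\\u05C7";
const ARABIC_MARKS = "\\u064B-\\u065F\\u0670\\u06D6-\\u06ED";
// Zero or more marks: a letter may carry none, one, or several diacritics.
const MARKS_CLASS = `[${HEBREW_MARKS}${ARABIC_MARKS}]*`;

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: follow every character of the search term with the
// optional-marks class, so "ABC" becomes "A[marks]*B[marks]*C[marks]*".
function buildDiacriticTolerantPattern(term) {
  return Array.from(term)
    .map(ch => escapeRegExp(ch) + MARKS_CLASS)
    .join("");
}

// Hebrew: unpointed "shalom" against a fully pointed spelling.
const hebrewTerm = "\u05E9\u05DC\u05D5\u05DD";
const hebrewText = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD";
const hebrewMatches = hebrewText.match(
  new RegExp(buildDiacriticTolerantPattern(hebrewTerm), "g")
);

// Arabic: "ktb" without harakat against the same letters, each with a fatha.
const arabicTerm = "\u0643\u062A\u0628";
const arabicText = "\u0643\u064E\u062A\u064E\u0628\u064E";
const arabicMatches = arabicText.match(
  new RegExp(buildDiacriticTolerantPattern(arabicTerm), "g")
);

console.log(hebrewMatches, arabicMatches);
```

Because the quantifier is `*` rather than `?`, a letter followed by two or three marks (such as shin with both a shin dot and a vowel point) is still consumed as part of a single match, and unpointed text continues to match as before.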
I should add that I have been using Mark.js with my patch in production at https://benyehuda.org, and it works very well.