Add punctuation tagger, custom rules, and comprehensive edge case tests #1

Merged
Alex-Wengg merged 3 commits into main from feat/punctuation-custom-rules-edge-cases
Feb 13, 2026

@Alex-Wengg

Summary

  • Punctuation tagger: 30+ spoken→symbol mappings that fire only on an exact match, to avoid false positives (e.g., "at sign" → "@", while a bare "at" is left unchanged)
  • Custom rules engine: Runtime-configurable spoken→written mappings with highest priority (score 110), thread-safe via RwLock
  • FFI exports: nemo_normalize_sentence, nemo_add_rule, nemo_remove_rule, nemo_clear_rules, nemo_rule_count + C header updates
  • Swift wrapper: Updated with sentence-mode normalization, custom rules API, and rule management
  • Edge case tests: 143 total tests covering ASR-realistic dictation, large numbers, negative numbers, multi-type sentences, punctuation in dictation, ordinal false positives, money/time edge cases, and special values
  • Swift test harness: NLTagger edge case verification (10 tests) for context-aware punctuation spotting
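
To illustrate the exact-match-only behavior of the punctuation tagger, here is a minimal sketch. It is not the code from this PR: the function names `punctuation_table` and `tag_punctuation`, the five sample entries, and the trimming and lowercasing of input are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical sample table; the tagger in this PR is described as having 30+ entries.
fn punctuation_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("at sign", "@"),
        ("question mark", "?"),
        ("exclamation point", "!"),
        ("comma", ","),
        ("period", "."),
    ])
}

/// Returns a symbol only when the whole input equals a spoken form.
/// Trimming and lowercasing are assumptions of this sketch.
fn tag_punctuation(input: &str) -> Option<&'static str> {
    let key = input.trim().to_lowercase();
    let table = punctuation_table();
    let symbol = table.get(key.as_str()).copied();
    symbol
}
```

Because the lookup compares the full phrase, `tag_punctuation("at sign")` yields `Some("@")`, while `tag_punctuation("at")` and `tag_punctuation("look at sign")` both yield `None`.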

Test plan

  • cargo test — all 143 tests pass (83 unit + 57 integration + 3 doc)
  • Swift NLTagger tests — 10/10 pass
  • Swift integration tests — 16/16 pass via swift-test harness
  • No regressions in existing tests

🤖 Generated with Claude Code

Adds normalize_sentence() and normalize_sentence_with_max_span() that
scan for normalizable spans within larger sentences using a longest-match-first
sliding window. Includes FFI exports and integration tests.
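
The longest-match-first sliding window described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `normalize_span` is a hypothetical stand-in for the library's span normalizer, its four entries are made up, and `normalize_sentence_sketch` is a hypothetical name chosen to avoid implying it matches the real `normalize_sentence_with_max_span()` signature.

```rust
/// Hypothetical stand-in for the span normalizer; entries are illustrative only.
fn normalize_span(span: &str) -> Option<String> {
    match span {
        "twenty one" => Some("21".to_string()),
        "twenty" => Some("20".to_string()),
        "five dollars" => Some("$5".to_string()),
        "question mark" => Some("?".to_string()),
        _ => None,
    }
}

/// At each token position, try the longest window first (up to `max_span`
/// tokens) and shrink until a span normalizes; otherwise keep the token as-is.
fn normalize_sentence_sketch(input: &str, max_span: usize) -> String {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        // Never look past the end of the sentence; treat max_span 0 as 1.
        let longest = max_span.max(1).min(tokens.len() - i);
        let mut matched = false;
        for len in (1..=longest).rev() {
            let candidate = tokens[i..i + len].join(" ");
            if let Some(written) = normalize_span(&candidate) {
                out.push(written);
                i += len; // skip the tokens consumed by the match
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(tokens[i].to_string());
            i += 1;
        }
    }
    out.join(" ")
}
```

With a window of 4, "i have twenty one apples" becomes "i have 21 apples"; with a window of 1 the two-token span is never tried, so the same input becomes "i have 20 one apples". That difference is why the longer window is attempted first.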
- Punctuation tagger: 30+ spoken→symbol mappings (exact match only to avoid false positives)
- Custom rules: runtime-configurable spoken→written mappings with highest priority (RwLock thread-safe)
- FFI exports: nemo_normalize_sentence, custom rules API, wired into C header
- Swift wrapper: updated with sentence-mode, custom rules, and rule management API
- Swift test harness: NLTagger edge case verification (10 tests)
- Edge case tests: ASR-realistic dictation, large numbers, negative numbers, multi-type sentences,
  punctuation in dictation, ordinal false positives, money/time edge cases, special values (143 total tests)
- GitHub Actions CI: cargo fmt --check + cargo test (with and without ffi feature)
- Apply default rustfmt formatting across all source files
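
As a rough picture of how runtime-configurable rules can be kept thread-safe with an `RwLock`, consider the sketch below. The `CustomRules` type and its method names are hypothetical (they loosely mirror the FFI export names listed in this PR); only the priority score of 110 and the use of `RwLock` come from the description above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Priority score for custom rules, as stated in the PR summary.
const CUSTOM_RULE_SCORE: u32 = 110;

/// Hypothetical rule store: spoken form -> written form, guarded by an RwLock
/// so many readers can look up rules while writers take exclusive access.
struct CustomRules {
    rules: RwLock<HashMap<String, String>>,
}

impl CustomRules {
    fn new() -> Self {
        Self { rules: RwLock::new(HashMap::new()) }
    }

    fn add_rule(&self, spoken: &str, written: &str) {
        let mut rules = self.rules.write().unwrap();
        rules.insert(spoken.to_string(), written.to_string());
    }

    fn remove_rule(&self, spoken: &str) -> bool {
        let mut rules = self.rules.write().unwrap();
        let removed = rules.remove(spoken).is_some();
        removed
    }

    fn clear_rules(&self) {
        let mut rules = self.rules.write().unwrap();
        rules.clear();
    }

    fn rule_count(&self) -> usize {
        let rules = self.rules.read().unwrap();
        let count = rules.len();
        count
    }

    /// Returns the written form together with the custom-rule score.
    fn lookup(&self, spoken: &str) -> Option<(String, u32)> {
        let rules = self.rules.read().unwrap();
        let hit = rules.get(spoken).map(|w| (w.clone(), CUSTOM_RULE_SCORE));
        hit
    }
}
```

A caller could register `add_rule("fluid audio", "FluidAudio")` and then get `Some(("FluidAudio", 110))` back from `lookup("fluid audio")`. Returning the score alongside the match lets a ranking step prefer custom rules over built-in taggers with lower scores.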
@Alex-Wengg force-pushed the feat/punctuation-custom-rules-edge-cases branch from cc26706 to 66fb1da on February 13, 2026 18:27
@Alex-Wengg merged commit 444c5d6 into main on Feb 13, 2026
2 checks passed