[codex] Add configurable literal bracket patterns by cmungall · Pull Request #44 · linkml/linkml-reference-validator

cmungall · 2026-04-03T21:59:10Z

Summary

- Add `literal_bracket_patterns` to `ReferenceValidationConfig`.
- Preserve bracketed source text in `_split_query()` when configured patterns match bracket contents.
- Add validator/config/model coverage and document the new configuration option.

Why

SupportingTextValidator._split_query() stripped every [...] segment unconditionally. That works for editorial insertions, but it breaks literal scientific bracket notation like [2Fe-2S], [30S], [Ca2+], and similar corpus-specific patterns.

This change keeps the existing default behavior for backward compatibility and lets projects opt into preserving literal brackets with explicit regex patterns.
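
A minimal sketch of the behavior described above, not the actual `_split_query()` implementation. The names `strip_editorial_brackets` and `BRACKETED` are illustrative, and applying patterns with `re.search` to the text inside each bracket pair is an assumption:

```python
import re

# Matches one non-nested [...] segment and captures its contents.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def strip_editorial_brackets(text, literal_patterns=()):
    """Drop [...] segments unless their contents match a configured pattern."""
    compiled = [re.compile(p) for p in literal_patterns]

    def keep_or_drop(match):
        content = match.group(1)
        # Assumption: a pattern "matches" if it is found anywhere in the contents.
        if any(p.search(content) for p in compiled):
            return match.group(0)  # keep the brackets and their contents
        return " "  # treat as an editorial insertion and strip it

    cleaned = BRACKETED.sub(keep_or_drop, text)
    # Collapse the whitespace left behind by stripped segments.
    return " ".join(cleaned.split())


# Default: every bracketed segment is stripped.
print(strip_editorial_brackets("protein [important] functions"))
# Opt-in: contents containing a digit are preserved, "[sic]" is still stripped.
print(strip_editorial_brackets("the [2Fe-2S] cluster [sic] binds", [r"\d"]))
```

With no patterns the sketch reproduces the strip-all default; with `[r"\d"]` only the bracket whose contents contain a digit survives.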

Impact

Projects that do nothing keep the current strip-all behavior. Projects with literal bracketed source text can now configure preservation rules in YAML or Python config.
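
Before adding patterns to a project config, it can help to check which bracket contents they would preserve. The sketch below uses only the standard `re` module. The three patterns and the helper `preserved` are hypothetical examples, and the use of `re.search` is an assumption about how the validator applies patterns:

```python
import re

# Hypothetical candidate patterns for the notations mentioned above.
CANDIDATE_PATTERNS = [
    r"^\d+Fe-\d+S$",            # iron-sulfur clusters such as 2Fe-2S
    r"^\d+S$",                  # sedimentation coefficients such as 30S
    r"^[A-Z][a-z]?\d*[+-]$",    # simple ions such as Ca2+
]


def preserved(content, patterns=CANDIDATE_PATTERNS):
    """Return True if any pattern is found in the bracket contents."""
    return any(re.search(p, content) for p in patterns)


for content in ["2Fe-2S", "30S", "Ca2+", "important", "sic"]:
    print(f"[{content}] preserved: {preserved(content)}")
```

Anchoring each pattern with `^` and `$` keeps it from accidentally matching editorial insertions that merely contain a digit or a capital letter.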

Validation

uv run pytest tests/test_supporting_text_validator.py tests/test_validation_config.py tests/test_models.py
just test

Copilot

Pull request overview

Adds an opt-in configuration to preserve bracketed literal source text (e.g., scientific notation like [2Fe-2S]) during supporting-text validation, while keeping the existing default behavior of stripping all [...] segments for backward compatibility.

Changes:

- Introduces `literal_bracket_patterns` on `ReferenceValidationConfig` and wires it into `SupportingTextValidator._split_query()`.
- Updates `_split_query()` to selectively preserve bracketed content when configured patterns match, and normalizes whitespace in returned parts.
- Adds/updates tests plus README + conceptual docs to cover and document the new configuration.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/linkml_reference_validator/models.py` | Adds `literal_bracket_patterns` config field with description + example. |
| `src/linkml_reference_validator/validation/supporting_text_validator.py` | Compiles configured patterns and conditionally preserves matching bracketed content in `_split_query()`. |
| `tests/test_models.py` | Verifies config default/custom values for the new field. |
| `tests/test_validation_config.py` | Verifies YAML config loading for `literal_bracket_patterns`. |
| `tests/test_supporting_text_validator.py` | Adds behavioral tests for default stripping vs configured preservation in split + matching. |
| `README.md` | Documents the new YAML option and intended usage. |
| `docs/concepts/editorial-conventions.md` | Adds conceptual documentation for literal brackets and configuration. |

Copilot · 2026-04-04T00:47:36Z

src/linkml_reference_validator/validation/supporting_text_validator.py

            ['protein functions', 'in cells']
            >>> validator._split_query("protein [important] functions")
            ['protein functions']
+            >>> config = ReferenceValidationConfig(literal_bracket_patterns=[r"\\d"])


In the _split_query() docstring example, literal_bracket_patterns=[r"\\d"] compiles a pattern that matches a literal \d sequence, not digits. This example likely intends r"\d" so the shown output is reproducible.

Suggested change

-            >>> config = ReferenceValidationConfig(literal_bracket_patterns=[r"\\d"])
+            >>> config = ReferenceValidationConfig(literal_bracket_patterns=[r"\d"])
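
The difference between the two raw strings can be checked with the standard `re` module alone. This is a small illustration that takes each pattern at face value; the variable names are illustrative:

```python
import re

content = "2Fe-2S"  # text found inside the brackets of "[2Fe-2S]"

digit = re.compile(r"\d")                  # regex \d: any digit
literal_backslash_d = re.compile(r"\\d")   # regex \\d: a backslash followed by "d"

print(bool(digit.search(content)))                   # True
print(bool(literal_backslash_d.search(content)))     # False
print(bool(literal_backslash_d.search(r"see \d")))   # True
```

Note that doctest sees a docstring after Python has processed its escapes, so in a non-raw docstring the source text `r"\\d"` reaches doctest as `r"\d"`. The diff excerpt does not show whether the enclosing docstring is raw, so which spelling is correct in the file depends on that.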

Copilot · 2026-04-04T00:47:37Z

README.md

+```
+
+Patterns are matched against the content inside `[...]`. If any pattern matches,
+that bracketed text is kept verbatim during validation.


The README says preserved bracketed text is kept "verbatim during validation", but the validator normalizes punctuation/whitespace before substring matching (and _split_query() also collapses whitespace). Consider rewording to “preserved (not stripped) before normalization/validation” to avoid implying exact byte-for-byte matching.

Suggested change

-that bracketed text is kept verbatim during validation.
+that bracketed text is preserved (not stripped) before normalization and validation.
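
To illustrate why "verbatim" overstates the guarantee, here is a sketch of a normalize-then-substring check. The `normalize` function is hypothetical (punctuation replaced by spaces, whitespace collapsed) and only approximates the normalization the comment refers to; the validator's real rules may differ:

```python
import re


def normalize(text):
    """Hypothetical normalization: punctuation to spaces, whitespace collapsed."""
    without_punct = re.sub(r"[^\w\s]", " ", text)
    return " ".join(without_punct.split())


# The bracket was preserved by a configured pattern, but the brackets and the
# hyphen still disappear during normalization.
query = "the [2Fe-2S] cluster"
reference = "Binding of the (2Fe-2S)   cluster was observed."

print(normalize(query))
print(normalize(query) in normalize(reference))
```

Under these assumptions the query matches a reference that writes the cluster with parentheses and extra spaces, so the preserved text affects what is compared but is not matched byte for byte.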

Add literal bracket patterns for supporting text validation

e0f2163

cmungall marked this pull request as ready for review April 4, 2026 00:44

Copilot AI review requested due to automatic review settings April 4, 2026 00:44

Copilot started reviewing on behalf of cmungall April 4, 2026 00:45

Copilot AI reviewed Apr 4, 2026


cmungall deployed to release April 4, 2026 05:21 — with GitHub Actions Active

cmungall wants to merge 1 commit into main from feat/issue-43
