Plan 2: Improve current object-based search
HED's basic search capabilities are integral to analyses such as epoching. Earlier work addressed some of the implementation issues affecting search efficiency; this plan covers additional potential improvements.
Fix remaining performance and correctness issues in the QueryHandler system from issue #1268. PRs 1281-1282 already fixed 5 of 10 items; this plan covers the remaining 5 (items #3, #6, #7, #8, #10).
Phase A: Make SearchResult hashable (fixes #7, #8)
The AND merge in ExpressionAnd.merge_and_groups() has O(n³) complexity (self-documented as "trash and slow") and the OR dedup in ExpressionOr.handle_expr() is O(n²). Both stem from linear scans to detect duplicate SearchResult objects.
- Add __hash__ and __eq__ to SearchResult in hed/models/query_util.py based on (id(group), frozenset(id(c) for c in children)). This preserves the existing identity-based semantics (the current has_same_children method uses is checks).
- Replace the O(n³) dedup loop in ExpressionAnd.merge_and_groups() (hed/models/query_expressions.py lines 121-142) with set-based lookup.
- Replace the O(n²) dedup loop in ExpressionOr.handle_expr() (hed/models/query_expressions.py lines 228-238) with set-based lookup.
Can start immediately — no dependencies.
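
The Phase A change can be sketched roughly as follows. This is a minimal illustration, not the hedtools implementation: the SearchResult class here is a hypothetical stand-in that assumes only a group attribute and a children list, and dedup() is an illustrative helper name that does not exist in the code base.

```python
class SearchResult:
    """Hypothetical stand-in for the class in hed/models/query_util.py."""

    def __init__(self, group, children):
        self.group = group
        self.children = list(children)

    def _key(self):
        # Identity-based key: same group object and same set of child objects,
        # regardless of the order in which the children are listed.
        return (id(self.group), frozenset(id(c) for c in self.children))

    def __eq__(self, other):
        if not isinstance(other, SearchResult):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


def dedup(results):
    """Drop duplicate results in a single pass, keeping first-seen order."""
    seen = set()
    unique = []
    for result in results:
        if result not in seen:  # average O(1) lookup instead of a linear scan
            seen.add(result)
            unique.append(result)
    return unique
```

Two caveats apply to this sketch. The key is only stable while the children list is not mutated after the result has been placed in a set, and id() values are only meaningful while the referenced objects are alive, which holds here because each result keeps references to its group and children.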
Phase B: Resolve tag form inconsistencies (#10)
Parallel with Phase A.
Each find_* method in hed/models/hed_group.py uses a different tag property for comparison:
| Method | Tag property used | What it matches |
|---|---|---|
| find_tags() (L435) | short_base_tag.casefold() | Base tag without extension/value |
| find_wildcard_tags() (L464) | short_tag.casefold() prefix | Short form including value, prefix match |
| find_exact_tags() (L501) | HedTag.__eq__ | short_tag then org_tag.casefold() fallback |
| find_tags_with_term() (L566) | tag_terms tuple | All ancestor terms from schema |
- Audit whether each method's choice of tag form is correct for its intended semantics. In particular, find_exact_tags() is called from query_expressions.py with plain strings (e.g., "def/mydef"), and HedTag.__eq__(str) compares self.casefold() == other.casefold(), where self.casefold() is str(self).casefold(), i.e. short_tag.casefold(). If a tag was written in long form in the source HED string, str(tag) still returns short_tag (when a schema entry exists), so this should work, but it needs verification with tests.
- Add targeted tests for cross-form matching scenarios:
  - Long-form written tags matched by short-form queries
  - Tags with extensions/values
  - Tags without a recognized schema entry (where str(tag) falls back to org_tag)
- Document the rationale for each find_* method's choice of tag form
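
To make the differences in the table concrete, the sketch below mimics the four comparison styles on a hand-built stand-in. Everything in it is hypothetical: FakeTag is not HedTag, its field values are typed in by hand rather than resolved from a schema, and the matches_* helpers are illustrative names. The exact-match helper also leaves out the org_tag fallback for tags without a schema entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTag:
    """Hypothetical stand-in for HedTag; values are supplied by hand."""

    org_tag: str         # text as written in the source HED string
    short_tag: str       # short form, including any value/extension
    short_base_tag: str  # short form without value/extension
    tag_terms: tuple     # casefolded terms, as a schema lookup might supply


def matches_base(tag, query):
    # Mirrors the find_tags() row: compare the base tag only.
    return tag.short_base_tag.casefold() == query.casefold()


def matches_prefix(tag, query):
    # Mirrors the find_wildcard_tags() row: prefix match on the short form.
    return tag.short_tag.casefold().startswith(query.casefold())


def matches_exact(tag, query):
    # Mirrors the find_exact_tags() row, simplified: short form only.
    return tag.short_tag.casefold() == query.casefold()


def matches_term(tag, query):
    # Mirrors the find_tags_with_term() row: membership in the term tuple.
    return query.casefold() in tag.tag_terms


# Illustrative values only.
long_form = FakeTag("Event/Sensory-event", "Sensory-event",
                    "Sensory-event", ("event", "sensory-event"))
def_tag = FakeTag("Def/MyDef", "Def/MyDef", "Def", ("def",))
```

Under these assumptions, a short-form query such as "sensory-event" matches the long-form tag exactly, "event" matches it only through the term lookup, and "def" matches def_tag by base tag but not exactly. These are the kinds of cross-form cases the targeted tests should pin down against the real classes.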
Phase C: Code quality (#6)
Parallel with Phase A.
- Replace the string-based wildcard check in the negation restriction (hed/models/query_handler.py line 141):

      if "?" in str(interior):
          raise HedQueryError("Cannot negate wildcards...")

  with an expression-tree has_wildcard property on Expression subclasses. Currently, ~(Event && Action) is accepted, but ~([A, ?]) is rejected even though the wildcard is nested inside a group. The check converts the entire subtree to a string and looks for ?, which is overly broad.
- Remove stale TODO comments for already-fixed issues
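
One way the has_wildcard property could look is sketched below. The class names are hypothetical stand-ins rather than the actual Expression hierarchy, HedQueryError is redefined locally so the sketch runs on its own, and the rule that a group scopes its own wildcards is an assumption about the intended policy, not something the current code confirms.

```python
class HedQueryError(Exception):
    """Local stand-in so the sketch runs without hedtools installed."""


class Expression:
    """Hypothetical base node of the parsed query tree."""

    def __init__(self, *children):
        self.children = list(children)

    @property
    def has_wildcard(self):
        # Default: a node has a wildcard if any direct or indirect child does.
        return any(child.has_wildcard for child in self.children)


class TagTerm(Expression):
    def __init__(self, name):
        super().__init__()
        self.name = name


class Wildcard(Expression):
    @property
    def has_wildcard(self):
        return True


class And(Expression):
    pass


class Group(Expression):
    @property
    def has_wildcard(self):
        # Assumed policy: a wildcard nested in a group is scoped to that
        # group and does not count against an enclosing negation.
        return False


def check_negation(interior):
    """Reject a negation whose interior exposes a wildcard."""
    if interior.has_wildcard:
        raise HedQueryError("negated expression contains a wildcard")
```

With this shape, each subclass decides whether wildcards propagate upward, so the negation check no longer depends on how an expression happens to be rendered as a string.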
Phase D: Documentation alignment
- The HED search guide references QueryParser and TagExpressionParser; update these to QueryHandler
- Document the @ (not-in-line) operator in user-facing documentation (currently only documented in code comments and tests)
- Add examples showing exact-match group with optional portion {A: B} and {A, B:} syntax
Files to modify
| File | What to change |
|---|---|
| hed/models/query_util.py | Add __hash__/__eq__ to SearchResult |
| hed/models/query_expressions.py | Replace dedup loops in merge_and_groups() and ExpressionOr.handle_expr() |
| hed/models/hed_group.py | Audit/document find_* methods (lines 435-600) |
| hed/models/hed_tag.py | Reference: __eq__ (L647), tag_terms (L55/342), short_tag (L87) |
| hed/models/query_handler.py | Refactor negation wildcard check (L141) |
| tests/models/test_query_handler.py | Add cross-form and edge-case tests |
| tests/models/test_query_util.py | Add SearchResult hash/equality tests |
Verification
- Run full test suite: python -m unittest discover tests -v
- Run spec tests: python -m unittest discover spec_tests -v
- Benchmark AND/OR dedup on HED strings with 50+ tags to verify performance improvement
- Test cross-form matching (long-form source strings with short-form queries and vice versa)
- Lint: ruff check hed/ tests/ && ruff format --check hed/ tests/ && typos
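
For the benchmark step, a synthetic timing harness along these lines may be a useful starting point before measuring real queries. It does not touch hedtools at all: plain objects stand in for search results, and dedup_linear, dedup_set, and make_items are illustrative names, so the numbers only indicate the scaling difference between a linear identity scan and a set lookup.

```python
import timeit


def dedup_linear(items):
    """Identity-based dedup using a linear scan per item (quadratic overall)."""
    unique = []
    for item in items:
        if not any(item is kept for kept in unique):
            unique.append(item)
    return unique


def dedup_set(items):
    """Identity-based dedup using a set of ids (linear overall)."""
    seen = set()
    unique = []
    for item in items:
        key = id(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def make_items(n_unique, repeats):
    """Build a list in which each of n_unique objects appears `repeats` times."""
    base = [object() for _ in range(n_unique)]
    return base * repeats


if __name__ == "__main__":
    items = make_items(200, 5)
    for fn in (dedup_linear, dedup_set):
        seconds = timeit.timeit(lambda: fn(items), number=50)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

A real benchmark would instead time searches over HED strings with 50+ tags before and after the change, as the verification step describes.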
Decisions
- SearchResult hash uses id() (object identity), not value equality. This preserves existing match semantics where the same physical tag cannot satisfy both sides of an AND.
- ExpressionDescendantGroup exact forwarding: confirmed NOT a bug per PR 1282 investigation. The [...] descendant group intentionally searches all ancestor levels.