Skip to content

AI-2607: Add semantic layer MCP tools#416

Merged
mariankrotil merged 74 commits intomainfrom
AI-2607-semantic-layer-tooling-tools
Apr 13, 2026
Merged

AI-2607: Add semantic layer MCP tools#416
mariankrotil merged 74 commits intomainfrom
AI-2607-semantic-layer-tooling-tools

Conversation

@Matovidlo
Copy link
Copy Markdown
Contributor

Description

Linear: AI-2607

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

Adds 4 semantic layer MCP tools built on top of the MetastoreClient foundation (see #415).

New tools:

  • semantic_discover — ranked search across semantic entities (models, metrics, dimensions, constraints)
  • semantic_get_definition — get a semantic object definition by UUID or name
  • semantic_query_plan — structured query planner that resolves metrics, dimensions, joins, and constraints
  • semantic_define — create, patch, replace, delete, or publish semantic objects

Supporting changes:

  • MetastoreClient wired into KeboolaClient via metastore_client property
  • tools/semantic/ package with model.py (Pydantic I/O models) and tools.py (tool implementations)
  • Tools registered in server.py
  • Unit tests in tests/tools/test_semantic.py
  • TOOLS.md updated

⚠️ Draft — depends on #415 (MetastoreClient foundation) being merged first.

Testing

  • Tested with Cursor AI desktop (Streamable-HTTP transports)

Optional testing

  • Tested with Cursor AI desktop (all transports)
  • Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
  • Tested with In Platform Agent on canary-orion
  • Tested with RO chat on canary-orion

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable)
  • Documentation updated (if applicable)

@linear
Copy link
Copy Markdown

linear bot commented Mar 10, 2026

Comment thread src/keboola_mcp_server/tools/semantic/model.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/model.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new “semantic layer” toolset (discover/get definition/query plan/define) backed by the Metastore API client, wires the Metastore client into KeboolaClient, and registers/docs/tests the new tools.

Changes:

  • Added 4 new semantic MCP tools (semantic_discover, semantic_get_definition, semantic_query_plan, semantic_define) with Pydantic I/O models.
  • Integrated MetastoreClient into KeboolaClient and registered semantic tools in the server and docs generator.
  • Added/updated unit tests and TOOLS.md documentation for the new tools and Metastore client behaviors.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/keboola_mcp_server/tools/semantic/tools.py New semantic tool implementations and server registration helper.
src/keboola_mcp_server/tools/semantic/model.py New Pydantic models/enums for semantic tool inputs/outputs.
src/keboola_mcp_server/tools/semantic/__init__.py Exports semantic tool registration and tag constant.
src/keboola_mcp_server/clients/client.py Adds metastore_client property and derives Metastore base URL from hostname suffix.
src/keboola_mcp_server/clients/metastore.py Minor import cleanup.
src/keboola_mcp_server/server.py Registers semantic tools in server startup.
src/keboola_mcp_server/generate_tool_docs.py Adds Semantic Tools category to generated docs.
TOOLS.md Documents new semantic tools and their JSON schemas.
tests/tools/test_semantic.py New unit tests for semantic tools.
tests/test_server.py Ensures new tools are listed and tagged/annotated correctly.
tests/conftest.py Adds metastore_client mock to the test KeboolaClient fixture.
tests/clients/test_metastore.py Extends Metastore client tests (org-scope params + revisions).
tests/clients/test_client.py Adds test for Metastore URL derivation and headers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/model.py Outdated
@Matovidlo Matovidlo force-pushed the AI-2607-semantic-layer-tooling branch 2 times, most recently from c9de053 to 1ef3892 Compare March 13, 2026 09:55
Matovidlo added a commit that referenced this pull request Mar 13, 2026
Addresses open PR #416 review comments and two business requirements:

1. System prompt: new Semantic Layer section instructs AI to call
   semantic_discover before SQL and to patch semantic objects via
   semantic_define when query results contradict the semantic definition.

2. Two-tier candidate ranking in semantic_discover: uses LLM sampling
   (ctx.sample) when the client supports it for semantic relevance ranking,
   with difflib fuzzy scoring as fallback for partial/plural matches.

3. Reviewer fixes:
   - limit=0 now correctly returns empty matches (was max(limit,1))
   - semantic_define delete branch no longer fetches schema unnecessarily
   - semantic_get_definition raises ValueError when both uuid and name given
   - _score_text *texts variadic fixed to single text: str parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matovidlo added a commit that referenced this pull request Mar 16, 2026
Addresses open PR #416 review comments and two business requirements:

1. System prompt: new Semantic Layer section instructs AI to call
   semantic_discover before SQL and to patch semantic objects via
   semantic_define when query results contradict the semantic definition.

2. Two-tier candidate ranking in semantic_discover: uses LLM sampling
   (ctx.sample) when the client supports it for semantic relevance ranking,
   with difflib fuzzy scoring as fallback for partial/plural matches.

3. Reviewer fixes:
   - limit=0 now correctly returns empty matches (was max(limit,1))
   - semantic_define delete branch no longer fetches schema unnecessarily
   - semantic_get_definition raises ValueError when both uuid and name given
   - _score_text *texts variadic fixed to single text: str parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Matovidlo Matovidlo force-pushed the AI-2607-semantic-layer-tooling-tools branch from 0fd6021 to dd83332 Compare March 16, 2026 07:05
vita-stejskal and others added 7 commits March 16, 2026 09:57
- Remove unused AliasChoices import from metastore.py
- Fix line-too-long (E501) violations in test_metastore.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document INTEGTEST_METASTORE_URL / INTEGTEST_METASTORE_TOKEN in integtests/README.md
- Make JsonApiResource.type required (no silent empty-string default)
- Make JsonApiListEnvelope.data required (no silent empty-list default)
- Add docstring to MetastoreClient.create

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses open PR #416 review comments and two business requirements:

1. System prompt: new Semantic Layer section instructs AI to call
   semantic_discover before SQL and to patch semantic objects via
   semantic_define when query results contradict the semantic definition.

2. Two-tier candidate ranking in semantic_discover: uses LLM sampling
   (ctx.sample) when the client supports it for semantic relevance ranking,
   with difflib fuzzy scoring as fallback for partial/plural matches.

3. Reviewer fixes:
   - limit=0 now correctly returns empty matches (was max(limit,1))
   - semantic_define delete branch no longer fetches schema unnecessarily
   - semantic_get_definition raises ValueError when both uuid and name given
   - _score_text *texts variadic fixed to single text: str parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/replace

Addresses three Copilot review comments where the `name` argument was
resolved for API calls but not reflected in the payload passed to
schema validation, allowing schema-required name fields to pass
validation only when provided in data rather than the name argument.

- create: inject inferred_name into payload when not already present
- patch: inject name into merged_payload when a rename is requested
- replace: inject replace_name into payload when not already present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Matovidlo Matovidlo force-pushed the AI-2607-semantic-layer-tooling-tools branch from dd83332 to 9490c8c Compare March 16, 2026 08:57
Matovidlo and others added 9 commits March 17, 2026 11:08
Semantic layer tools are now hidden from the tool list and blocked on call
unless the project has the `mcp-semantic-tooling` feature enabled in its
Storage API token response, matching the pattern used for other feature-gated
tools. Moves SEMANTIC_TOOLS_TAG to tools/constants.py to avoid a circular import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ling, and test gating

- Remove duplicate metastore_api_url assignment in client.py (copy-paste dead code)
- Remove stray "issues." artifact from validate_semantic_query docstring; merge
  into the preceding sentence and sync TOOLS.md
- Early-return empty lists in _compare_expected_and_detected_objects when no
  expected objects are provided, preventing all detected objects from being
  reported as unexpected when the caller passes no expectations
- Catch re.error in search_semantic_context and raise ValueError with the
  offending pattern identified, replacing a cryptic internal exception
- Cap max_concurrency at 10 in all process_concurrently calls (tools.py ×4,
  service.py ×3) that previously passed len(collection), preventing unbounded
  concurrent Metastore API bursts from user-controlled inputs
- Remove INTEGTEST_METASTORE_URL/TOKEN from README (never read; URL is derived
  from INTEGTEST_POOL_STORAGE_API_URL, token reuses storage token)
- Move 4 semantic tools to exclude set in _assert_basic_setup so integration
  tests pass on projects without the mcp-semantic-tooling feature flag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/keboola_mcp_server/tools/semantic/service.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 22 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread integtests/tools/semantic/test_tools.py
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread TOOLS.md Outdated
…PStatusError skip

- Guard all six from_metastore class methods and _get_semantic_model_id
  against MetastoreObject.attributes=None by using `or {}` fallback so
  objects without an attributes field do not raise AttributeError
- Add contexts_per_model optional kwarg to validate_semantic_query and
  validate_semantic_used_objects; when expected_semantic_objects is
  provided, the tool pre-loads contexts once and passes them to both
  calls, eliminating the double _load_validation_contexts round-trip
- Fix grammar: "touches a semantic objects" -> "touches semantic objects"
- Catch httpx.HTTPStatusError in _require_metastore_available and skip
  so 4xx/5xx responses (proxy 404, 503) don't fail the whole test module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/keboola_mcp_server/tools/semantic/service.py
Comment thread src/keboola_mcp_server/tools/semantic/tools.py
Comment thread src/keboola_mcp_server/tools/semantic/tools.py
Matovidlo and others added 2 commits April 1, 2026 20:00
…cate type accumulation

- Guard semantic_object.data.attributes in _find_matches with `or {}`
  so objects without an attributes field don't crash on regex scan or
  jsonpath traversal (completes the round-3 attributes=None fix)
- Deduplicate cleaned_model_ids in validate_semantic_query using
  dict.fromkeys to preserve order, preventing doubled violations and
  context entries when the same model ID is passed twice
- Replace dict comprehension in _compare_expected_and_detected_objects
  with setdefault accumulation so duplicate object_type entries merge
  their IDs instead of the last entry silently overwriting earlier ones

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…antic_service_data methods

All six SemanticXxxCompact.from_semantic_service_data methods and
SemanticObject.from_semantic_service_data read obj.data.attributes
without a None guard. Since from_metastore stores the original
MetastoreObject verbatim in .data, any object whose Metastore response
omits the attributes field causes AttributeError on .get() calls and
ValidationError in SemanticObject (dict[str,Any] field rejects None).
Change every occurrence to obj.data.attributes or {} to complete the
attributes=None guard across all tools.py access sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/service.py
Comment thread src/keboola_mcp_server/tools/semantic/service.py Outdated
Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated
…s, type validation

- Replace all obj.meta.name / self.data.meta.name accesses with
  getattr(..., 'name', None) since MetastoreObject.meta is typed
  MetaObjectMeta | None; affects display_name, five from_metastore
  name fallbacks, and two sorted(...) generators in constraint eval
- Strip constraint metric/dataset names in evaluate_constraints_from_context
  (was filtering with metric.strip() but storing unstripped, causing
  false-positive composition violations and false-negative exclusion hits
  vs. the stripped used_metric_names/used_dataset_ids sets)
- Remove ownership filter from ids branch of load_semantic_context_for_
  semantic_type: when caller supplies exact UUIDs, silently dropping
  objects whose modelUUID is absent causes data loss with no error
- Add type validation in get_object_by_id: raise ValueError when
  raw_obj.type != object_type.value so type mismatches surface at the
  API boundary instead of producing structurally empty service objects
- Guard ownership check in _load_expected_object_groups with
  `object_model_id is not None` so objects without modelUUID are not
  falsely rejected; remove the now-dead semantic_type guard (type is
  validated upstream in get_object_by_id)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/keboola_mcp_server/tools/semantic/service.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/keboola_mcp_server/tools/semantic/service.py
Comment thread src/keboola_mcp_server/tools/semantic/tools.py
@mariankrotil
Copy link
Copy Markdown
Contributor

🤝 Taking it over

@Matovidlo
Copy link
Copy Markdown
Contributor Author

@mariankrotil the testts are failing, could you please fix it and rebase ? 🙂 Thanks

@mariankrotil mariankrotil merged commit 6d19677 into main Apr 13, 2026
21 checks passed
@mariankrotil mariankrotil deleted the AI-2607-semantic-layer-tooling-tools branch April 13, 2026 14:55
@Matovidlo Matovidlo mentioned this pull request Apr 15, 2026
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants