AI-2607: Add semantic layer MCP tools by Matovidlo · Pull Request #416 · keboola/mcp-server

Matovidlo · 2026-03-10T11:54:06Z

Description

Linear: AI-2607

Change Type

Major (breaking changes, significant new features)
Minor (new features, enhancements, backward compatible)
Patch (bug fixes, small improvements, no new features)

Summary

Adds 4 semantic layer MCP tools built on top of the MetastoreClient foundation (see #415).

New tools:

semantic_discover — ranked search across semantic entities (models, metrics, dimensions, constraints)
semantic_get_definition — get a semantic object definition by UUID or name
semantic_query_plan — structured query planner that resolves metrics, dimensions, joins, and constraints
semantic_define — create, patch, replace, delete, or publish semantic objects

Supporting changes:

MetastoreClient wired into KeboolaClient via metastore_client property
tools/semantic/ package with model.py (Pydantic I/O models) and tools.py (tool implementations)
Tools registered in server.py
Unit tests in tests/tools/test_semantic.py
TOOLS.md updated

⚠️ Draft — depends on #415 (MetastoreClient foundation) being merged first.

Testing

Tested with Cursor AI desktop (Streamable-HTTP transports)

Optional testing

Tested with Cursor AI desktop (all transports)
Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
Tested with In Platform Agent on canary-orion
Tested with RO chat on canary-orion

Checklist

Self-review completed
Unit tests added/updated (if applicable)
Integration tests added/updated (if applicable)
Project version bumped according to the change type (if applicable)
Documentation updated (if applicable)

linear · 2026-03-10T11:54:10Z

AI-2607 Semantic Layer Tooling

Copilot

Pull request overview

This PR introduces a new “semantic layer” toolset (discover, get definition, query plan, define) backed by the Metastore API client, wires the Metastore client into KeboolaClient, and registers, documents, and tests the new tools.

Changes:

Added 4 new semantic MCP tools (semantic_discover, semantic_get_definition, semantic_query_plan, semantic_define) with Pydantic I/O models.
Integrated MetastoreClient into KeboolaClient and registered semantic tools in the server and docs generator.
Added/updated unit tests and TOOLS.md documentation for the new tools and Metastore client behaviors.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/keboola_mcp_server/tools/semantic/tools.py`	New semantic tool implementations and server registration helper.
`src/keboola_mcp_server/tools/semantic/model.py`	New Pydantic models/enums for semantic tool inputs/outputs.
`src/keboola_mcp_server/tools/semantic/__init__.py`	Exports semantic tool registration and tag constant.
`src/keboola_mcp_server/clients/client.py`	Adds `metastore_client` property and derives Metastore base URL from hostname suffix.
`src/keboola_mcp_server/clients/metastore.py`	Minor import cleanup.
`src/keboola_mcp_server/server.py`	Registers semantic tools in server startup.
`src/keboola_mcp_server/generate_tool_docs.py`	Adds Semantic Tools category to generated docs.
`TOOLS.md`	Documents new semantic tools and their JSON schemas.
`tests/tools/test_semantic.py`	New unit tests for semantic tools.
`tests/test_server.py`	Ensures new tools are listed and tagged/annotated correctly.
`tests/conftest.py`	Adds `metastore_client` mock to the test KeboolaClient fixture.
`tests/clients/test_metastore.py`	Extends Metastore client tests (org-scope params + revisions).
`tests/clients/test_client.py`	Adds test for Metastore URL derivation and headers.

Addresses open PR #416 review comments and two business requirements:

1. System prompt: new Semantic Layer section instructs AI to call semantic_discover before SQL and to patch semantic objects via semantic_define when query results contradict the semantic definition.
2. Two-tier candidate ranking in semantic_discover: uses LLM sampling (ctx.sample) when the client supports it for semantic relevance ranking, with difflib fuzzy scoring as fallback for partial/plural matches.
3. Reviewer fixes:
   - limit=0 now correctly returns empty matches (was max(limit,1))
   - semantic_define delete branch no longer fetches schema unnecessarily
   - semantic_get_definition raises ValueError when both uuid and name given
   - _score_text *texts variadic fixed to single text: str parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
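For readers following the ranking change, here is a minimal sketch of the difflib fallback tier together with the limit=0 fix. It is not the code from this PR: `score_text` and `rank_candidates` are hypothetical names, and the LLM sampling tier (ctx.sample) is left out entirely.

```python
from difflib import SequenceMatcher


def score_text(query: str, text: str) -> float:
    """Return a 0..1 similarity between the query and one candidate text."""
    return SequenceMatcher(None, query.lower(), text.lower()).ratio()


def rank_candidates(query: str, names: list[str], limit: int) -> list[str]:
    """Rank candidate names by fuzzy similarity, best match first."""
    if limit <= 0:
        # limit=0 means "no matches wanted"; coercing it with max(limit, 1)
        # would silently return one result instead.
        return []
    ranked = sorted(names, key=lambda name: score_text(query, name), reverse=True)
    return ranked[:limit]
```

Because the score is a character-level similarity ratio, a singular query such as "order" still ranks a plural candidate such as "orders" ahead of unrelated names, which is the partial/plural case the commit message mentions.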

… column metadata is unavailable

- Remove unused AliasChoices import from metastore.py
- Fix line-too-long (E501) violations in test_metastore.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Document INTEGTEST_METASTORE_URL / INTEGTEST_METASTORE_TOKEN in integtests/README.md
- Make JsonApiResource.type required (no silent empty-string default)
- Make JsonApiListEnvelope.data required (no silent empty-list default)
- Add docstring to MetastoreClient.create

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…/replace

Addresses three Copilot review comments where the `name` argument was resolved for API calls but not reflected in the payload passed to schema validation, allowing schema-required name fields to pass validation only when provided in data rather than the name argument.

- create: inject inferred_name into payload when not already present
- patch: inject name into merged_payload when a rename is requested
- replace: inject replace_name into payload when not already present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
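The create/replace behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `with_name` and `missing_required` are hypothetical helpers, and the required-field check merely stands in for real schema validation.

```python
from typing import Any, Optional


def with_name(payload: dict[str, Any], name: Optional[str]) -> dict[str, Any]:
    """Return a copy of the payload carrying the resolved name, if one was given."""
    merged = dict(payload)
    if name and "name" not in merged:
        # Only fill the gap; a name already present in the data wins.
        merged["name"] = name
    return merged


def missing_required(payload: dict[str, Any], required: list[str]) -> list[str]:
    """Stand-in for schema validation: list the required fields that are absent."""
    return [field for field in required if field not in payload]
```

Validating `with_name(data, name)` instead of `data` means a name passed only as an argument satisfies a schema that requires it. A rename via patch would presumably overwrite the existing value rather than only fill a gap, which this sketch does not model.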

Semantic layer tools are now hidden from the tool list and blocked on call unless the project has the `mcp-semantic-tooling` feature enabled in its Storage API token response, matching the pattern used for other feature-gated tools. Moves SEMANTIC_TOOLS_TAG to tools/constants.py to avoid a circular import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
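A rough sketch of the two halves of that gating (hide on list, block on call) is shown below. Only the feature name comes from the commit message; the `Tool` class, the tag value, and both function names are assumptions made for illustration and do not reflect the server's middleware.

```python
from dataclasses import dataclass, field

SEMANTIC_FEATURE = "mcp-semantic-tooling"  # feature name from the commit message
SEMANTIC_TAG = "semantic"  # hypothetical tag value


@dataclass
class Tool:
    """Hypothetical stand-in for a registered MCP tool."""
    name: str
    tags: set[str] = field(default_factory=set)


def visible_tools(tools: list[Tool], project_features: set[str]) -> list[Tool]:
    """Hide semantic tools from the listing when the feature is not enabled."""
    if SEMANTIC_FEATURE in project_features:
        return list(tools)
    return [tool for tool in tools if SEMANTIC_TAG not in tool.tags]


def ensure_callable(tool: Tool, project_features: set[str]) -> None:
    """Block a direct call to a semantic tool when the feature is not enabled."""
    if SEMANTIC_TAG in tool.tags and SEMANTIC_FEATURE not in project_features:
        raise PermissionError(f"{tool.name} requires the {SEMANTIC_FEATURE} feature")
```

Checking on call as well as on list matters because a client can still invoke a tool by name even when it was never advertised.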

…ling, and test gating

- Remove duplicate metastore_api_url assignment in client.py (copy-paste dead code)
- Remove stray "issues." artifact from validate_semantic_query docstring; merge into the preceding sentence and sync TOOLS.md
- Early-return empty lists in _compare_expected_and_detected_objects when no expected objects are provided, preventing all detected objects from being reported as unexpected when the caller passes no expectations
- Catch re.error in search_semantic_context and raise ValueError with the offending pattern identified, replacing a cryptic internal exception
- Cap max_concurrency at 10 in all process_concurrently calls (tools.py ×4, service.py ×3) that previously passed len(collection), preventing unbounded concurrent Metastore API bursts from user-controlled inputs
- Remove INTEGTEST_METASTORE_URL/TOKEN from README (never read; URL is derived from INTEGTEST_POOL_STORAGE_API_URL, token reuses storage token)
- Move 4 semantic tools to exclude set in _assert_basic_setup so integration tests pass on projects without the mcp-semantic-tooling feature flag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
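To illustrate the concurrency cap, the following is a minimal sketch of bounding fan-out with a semaphore. It is not the repository's process_concurrently helper, whose signature is not shown in this thread; `process_bounded` and its parameters are hypothetical, and only the limit of 10 is taken from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_CONCURRENCY = 10  # cap described in the commit message


async def process_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = MAX_CONCURRENCY,
) -> list[R]:
    """Run worker over items with at most MAX_CONCURRENCY calls in flight."""
    # Clamp the requested value so a caller cannot raise the ceiling,
    # for example by passing len(collection) for a large collection.
    semaphore = asyncio.Semaphore(max(1, min(max_concurrency, MAX_CONCURRENCY)))

    async def run(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(run(item) for item in items)))
```

With this shape the number of simultaneous API calls no longer scales with the size of a user-supplied list, while results still come back in input order.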

Copilot

Pull request overview

Copilot reviewed 20 out of 22 changed files in this pull request and generated 3 comments.

…PStatusError skip

- Guard all six from_metastore class methods and _get_semantic_model_id against MetastoreObject.attributes=None by using `or {}` fallback so objects without an attributes field do not raise AttributeError
- Add contexts_per_model optional kwarg to validate_semantic_query and validate_semantic_used_objects; when expected_semantic_objects is provided, the tool pre-loads contexts once and passes them to both calls, eliminating the double _load_validation_contexts round-trip
- Fix grammar: "touches a semantic objects" -> "touches semantic objects"
- Catch httpx.HTTPStatusError in _require_metastore_available and skip so 4xx/5xx responses (proxy 404, 503) don't fail the whole test module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
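The `or {}` guard from the first bullet can be sketched as below. `MetastoreObjectSketch` is a hypothetical stand-in that models only an optional attributes field, and the "description" key is an assumed example, not a documented Metastore attribute.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MetastoreObjectSketch:
    """Hypothetical stand-in for MetastoreObject; only the fields used here."""
    id: str
    attributes: Optional[dict[str, Any]] = None


def description_of(obj: MetastoreObjectSketch) -> Optional[str]:
    """Read an attribute without assuming the attributes dict is present."""
    # Without the fallback, obj.attributes.get(...) raises AttributeError
    # whenever the response omitted the attributes field.
    attributes = obj.attributes or {}
    return attributes.get("description")
```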

…cate type accumulation

- Guard semantic_object.data.attributes in _find_matches with `or {}` so objects without an attributes field don't crash on regex scan or jsonpath traversal (completes the round-3 attributes=None fix)
- Deduplicate cleaned_model_ids in validate_semantic_query using dict.fromkeys to preserve order, preventing doubled violations and context entries when the same model ID is passed twice
- Replace dict comprehension in _compare_expected_and_detected_objects with setdefault accumulation so duplicate object_type entries merge their IDs instead of the last entry silently overwriting earlier ones

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
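The last two bullets rely on two standard Python idioms, sketched here in isolation. The function names and the shape of the input (pairs of object type and ID list) are assumptions for illustration, not the PR's data model.

```python
def dedupe_preserving_order(model_ids: list[str]) -> list[str]:
    """Drop blanks and repeated IDs while keeping first-seen order."""
    cleaned = [model_id.strip() for model_id in model_ids if model_id.strip()]
    # dict keys are unique and keep insertion order, unlike a plain set.
    return list(dict.fromkeys(cleaned))


def group_ids_by_type(entries: list[tuple[str, list[str]]]) -> dict[str, list[str]]:
    """Merge ID lists per object type instead of letting later entries win."""
    grouped: dict[str, list[str]] = {}
    for object_type, ids in entries:
        grouped.setdefault(object_type, []).extend(ids)
    return grouped
```

A dict comprehension such as `{t: ids for t, ids in entries}` keeps only the last list for a repeated type, which is the silent overwrite the commit message describes.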

…antic_service_data methods

All six SemanticXxxCompact.from_semantic_service_data methods and SemanticObject.from_semantic_service_data read obj.data.attributes without a None guard. Since from_metastore stores the original MetastoreObject verbatim in .data, any object whose Metastore response omits the attributes field causes AttributeError on .get() calls and ValidationError in SemanticObject (dict[str,Any] field rejects None). Change every occurrence to obj.data.attributes or {} to complete the attributes=None guard across all tools.py access sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…s, type validation

- Replace all obj.meta.name / self.data.meta.name accesses with getattr(..., 'name', None) since MetastoreObject.meta is typed MetaObjectMeta | None; affects display_name, five from_metastore name fallbacks, and two sorted(...) generators in constraint eval
- Strip constraint metric/dataset names in evaluate_constraints_from_context (was filtering with metric.strip() but storing unstripped, causing false-positive composition violations and false-negative exclusion hits vs. the stripped used_metric_names/used_dataset_ids sets)
- Remove ownership filter from ids branch of load_semantic_context_for_semantic_type: when caller supplies exact UUIDs, silently dropping objects whose modelUUID is absent causes data loss with no error
- Add type validation in get_object_by_id: raise ValueError when raw_obj.type != object_type.value so type mismatches surface at the API boundary instead of producing structurally empty service objects
- Guard ownership check in _load_expected_object_groups with `object_model_id is not None` so objects without modelUUID are not falsely rejected; remove the now-dead semantic_type guard (type is validated upstream in get_object_by_id)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
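The first two bullets can be illustrated with a short sketch. `Meta`, `Obj`, `safe_name`, and `excluded_hits` are hypothetical names introduced here; the real models and the constraint evaluation in this PR are more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Meta:
    """Hypothetical stand-in for the object's meta block."""
    name: Optional[str] = None


@dataclass
class Obj:
    """Hypothetical stand-in for an object whose meta may be missing."""
    meta: Optional[Meta] = None


def safe_name(obj: Obj) -> Optional[str]:
    """Read meta.name without failing when meta itself is None."""
    return getattr(obj.meta, "name", None)


def normalized_names(raw_names: list[str]) -> set[str]:
    """Filter blanks and store the stripped value, not the raw one."""
    return {name.strip() for name in raw_names if name.strip()}


def excluded_hits(constraint_metrics: list[str], used_metric_names: set[str]) -> set[str]:
    """Return the used metrics that a constraint names, comparing like with like."""
    return normalized_names(constraint_metrics) & used_metric_names
```

Filtering on `name.strip()` while storing the raw `name` leaves " revenue " in the set, which never equals the stripped "revenue" on the other side of the comparison. That mismatch is how a violation could be missed or wrongly reported.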

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mariankrotil · 2026-04-10T13:06:35Z

🤝 Taking it over

…ecognition

Matovidlo · 2026-04-13T12:14:46Z

@mariankrotil the tests are failing, could you please fix them and rebase? 🙂 Thanks

Matovidlo commented Mar 10, 2026

Comment thread src/keboola_mcp_server/tools/semantic/model.py Outdated

Comment thread src/keboola_mcp_server/tools/semantic/model.py

Matovidlo requested a review from Copilot March 10, 2026 11:58

Copilot started reviewing on behalf of Matovidlo March 10, 2026 11:59

Copilot AI reviewed Mar 10, 2026

davidesner reviewed Mar 11, 2026

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated

davidesner reviewed Mar 11, 2026

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated

davidesner reviewed Mar 11, 2026

Comment thread src/keboola_mcp_server/tools/semantic/model.py Outdated

Matovidlo force-pushed the AI-2607-semantic-layer-tooling branch 2 times, most recently from c9de053 to 1ef3892 Compare March 13, 2026 09:55

Matovidlo force-pushed the AI-2607-semantic-layer-tooling-tools branch from 0fd6021 to dd83332 Compare March 16, 2026 07:05

vita-stejskal and others added 7 commits March 16, 2026 09:57

AI-2706: fall back to count(*) query to verify table existence when…

07c9680

… column metadata is unavailable

AI-2607: fix flake8 violations in metastore client and tests

67db534

- Remove unused AliasChoices import from metastore.py
- Fix line-too-long (E501) violations in test_metastore.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AI-2607 feat: add semantic tools and update tests

d8f1bb5

AI-2607: add semantic tools to integration test expected_tools

9490c8c

Matovidlo force-pushed the AI-2607-semantic-layer-tooling-tools branch from dd83332 to 9490c8c Compare March 16, 2026 08:57

Matovidlo and others added 9 commits March 17, 2026 11:08

Merge AI-2607-client into AI-2607-semantic-tools

349f68c

AI-2607 test: add semantic schema integration tests

3c54e74

AI-2607 refactor: export semantic tool module

27b03e3

AI-2607 refactor: reorganize semantic tool models

518e9b0

AI-2607 refactor: remove legacy semantic tools module

28888b8

AI-2607 chore: update semantic tooling dependencies

1698a5c

AI-2607 feat: add semantic schema definitions

177e610

AI-2607 feat: add semantic mcp tool handlers

355528a

Matovidlo requested a review from Copilot April 1, 2026 12:44

Copilot started reviewing on behalf of Matovidlo April 1, 2026 12:45

claude bot reviewed Apr 1, 2026

Comment thread src/keboola_mcp_server/tools/semantic/service.py

Copilot AI reviewed Apr 1, 2026

Comment thread integtests/tools/semantic/test_tools.py

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated

Comment thread TOOLS.md Outdated

claude bot reviewed Apr 1, 2026

Comment thread src/keboola_mcp_server/tools/semantic/service.py

Comment thread src/keboola_mcp_server/tools/semantic/tools.py

Comment thread src/keboola_mcp_server/tools/semantic/tools.py

Matovidlo and others added 2 commits April 1, 2026 20:00

claude bot reviewed Apr 1, 2026

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated

claude bot reviewed Apr 1, 2026

Comment thread src/keboola_mcp_server/tools/semantic/service.py

Comment thread src/keboola_mcp_server/tools/semantic/service.py Outdated

Comment thread src/keboola_mcp_server/tools/semantic/tools.py Outdated

claude bot reviewed Apr 2, 2026

Comment thread src/keboola_mcp_server/tools/semantic/service.py

Matovidlo requested review from cjayyy and jordanrburger April 2, 2026 09:42

AI-2607: add LoggingMiddleware for debug-level request logging

52736ff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude bot reviewed Apr 6, 2026

Comment thread src/keboola_mcp_server/tools/semantic/service.py

Comment thread src/keboola_mcp_server/tools/semantic/tools.py

Merge main into AI-2607

1cfefaf

vita-stejskal approved these changes Apr 13, 2026

AI-2607 refactor: simplify and improve the expected semantic object r…

654e597

…ecognition

mariankrotil added 5 commits April 13, 2026 14:20

Merge main into AI-2607

71bc5ea

AI-2607: make Pydantic error URL assertions version-aware

349bf3f

AI-2607 style: fix linter

feb039e

AI-2607 fix: expect pydantic version in error messages

9cb127e

Merge main into AI-2607

6b51669

mariankrotil merged commit 6d19677 into main Apr 13, 2026
21 checks passed

mariankrotil deleted the AI-2607-semantic-layer-tooling-tools branch April 13, 2026 14:55

Matovidlo mentioned this pull request Apr 15, 2026

chore: release v1.55.0 #475

Merged

14 tasks

6 participants