AI-2607: Add semantic layer MCP tools#416
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new “semantic layer” toolset (discover/get definition/query plan/define) backed by the Metastore API client, wires the Metastore client into KeboolaClient, and registers/docs/tests the new tools.
Changes:
- Added 4 new semantic MCP tools (
semantic_discover,semantic_get_definition,semantic_query_plan,semantic_define) with Pydantic I/O models. - Integrated
MetastoreClientintoKeboolaClientand registered semantic tools in the server and docs generator. - Added/updated unit tests and
TOOLS.mddocumentation for the new tools and Metastore client behaviors.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/keboola_mcp_server/tools/semantic/tools.py |
New semantic tool implementations and server registration helper. |
src/keboola_mcp_server/tools/semantic/model.py |
New Pydantic models/enums for semantic tool inputs/outputs. |
src/keboola_mcp_server/tools/semantic/__init__.py |
Exports semantic tool registration and tag constant. |
src/keboola_mcp_server/clients/client.py |
Adds metastore_client property and derives Metastore base URL from hostname suffix. |
src/keboola_mcp_server/clients/metastore.py |
Minor import cleanup. |
src/keboola_mcp_server/server.py |
Registers semantic tools in server startup. |
src/keboola_mcp_server/generate_tool_docs.py |
Adds Semantic Tools category to generated docs. |
TOOLS.md |
Documents new semantic tools and their JSON schemas. |
tests/tools/test_semantic.py |
New unit tests for semantic tools. |
tests/test_server.py |
Ensures new tools are listed and tagged/annotated correctly. |
tests/conftest.py |
Adds metastore_client mock to the test KeboolaClient fixture. |
tests/clients/test_metastore.py |
Extends Metastore client tests (org-scope params + revisions). |
tests/clients/test_client.py |
Adds test for Metastore URL derivation and headers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c9de053 to
1ef3892
Compare
Addresses open PR #416 review comments and two business requirements: 1. System prompt: new Semantic Layer section instructs AI to call semantic_discover before SQL and to patch semantic objects via semantic_define when query results contradict the semantic definition. 2. Two-tier candidate ranking in semantic_discover: uses LLM sampling (ctx.sample) when the client supports it for semantic relevance ranking, with difflib fuzzy scoring as fallback for partial/plural matches. 3. Reviewer fixes: - limit=0 now correctly returns empty matches (was max(limit,1)) - semantic_define delete branch no longer fetches schema unnecessarily - semantic_get_definition raises ValueError when both uuid and name given - _score_text *texts variadic fixed to single text: str parameter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses open PR #416 review comments and two business requirements: 1. System prompt: new Semantic Layer section instructs AI to call semantic_discover before SQL and to patch semantic objects via semantic_define when query results contradict the semantic definition. 2. Two-tier candidate ranking in semantic_discover: uses LLM sampling (ctx.sample) when the client supports it for semantic relevance ranking, with difflib fuzzy scoring as fallback for partial/plural matches. 3. Reviewer fixes: - limit=0 now correctly returns empty matches (was max(limit,1)) - semantic_define delete branch no longer fetches schema unnecessarily - semantic_get_definition raises ValueError when both uuid and name given - _score_text *texts variadic fixed to single text: str parameter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0fd6021 to
dd83332
Compare
… column metadata is unavailable
- Remove unused AliasChoices import from metastore.py - Fix line-too-long (E501) violations in test_metastore.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document INTEGTEST_METASTORE_URL / INTEGTEST_METASTORE_TOKEN in integtests/README.md - Make JsonApiResource.type required (no silent empty-string default) - Make JsonApiListEnvelope.data required (no silent empty-list default) - Add docstring to MetastoreClient.create Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses open PR #416 review comments and two business requirements: 1. System prompt: new Semantic Layer section instructs AI to call semantic_discover before SQL and to patch semantic objects via semantic_define when query results contradict the semantic definition. 2. Two-tier candidate ranking in semantic_discover: uses LLM sampling (ctx.sample) when the client supports it for semantic relevance ranking, with difflib fuzzy scoring as fallback for partial/plural matches. 3. Reviewer fixes: - limit=0 now correctly returns empty matches (was max(limit,1)) - semantic_define delete branch no longer fetches schema unnecessarily - semantic_get_definition raises ValueError when both uuid and name given - _score_text *texts variadic fixed to single text: str parameter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/replace Addresses three Copilot review comments where the `name` argument was resolved for API calls but not reflected in the payload passed to schema validation, allowing schema-required name fields to pass validation only when provided in data rather than the name argument. - create: inject inferred_name into payload when not already present - patch: inject name into merged_payload when a rename is requested - replace: inject replace_name into payload when not already present Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dd83332 to
9490c8c
Compare
Semantic layer tools are now hidden from the tool list and blocked on call unless the project has the `mcp-semantic-tooling` feature enabled in its Storage API token response, matching the pattern used for other feature-gated tools. Moves SEMANTIC_TOOLS_TAG to tools/constants.py to avoid a circular import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ling, and test gating - Remove duplicate metastore_api_url assignment in client.py (copy-paste dead code) - Remove stray "issues." artifact from validate_semantic_query docstring; merge into the preceding sentence and sync TOOLS.md - Early-return empty lists in _compare_expected_and_detected_objects when no expected objects are provided, preventing all detected objects from being reported as unexpected when the caller passes no expectations - Catch re.error in search_semantic_context and raise ValueError with the offending pattern identified, replacing a cryptic internal exception - Cap max_concurrency at 10 in all process_concurrently calls (tools.py ×4, service.py ×3) that previously passed len(collection), preventing unbounded concurrent Metastore API bursts from user-controlled inputs - Remove INTEGTEST_METASTORE_URL/TOKEN from README (never read; URL is derived from INTEGTEST_POOL_STORAGE_API_URL, token reuses storage token) - Move 4 semantic tools to exclude set in _assert_basic_setup so integration tests pass on projects without the mcp-semantic-tooling feature flag Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 22 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…PStatusError skip
- Guard all six from_metastore class methods and _get_semantic_model_id
against MetastoreObject.attributes=None by using `or {}` fallback so
objects without an attributes field do not raise AttributeError
- Add contexts_per_model optional kwarg to validate_semantic_query and
validate_semantic_used_objects; when expected_semantic_objects is
provided, the tool pre-loads contexts once and passes them to both
calls, eliminating the double _load_validation_contexts round-trip
- Fix grammar: "touches a semantic objects" -> "touches semantic objects"
- Catch httpx.HTTPStatusError in _require_metastore_available and skip
so 4xx/5xx responses (proxy 404, 503) don't fail the whole test module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cate type accumulation
- Guard semantic_object.data.attributes in _find_matches with `or {}`
so objects without an attributes field don't crash on regex scan or
jsonpath traversal (completes the round-3 attributes=None fix)
- Deduplicate cleaned_model_ids in validate_semantic_query using
dict.fromkeys to preserve order, preventing doubled violations and
context entries when the same model ID is passed twice
- Replace dict comprehension in _compare_expected_and_detected_objects
with setdefault accumulation so duplicate object_type entries merge
their IDs instead of the last entry silently overwriting earlier ones
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…antic_service_data methods
All six SemanticXxxCompact.from_semantic_service_data methods and
SemanticObject.from_semantic_service_data read obj.data.attributes
without a None guard. Since from_metastore stores the original
MetastoreObject verbatim in .data, any object whose Metastore response
omits the attributes field causes AttributeError on .get() calls and
ValidationError in SemanticObject (dict[str,Any] field rejects None).
Change every occurrence to obj.data.attributes or {} to complete the
attributes=None guard across all tools.py access sites.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s, type validation - Replace all obj.meta.name / self.data.meta.name accesses with getattr(..., 'name', None) since MetastoreObject.meta is typed MetaObjectMeta | None; affects display_name, five from_metastore name fallbacks, and two sorted(...) generators in constraint eval - Strip constraint metric/dataset names in evaluate_constraints_from_context (was filtering with metric.strip() but storing unstripped, causing false-positive composition violations and false-negative exclusion hits vs. the stripped used_metric_names/used_dataset_ids sets) - Remove ownership filter from ids branch of load_semantic_context_for_ semantic_type: when caller supplies exact UUIDs, silently dropping objects whose modelUUID is absent causes data loss with no error - Add type validation in get_object_by_id: raise ValueError when raw_obj.type != object_type.value so type mismatches surface at the API boundary instead of producing structurally empty service objects - Guard ownership check in _load_expected_object_groups with `object_model_id is not None` so objects without modelUUID are not falsely rejected; remove the now-dead semantic_type guard (type is validated upstream in get_object_by_id) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
🤝 Taking it over |
|
@mariankrotil the testts are failing, could you please fix it and rebase ? 🙂 Thanks |
Description
Linear: AI-2607
Change Type
Summary
Adds 4 semantic layer MCP tools built on top of the
MetastoreClientfoundation (see #415).New tools:
semantic_discover— ranked search across semantic entities (models, metrics, dimensions, constraints)semantic_get_definition— get a semantic object definition by UUID or namesemantic_query_plan— structured query planner that resolves metrics, dimensions, joins, and constraintssemantic_define— create, patch, replace, delete, or publish semantic objectsSupporting changes:
MetastoreClientwired intoKeboolaClientviametastore_clientpropertytools/semantic/package withmodel.py(Pydantic I/O models) andtools.py(tool implementations)server.pytests/tools/test_semantic.pyTOOLS.mdupdatedTesting
Streamable-HTTPtransports)Optional testing
canary-orionMCP (SSEandStreamable-HTTP)canary-orioncanary-orionChecklist