feat(mcp): add create_dataset tool to register physical tables as datasets #8

Open
hbrooks wants to merge 9 commits into master from demo/pr-40340
hbrooks commented May 28, 2026

Originally PR apache#40340 in apache/superset by @aminghadersohi

aminghadersohi and others added 9 commits May 28, 2026 12:57
…asets

Adds create_dataset MCP tool that wraps POST /api/v1/dataset/ so skills and
agents can register an existing physical table as a Superset dataset without
manual UI interaction. Returns DatasetInfo (same shape as get_dataset_info)
so the resulting dataset_id feeds directly into generate_chart.

- CreateDatasetRequest schema (database_id, schema, table_name, owners?)
- Tool file with typed error handling (exists/not-found/validation/internal)
- Registered in dataset/tool/__init__.py and app.py
- DEFAULT_INSTRUCTIONS updated to list create_dataset
- Unit tests covering success, owners, error cases, and full DatasetInfo shape
- schemas.py: restore full apache/master version and add CreateDatasetRequest
  (previous cherry-pick used an older shorter version missing helper functions
  _sanitize_dataset_info_for_llm_context, _humanize_timestamp, etc.)
- create_dataset.py: remove parse_request decorator (not in apache/master yet)
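
The typed error handling described in this commit can be sketched roughly as follows. This is a stdlib-only illustration, not the code in this PR: `DatasetExistsError`, `TableNotFoundError`, the `run_command` callable, and the shape of the error dictionaries are hypothetical stand-ins. Only the request fields and the `DatasetInvalidError` name come from the commit messages.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


# Hypothetical stand-ins for the command-layer exceptions.
class DatasetExistsError(Exception):
    pass


class TableNotFoundError(Exception):
    pass


class DatasetInvalidError(Exception):
    pass


@dataclass
class CreateDatasetRequest:
    # Field names follow the commit message; types are assumptions.
    database_id: int
    schema: Optional[str]
    table_name: str
    owners: Optional[List[int]] = None


def create_dataset(
    request: CreateDatasetRequest,
    run_command: Callable[[CreateDatasetRequest], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run the create command and map known failures to typed errors."""
    try:
        dataset = run_command(request)
    except DatasetExistsError as ex:
        return {"error_type": "exists", "message": str(ex)}
    except TableNotFoundError as ex:
        return {"error_type": "not_found", "message": str(ex)}
    except DatasetInvalidError as ex:
        return {"error_type": "validation", "message": str(ex)}
    # Anything else propagates to the caller; a later commit in this PR
    # makes that choice explicitly instead of returning an internal error.
    return {"dataset": dataset}
```

Passing the command in as a callable keeps the sketch testable without a database; the actual tool calls the Superset command directly.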
…and is a lazy import

CreateDatasetCommand is imported inside the function body, so patching at
superset.mcp_service.dataset.tool.create_dataset.CreateDatasetCommand fails
with AttributeError. Patch at the source module instead.
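
The patching problem can be reproduced in isolation. The sketch below uses two throwaway in-memory modules, `fake_commands` and `fake_tool` (both hypothetical names, not Superset modules), to show why `unittest.mock.patch` raises `AttributeError` when aimed at the module that imports lazily, and why patching the source module works.

```python
import sys
import types
from unittest.mock import patch


# Hypothetical "source" module that defines the command class.
class CreateDatasetCommand:
    def run(self):
        return "real"


commands = types.ModuleType("fake_commands")
commands.CreateDatasetCommand = CreateDatasetCommand
sys.modules["fake_commands"] = commands


# Hypothetical "tool" module: the command is imported inside the function
# body, so the tool module never gains a CreateDatasetCommand attribute.
def create_dataset():
    from fake_commands import CreateDatasetCommand as Command  # lazy import

    return Command().run()


tool = types.ModuleType("fake_tool")
tool.create_dataset = create_dataset
sys.modules["fake_tool"] = tool


def patch_at_tool_module():
    """Patching where the name is used fails: the attribute does not exist."""
    try:
        with patch("fake_tool.CreateDatasetCommand"):
            return "patched"
    except AttributeError:
        return "AttributeError"


def patch_at_source_module():
    """Patching where the name is defined is seen by the lazy import."""
    with patch("fake_commands.CreateDatasetCommand") as mock_cls:
        mock_cls.return_value.run.return_value = "mocked"
        return tool.create_dataset()
```

Because the import statement runs on every call, it looks the name up on the source module each time, which is why the patch at the source takes effect.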

Also fix data["schema_name"] assertions: DatasetInfo.model_serializer renames
the field to "schema" in the serialized output.
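
To illustrate why the assertions had to change, here is a stdlib-only stand-in for a serializer that renames a field on output. `DatasetInfoSketch` and its `serialize` method are hypothetical; the real `DatasetInfo` is a Pydantic model whose serializer is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DatasetInfoSketch:
    # The attribute is called schema_name on the object...
    id: int
    table_name: str
    schema_name: Optional[str]

    def serialize(self) -> Dict[str, Any]:
        # ...but the serialized output exposes it under the key "schema".
        return {
            "id": self.id,
            "table_name": self.table_name,
            "schema": self.schema_name,
        }
```

A test that inspects the serialized payload therefore needs `data["schema"]`; `data["schema_name"]` raises `KeyError`.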
…ataset

Restores tool imports that were accidentally dropped from app.py:
create_virtual_dataset, query_dataset, get_chart_sql, get_chart_type_schema,
get_database_info, list_databases, save_sql_query. Exports create_virtual_dataset
and query_dataset from dataset/tool/__init__.py. Fixes KeyError in test_create_dataset
by setting is_favorite=None on the mock dataset to avoid Pydantic bool|None
validation errors from MagicMock auto-attributes.
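
The `MagicMock` behaviour behind this fix is easy to demonstrate. In the sketch below, `is_valid_optional_bool` is a hypothetical, simplified stand-in for a `bool | None` field check, not Pydantic's actual validation logic.

```python
from unittest.mock import MagicMock


def is_valid_optional_bool(value):
    """Simplified stand-in for validating a bool | None field."""
    return value is None or isinstance(value, bool)


# Any attribute read on a MagicMock silently creates another MagicMock,
# which is neither a bool nor None.
unset = MagicMock()

# Setting the attribute explicitly gives validation a real value.
configured = MagicMock()
configured.is_favorite = None
```

Here `unset.is_favorite` fails the check while `configured.is_favorite` passes, which mirrors the reason the mock dataset sets `is_favorite=None` explicitly.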

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…decorator, normalize whitespace

- CreateDatasetRequest.schema is now str | None (default None) so databases
  without schema namespaces (e.g. SQLite) can register tables without error
- create_dataset switches from @mcp.tool/@mcp_auth_hook to the standard @tool
  decorator from superset_core.mcp.decorators, adding Dataset write RBAC and
  ToolAnnotations consistent with create_virtual_dataset
- Blank/whitespace-only schema values are normalized to None before forwarding
  to CreateDatasetCommand, avoiding spurious table-not-found failures
- Unexpected exceptions now re-raise (middleware handles them) instead of
  being swallowed into an InternalError response; test updated accordingly
- Uses DatasetError.create() factory and event_logger/ctx instrumentation
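
A minimal sketch of the schema normalization described above, assuming a standalone helper. `normalize_schema` is a hypothetical name, and returning the stripped value for non-blank input is an assumption; the commit message only states that blank and whitespace-only values become None.

```python
from typing import Optional


def normalize_schema(schema: Optional[str]) -> Optional[str]:
    """Collapse missing, empty, or whitespace-only schema values to None."""
    if schema is None:
        return None
    stripped = schema.strip()
    # An empty string is falsy, so blank input falls through to None.
    return stripped or None
```

With this in place, a request for a schema-less database such as SQLite can pass `schema=None`, `""`, or `"   "` and the command receives None in every case.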

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add test_create_dataset_invalid_error to cover the DatasetInvalidError
  handler in create_dataset (previously untested path)
- Add min_length=1 to CreateDatasetRequest.table_name to reject empty
  strings at the schema layer, consistent with CreateVirtualDatasetRequest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mirrors the schema normalization: table_name is now stripped before being
forwarded to CreateDatasetCommand, preventing whitespace-only strings
from reaching the database layer.
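
One way to picture the table_name handling is the sketch below. `clean_table_name` is a hypothetical helper, and raising `ValueError` for blank input is an assumption; the commit messages say only that the name is stripped and that empty strings are rejected at the schema layer via `min_length=1`.

```python
def clean_table_name(table_name: str) -> str:
    """Strip surrounding whitespace and refuse names that end up empty."""
    stripped = table_name.strip()
    if not stripped:
        # Assumed behaviour: fail early rather than query the database
        # for a table whose name is blank.
        raise ValueError("table_name must not be empty or whitespace-only")
    return stripped
```

Unlike schema, which may legitimately be absent, a table name has no meaningful "none" value, so the sketch rejects blank input instead of normalizing it.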

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>