Conversation
Defines the architecture and implementation plan for the --local-backend flag: three pillars (Docker component execution, TS/JS app generation, DuckDB-WASM in-browser queries), CLI design, tool surface changes, and four implementation phases. No code changes — design doc only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Adds an RFC/design spec describing a proposed --local-backend mode for running the Keboola MCP server fully offline (no KBC_STORAGE_TOKEN), plus bumps the project version to 1.52.0.
Changes:
- Added `docs/local-backend.md` RFC covering CLI flags, backend/tool-surface behavior, and phased implementation plan.
- Bumped package version from `1.51.0` to `1.52.0` in `pyproject.toml`.
- Updated `uv.lock` to reflect the new project version.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `docs/local-backend.md` | New RFC describing local backend architecture, tooling, and dependencies. |
| `pyproject.toml` | Version bump to 1.52.0. |
| `uv.lock` | Lockfile update aligning with the version bump. |
- docs/local-backend.md: replace `^1.29.0` with `>=1.30.0` in all three occurrences (recommended stack table, dependency summary, security note) — `^1.29.0` resolves to `>=1.29.0 <2.0.0` in npm semver, which includes the compromised 1.29.2 release
- docs/local-backend.md: fix `query_local` sketch — use `CREATE OR REPLACE TABLE` (not `IF NOT EXISTS`) so stale tables are always refreshed; use parameterized `read_csv_auto(?)` to eliminate file-path SQL injection; sanitize the table name (strip double quotes); remove the `fetchdf()`/pandas dependency in favor of cursor-based markdown formatting
- pyproject.toml: correct version bump to 1.51.1 (patch) — this PR adds only documentation, not new features (CLAUDE.md: docs → patch)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- `__init__`: eagerly create the `tables/` subdirectory so `Path.glob()` never raises `OSError` on Python 3.13+ before CSV files exist
- `query_local`: substitute `"` with `_` in the table name (not deletion) to prevent both an empty-identifier crash and silent name collisions
- `query_local`: guard against `cursor.description` being `None` before iterating — DB-API 2.0 sets it to `None` for DDL/DML (CREATE, DROP, INSERT, UPDATE, DELETE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
### Platform-only tools (not registered in local mode)

`run_job`, `get_job`, `list_jobs`, `deploy_data_app`, `create_flow`, `update_flow`, `create_conditional_flow`, `create_sql_transformation`, `update_sql_transformation`, `create_oauth_url`
We could also simulate jobs, sort of, but I would skip it. Flows, at least, could certainly be simulated on localhost by creating cron jobs that trigger a component or DuckDB run, or by creating a hook that fires when new data are added.
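The "hook when new data are added" idea could be approximated on localhost with a simple polling watcher; cron would cover the scheduled-trigger case. Purely an illustrative sketch — the class name, polling approach, and callback shape are my assumptions, not anything from the RFC:

```python
from pathlib import Path
from typing import Callable


class NewTableHook:
    """Fires a callback for each CSV that appears in the tables directory
    between polls — a localhost stand-in for a 'new data added' trigger."""

    def __init__(self, tables_dir: str, on_new: Callable[[Path], None]):
        self.tables_dir = Path(tables_dir)
        self.on_new = on_new
        self._seen = set(self.tables_dir.glob("*.csv"))  # baseline snapshot

    def poll(self) -> int:
        """Check once; report each new file and return how many were found."""
        current = set(self.tables_dir.glob("*.csv"))
        new = sorted(current - self._seen)
        for path in new:
            self.on_new(path)  # e.g. trigger a component run or a DuckDB query
        self._seen = current
        return len(new)
```

A cron entry (or a small loop calling `poll()` every few seconds) would then stand in for a flow's schedule trigger.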
1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
2. Convert `configs/*.json` to Keboola component configurations via Components API
3. Map local table references in component parameters to Storage table IDs
The local variant should create some form of configurations for the components / DuckDB SQL transformations so they are migratable (components via the create configuration tool, DuckDB via the create transformation tool, and data apps via the create_data_app tool).
```python
# src/keboola_mcp_server/server.py
def create_server(args) -> FastMCP:
    mcp = FastMCP("Keboola MCP Server")

    register_common_tools(mcp)  # shared tools

    if args.local_backend:
        local_backend = LocalBackend(data_dir=args.data_dir)
        register_local_tools(mcp, local_backend)
    else:
        platform_backend = PlatformBackend(...)
        register_platform_tools(mcp, platform_backend)

    return mcp
```
I am thinking about how this design will work with tools and other things. I couldn't find how it is intended architecturally, so I will describe it as I understand it. The LocalBackend class is explained as providing the logic for tools that run on the local (file-based) backend, but what the PlatformBackend class should do is not explained. I assume it would somehow encapsulate the current platform backend tools?
Regarding the design, I would opt for two separate MCP servers — one for the file-based backend and one for the platform backend. The main reason is that, in the platform backend setting, we use middleware that requires a token, and the tools also depend on a client provided through that middleware — none of which would be required by the local backend server.
We could have, for instance, a create_platform_server that continues to work as it does today, and a create_local_server for the LocalBackend, where middleware, filters, context handling, and session handling could be implemented differently, because the two approaches are not fundamentally the same. In the local variant we can also keep session state and exploit that, which we cannot do in the platform backend due to multiple instances. As for shared tools that expose the same interface — the same input and the same output — with different implementations underneath, it still seems better for maintainability to keep those tools separate, even if they share the same name and I/O shape.
If this was the intended design, please clarify what the PlatformBackend is and how the shared tools will be treated in the two different backends.
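The two-server proposal above can be sketched as follows. Everything here is illustrative: `ToolRegistry` is a minimal dependency-free stand-in for FastMCP's tool registration, and the factory names mirror the reviewer's suggested `create_platform_server` / `create_local_server` split — the point is that both servers expose the same tool names and I/O shapes while keeping their implementations separate:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolRegistry:
    """Minimal stand-in for FastMCP tool registration (assumption: the real
    servers would build on FastMCP; this keeps the sketch self-contained)."""
    tools: dict = field(default_factory=dict)

    def tool(self, fn: Callable) -> Callable:
        self.tools[fn.__name__] = fn
        return fn


def create_local_server(data_dir: str) -> ToolRegistry:
    """Local server: no token middleware; may keep session state."""
    mcp = ToolRegistry()

    @mcp.tool
    def get_project_info() -> dict:
        return {"mode": "local", "data_dir": data_dir}

    return mcp


def create_platform_server(storage_token: str) -> ToolRegistry:
    """Platform server: unchanged behavior, token-backed middleware."""
    mcp = ToolRegistry()

    @mcp.tool
    def get_project_info() -> dict:
        # Same tool name and output shape as the local variant,
        # different implementation underneath.
        return {"mode": "platform", "token_set": bool(storage_token)}

    return mcp
```

Because each factory owns its tool definitions outright, neither server carries conditionals for the other backend, which is the maintainability argument in the comment above.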
| Tool | Platform mode | Local mode |
|---|---|---|
| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
| `create_data_app` | Generates and deploys Streamlit app | Generates Vite+React+DuckDB-WASM app |
In the platform backend, `modify_data_app` handles create and update, while `deploy_data_app` handles deploy and stop. There is no `create_data_app` tool.
| Tool | Platform mode | Local mode |
|---|---|---|
| `get_buckets` | Lists Storage buckets | Returns single virtual bucket |
| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
We should also consider which instructions we want to return based on the setting.
A `migrate_to_keboola` tool helps users move local workflows to the Keboola platform:

1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
How will the token be provided? Would it not be simpler to use some keboola-cli command that handles this separately, outside of the local backend server?
```python
@mcp.tool()
def run_component(
    component_image: str,
    parameters: dict,
```
One question we should consider: currently, we instruct the agent to retrieve the component config schemas and examples for the components it wants to run or create, and we fetch those from the AI Service API. How will the agent know the schema now?
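For orientation, the local execution shell behind a `run_component` tool could look roughly like this. A hedged sketch with hypothetical helper names; only the `/data` layout (`config.json`, `in/tables`, `out/tables` folders, exit-code semantics) is taken from the Keboola Common Interface contract the RFC references:

```python
import json
import subprocess
import tempfile
from pathlib import Path


def prepare_data_dir(parameters: dict) -> Path:
    """Lay out the Common Interface /data directory: config.json plus
    in/out table folders for CSV I/O."""
    work = Path(tempfile.mkdtemp())
    (work / "in" / "tables").mkdir(parents=True)
    (work / "out" / "tables").mkdir(parents=True)
    (work / "config.json").write_text(json.dumps({"parameters": parameters}))
    return work


def build_docker_command(component_image: str, data_dir: Path) -> list:
    """docker run with the host dir mounted at /data."""
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data", component_image]


def run_component(component_image: str, parameters: dict) -> int:
    work = prepare_data_dir(parameters)
    result = subprocess.run(build_docker_command(component_image, work))
    return result.returncode  # 0 = success per the Common Interface
```

The schema question above remains open in this sketch: the `parameters` dict is written to `config.json` unvalidated, whereas the platform flow validates it against the component's config schema fetched from the AI Service API.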
The RFC direction is clear and describes the implementation mainly as conditional registration in `create_server`.
Description
Linear: AI-2922
Change Type
Summary
This PR adds `docs/local-backend.md` — a design RFC for a `--local-backend` flag that lets the Keboola MCP server run entirely offline, without a `KBC_STORAGE_TOKEN`.

No code changes. This is a spec/design document for team review and alignment before implementation begins.
What the flag enables
A single CLI flag switches the entire tool surface from platform mode to local mode while keeping the MCP protocol contract identical:
Local mode (activated by `--local-backend`):

- No `KBC_STORAGE_TOKEN` required
- Components run via `docker run` locally
- SQL executes via `duckdb` (Python)

Three implementation pillars

1. Docker component execution — runs any Keboola component image via `docker run` using the Common Interface contract (`/data` mount, `config.json`, CSV I/O, exit code semantics).
2. TypeScript/JavaScript app generation — `create_data_app` generates a Vite + React + DuckDB-WASM + ECharts single-page app instead of Streamlit. Self-contained, served via Docker Compose on localhost. ECharts was chosen for its purely declarative JSON API — the best target for LLM-generated chart code.
3. In-browser DuckDB-WASM queries — generated apps query CSV files directly in the browser (no server-side DB process). Server-side `query_data` uses native Python `duckdb` on local CSVs.

Implementation phases (post-RFC)

1. `--local-backend` flag, `LocalBackend` class, local `get_tables` / `query_data` / `get_buckets` / `search`
2. `run_component` tool — Docker execution via Common Interface
3. `create_data_app` — TS/JS app generation with DuckDB-WASM + ECharts
4. `migrate_to_keboola` skeleton

Full spec: `docs/local-backend.md`

Testing
No code changes in this PR — testing checklist is not applicable.
Checklist