
AI-2922: Local backend mode — RFC spec doc#458

Open
Matovidlo wants to merge 3 commits into main from AI-2922-local-backend-mode

Conversation

@Matovidlo
Contributor

Description

Linear: AI-2922

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

This PR adds docs/local-backend.md — a design RFC for a --local-backend flag that lets the Keboola MCP server run entirely offline, without a KBC_STORAGE_TOKEN.

No code changes. This is a spec/design document for team review and alignment before implementation begins.

What the flag enables

A single CLI flag switches the entire tool surface from platform mode to local mode while keeping the MCP protocol contract identical:

| Property | Platform (default) | Local (`--local-backend`) |
|---|---|---|
| Token required | `KBC_STORAGE_TOKEN` | None |
| Components run via | Keboola Jobs API | `docker run` locally |
| Data stored in | Keboola Storage | CSV files on disk |
| SQL queries | Snowflake / BigQuery | Native `duckdb` (Python) |
| Dashboard apps | Streamlit on Keboola | Vite + React + DuckDB-WASM on localhost |
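A minimal sketch of how the flag could be surfaced on the CLI, assuming `argparse`; the `--data-dir` option name is a hypothetical companion flag, not confirmed by this excerpt:

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Hypothetical CLI surface; only --local-backend is named in the RFC,
    # the --data-dir option and its default are assumptions.
    parser = argparse.ArgumentParser(prog="keboola-mcp-server")
    parser.add_argument(
        "--local-backend",
        action="store_true",
        help="Run fully offline: no KBC_STORAGE_TOKEN, CSVs on disk, duckdb for SQL.",
    )
    parser.add_argument(
        "--data-dir",
        default="./data",
        help="Directory holding local CSV tables (assumed option name).",
    )
    return parser.parse_args(argv)


args = parse_args(["--local-backend", "--data-dir", "/tmp/project"])
```

Because the switch is a single boolean, the rest of the server can branch once at startup instead of checking per tool call.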

Three implementation pillars

  1. Docker component execution — runs any Keboola component image via docker run using the Common Interface contract (/data mount, config.json, CSV I/O, exit code semantics).

  2. TypeScript/JavaScript app generation — `create_data_app` generates a Vite + React + DuckDB-WASM + ECharts single-page app instead of Streamlit. Self-contained, served via Docker Compose on localhost. ECharts was chosen for its purely declarative JSON API, the best target for LLM-generated chart code.

  3. In-browser DuckDB-WASM queries — generated apps query CSV files directly in the browser (no server-side DB process). Server-side query_data uses native Python duckdb on local CSVs.
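The Common Interface contract from pillar 1 can be sketched as a command builder; the exact mount layout shown in the comments follows the public Keboola conventions, but this is an illustration, not the proposed implementation:

```python
from pathlib import Path


def build_docker_cmd(component_image: str, data_dir: Path) -> list[str]:
    # Keboola Common Interface: the container reads /data/config.json,
    # consumes CSVs from /data/in/tables, writes results to /data/out/tables,
    # and signals failure with a non-zero exit code.
    return [
        "docker", "run", "--rm",
        "--volume", f"{data_dir}:/data",
        component_image,
    ]


cmd = build_docker_cmd("keboola/ex-generic-v2", Path("/tmp/run-1"))
# subprocess.run(cmd, check=True) would then enforce the exit-code semantics.
```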

Implementation phases (post-RFC)

  • Phase 1 (weeks 1–2): --local-backend flag, LocalBackend class, local get_tables / query_data / get_buckets / search
  • Phase 2 (weeks 3–4): run_component tool — Docker execution via Common Interface
  • Phase 3 (weeks 5–7): create_data_app — TS/JS app generation with DuckDB-WASM + ECharts
  • Phase 4 (weeks 8–10): polish, error handling, migrate_to_keboola skeleton
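As a rough shape for the Phase 1 deliverable, a `LocalBackend` skeleton might look like this (method names mirror the tool names above; the eager `tables/` creation follows a fix noted later in this PR; everything else is an assumption):

```python
import csv
from pathlib import Path


class LocalBackend:
    """File-based stand-in for Keboola Storage (sketch, not the final API)."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)
        # Eagerly create tables/ so Path.glob() never runs on a missing dir.
        self.tables_dir = self.data_dir / "tables"
        self.tables_dir.mkdir(parents=True, exist_ok=True)

    def get_buckets(self) -> list[dict]:
        # Local mode exposes a single virtual bucket.
        return [{"id": "local", "name": "Local CSV tables"}]

    def get_tables(self) -> list[dict]:
        tables = []
        for path in sorted(self.tables_dir.glob("*.csv")):
            with path.open(newline="") as f:
                header = next(csv.reader(f), [])
            tables.append({"name": path.stem, "columns": header})
        return tables
```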

Full spec: docs/local-backend.md

Testing

No code changes in this PR — testing checklist is not applicable.

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable) — bumped to 1.52.0
  • Documentation updated (if applicable) — this PR IS the documentation

Defines the architecture and implementation plan for --local-backend flag:
three pillars (Docker component execution, TS/JS app generation, DuckDB-WASM
in-browser queries), CLI design, tool surface changes, and four implementation
phases. No code changes — design doc only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@linear

linear bot commented Apr 4, 2026

@Matovidlo Matovidlo marked this pull request as ready for review April 4, 2026 14:38
Contributor

Copilot AI left a comment


Pull request overview

Adds an RFC/design spec describing a proposed --local-backend mode for running the Keboola MCP server fully offline (no KBC_STORAGE_TOKEN), plus bumps the project version to 1.52.0.

Changes:

  • Added docs/local-backend.md RFC covering CLI flags, backend/tool-surface behavior, and phased implementation plan.
  • Bumped package version from 1.51.0 to 1.52.0 in pyproject.toml.
  • Updated uv.lock to reflect the new project version.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/local-backend.md | New RFC describing local backend architecture, tooling, and dependencies. |
| pyproject.toml | Version bump to 1.52.0. |
| uv.lock | Lockfile update aligning with the version bump. |


- docs/local-backend.md: replace ^1.29.0 with >=1.30.0 in all three
  occurrences (recommended stack table, dependency summary, security note)
  — ^1.29.0 resolves to >=1.29.0 <2.0.0 in npm semver, which includes
  the compromised 1.29.2 release
- docs/local-backend.md: fix query_local sketch — use CREATE OR REPLACE
  TABLE (not IF NOT EXISTS) so stale tables are always refreshed; use
  parameterized read_csv_auto(?) to eliminate file-path SQL injection;
  sanitize table name (strip double-quotes); remove fetchdf()/pandas
  dependency in favor of cursor-based markdown formatting
- pyproject.toml: correct version bump to 1.51.1 (Patch) — this PR adds
  only documentation, not new features (CLAUDE.md: docs → Patch)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- __init__: eagerly create tables/ subdirectory so Path.glob() never raises OSError on Python 3.13+ before CSV files exist
- query_local: substitute '"' to '_' in table name (not deletion) to prevent both empty-identifier crash and silent name-collision
- query_local: guard cursor.description is None before iterating - DB-API 2.0 sets it to None for DDL/DML (CREATE, DROP, INSERT, UPDATE, DELETE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
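Pulled together, the fixes listed in the commit messages above could yield a `query_local` along these lines; `conn` is assumed to be a native `duckdb` connection, and the helper names are illustrative:

```python
def sanitize_table_name(name: str) -> str:
    # Substitute '"' with '_' rather than deleting it, avoiding both an
    # empty-identifier crash and silent name collisions.
    return name.replace('"', "_")


def rows_to_markdown(description, rows) -> str:
    # Cursor-based markdown formatting, no fetchdf()/pandas dependency.
    headers = [col[0] for col in description]
    lines = [" | ".join(headers), " | ".join("---" for _ in headers)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


def query_local(conn, csv_path: str, table_name: str, sql: str) -> str:
    safe = sanitize_table_name(table_name)
    # CREATE OR REPLACE refreshes stale tables on every call; the file path
    # is bound as a parameter, never interpolated into the SQL text.
    conn.execute(
        f'CREATE OR REPLACE TABLE "{safe}" AS SELECT * FROM read_csv_auto(?)',
        [csv_path],
    )
    cursor = conn.execute(sql)
    # DB-API 2.0 sets description to None for DDL/DML statements.
    if cursor.description is None:
        return "(statement executed, no result set)"
    return rows_to_markdown(cursor.description, cursor.fetchall())
```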

### Platform-only tools (not registered in local mode)

`run_job`, `get_job`, `list_jobs`, `deploy_data_app`, `create_flow`, `update_flow`, `create_conditional_flow`, `create_sql_transformation`, `update_sql_transformation`, `create_oauth_url`
Contributor Author


We could also simulate jobs in some form, but I would skip that. Flows could certainly be simulated on localhost, at least by creating cron jobs that trigger a component or duckdb, or by creating a hook when new data are added.


1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
2. Convert `configs/*.json` to Keboola component configurations via Components API
3. Map local table references in component parameters to Storage table IDs
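Step 3 of that migration could be a pure mapping function along these lines; the `storage.input.tables[].source` shape mirrors real component configs, but treating local CSV paths as sources is an assumption of this sketch:

```python
def map_table_refs(config: dict, table_ids: dict[str, str]) -> dict:
    # Rewrite local CSV references in a component config's input mapping
    # to Keboola Storage table IDs, leaving the original config untouched.
    mapped = {**config}
    tables = mapped.get("storage", {}).get("input", {}).get("tables", [])
    mapped_tables = [
        {**table, "source": table_ids.get(table.get("source", ""), table.get("source", ""))}
        for table in tables
    ]
    if tables:
        storage = {**mapped["storage"]}
        storage["input"] = {**storage["input"], "tables": mapped_tables}
        mapped["storage"] = storage
    return mapped
```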
Contributor Author


The local variant should create configurations for the components and for SQL transformations in DuckDB so they are migratable (components via the create configuration tool, duckdb via the create transformation tool, and data apps via the create_data_app tool).

Comment on lines +130 to +143
```python
# src/keboola_mcp_server/server.py
def create_server(args) -> FastMCP:
    mcp = FastMCP("Keboola MCP Server")

    register_common_tools(mcp)  # shared tools

    if args.local_backend:
        local_backend = LocalBackend(data_dir=args.data_dir)
        register_local_tools(mcp, local_backend)
    else:
        platform_backend = PlatformBackend(...)
        register_platform_tools(mcp, platform_backend)

    return mcp
```
Contributor


I am thinking about how this design will work with tools and other parts of the server. I couldn't find the intended architecture described, so here is how I understood it: the LocalBackend class is explained as providing the logic for tools that run on the local (file-based) backend, but what is the PlatformBackend class? Its role is not explained; I assume it would somehow encapsulate the current platform backend tools?

Regarding the design, I would opt for the option to have two separate MCP servers -- one file-based backend, and one platform backend. The main reason is that, in the platform backend setting, we use middleware that requires a token, and the tools also depend on a client that is provided through that middleware - which would not be required by the local backend server.

I think we could have, for instance, a create_platform_server that continues to work as it does today, and a create_local_server for the LocalBackend, where middleware, filters, context handling, and session handling could be implemented differently, because the two approaches are not fundamentally the same. In the local variant we can also keep session state and take advantage of it, which we cannot do in the platform backend due to multiple instances. As for the shared tools that expose the same interface - the same input and output - with different implementations underneath, it still seems better for maintainability to keep those tools separate, even if they share the same name and input/output.

If this was the intended design, please clarify what the PlatformBackend is and how the shared tools will be treated in the two different backends.

| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
| `create_data_app` | Generates and deploys Streamlit app | Generates Vite+React+DuckDB-WASM app |
Contributor


In the platform backend, modify_data_app handles create and update, while deploy_data_app handles deploy and stop. There is no create_data_app tool.

| `get_buckets` | Lists Storage buckets | Returns single virtual bucket |
| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
Contributor


We should also consider which instructions we want to return based on the setting.


A `migrate_to_keboola` tool helps users move local workflows to the Keboola platform:

1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
Contributor


How will the token be provided? Would it not be simpler to use some keboola-cli command that handles this separately, outside of the local backend server?

```python
@mcp.tool()
def run_component(
    component_image: str,
    parameters: dict,
```
Contributor


One question we should consider. Currently, we instruct the agent to retrieve the component config schemas and examples for the components it wants to run or create, and we fetch those from the AI Service API. How will the agent know this schema now?

@mariankrotil
Contributor

The RFC direction is clear and describes the implementation mainly as conditional registration in server.py with a LocalBackend dependency. However, platform logic in the current codebase is spread across tools, middleware, ProjectLinksManager, WorkspaceManager, preview routes, and prompt/tool filtering. Many tools obtain platform clients from ctx.session.state, not from a backend interface.

jordanrburger
jordanrburger approved these changes Apr 13, 2026