
AI-2922: Local backend mode — RFC spec doc#458

Open
Matovidlo wants to merge 3 commits into main from AI-2922-local-backend-mode

Conversation

@Matovidlo
Contributor

Description

Linear: AI-2922

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

This PR adds docs/local-backend.md — a design RFC for a --local-backend flag that lets the Keboola MCP server run entirely offline, without a KBC_STORAGE_TOKEN.

No code changes. This is a spec/design document for team review and alignment before implementation begins.

What the flag enables

A single CLI flag switches the entire tool surface from platform mode to local mode while keeping the MCP protocol contract identical:

| Property | Platform (default) | Local (`--local-backend`) |
|---|---|---|
| Token required | `KBC_STORAGE_TOKEN` | None |
| Components run via | Keboola Jobs API | `docker run` locally |
| Data stored in | Keboola Storage | CSV files on disk |
| SQL queries | Snowflake / BigQuery | Native `duckdb` (Python) |
| Dashboard apps | Streamlit on Keboola | Vite + React + DuckDB-WASM on localhost |
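A minimal sketch of how the flag could be surfaced on the CLI, assuming `argparse`; the `--data-dir` option name is a hypothetical companion flag, not confirmed by this excerpt:

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Hypothetical CLI surface; only --local-backend is named in the RFC,
    # the --data-dir option and its default are assumptions.
    parser = argparse.ArgumentParser(prog="keboola-mcp-server")
    parser.add_argument(
        "--local-backend",
        action="store_true",
        help="Run fully offline: no KBC_STORAGE_TOKEN, CSVs on disk, duckdb for SQL.",
    )
    parser.add_argument(
        "--data-dir",
        default="./data",
        help="Directory holding local CSV tables (assumed option name).",
    )
    return parser.parse_args(argv)


args = parse_args(["--local-backend", "--data-dir", "/tmp/project"])
```

Because the switch is a single boolean, the rest of the server can branch once at startup instead of checking per tool call.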

Three implementation pillars

  1. Docker component execution — runs any Keboola component image via docker run using the Common Interface contract (/data mount, config.json, CSV I/O, exit code semantics).

  2. TypeScript/JavaScript app generation — `create_data_app` generates a Vite + React + DuckDB-WASM + ECharts single-page app instead of Streamlit. Self-contained, served via Docker Compose on localhost. ECharts was chosen for its purely declarative JSON API, the best target for LLM-generated chart code.

  3. In-browser DuckDB-WASM queries — generated apps query CSV files directly in the browser (no server-side DB process). Server-side query_data uses native Python duckdb on local CSVs.
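The Common Interface contract from pillar 1 can be sketched as a command builder; the exact mount layout shown in the comments follows the public Keboola conventions, but this is an illustration, not the proposed implementation:

```python
from pathlib import Path


def build_docker_cmd(component_image: str, data_dir: Path) -> list[str]:
    # Keboola Common Interface: the container reads /data/config.json,
    # consumes CSVs from /data/in/tables, writes results to /data/out/tables,
    # and signals failure with a non-zero exit code.
    return [
        "docker", "run", "--rm",
        "--volume", f"{data_dir}:/data",
        component_image,
    ]


cmd = build_docker_cmd("keboola/ex-generic-v2", Path("/tmp/run-1"))
# subprocess.run(cmd, check=True) would then enforce the exit-code semantics.
```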

Implementation phases (post-RFC)

  • Phase 1 (weeks 1–2): --local-backend flag, LocalBackend class, local get_tables / query_data / get_buckets / search
  • Phase 2 (weeks 3–4): run_component tool — Docker execution via Common Interface
  • Phase 3 (weeks 5–7): create_data_app — TS/JS app generation with DuckDB-WASM + ECharts
  • Phase 4 (weeks 8–10): polish, error handling, migrate_to_keboola skeleton
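As a rough shape for the Phase 1 deliverable, a `LocalBackend` skeleton might look like this (method names mirror the tool names above; the eager `tables/` creation follows a fix noted later in this PR; everything else is an assumption):

```python
import csv
from pathlib import Path


class LocalBackend:
    """File-based stand-in for Keboola Storage (sketch, not the final API)."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)
        # Eagerly create tables/ so Path.glob() never runs on a missing dir.
        self.tables_dir = self.data_dir / "tables"
        self.tables_dir.mkdir(parents=True, exist_ok=True)

    def get_buckets(self) -> list[dict]:
        # Local mode exposes a single virtual bucket.
        return [{"id": "local", "name": "Local CSV tables"}]

    def get_tables(self) -> list[dict]:
        tables = []
        for path in sorted(self.tables_dir.glob("*.csv")):
            with path.open(newline="") as f:
                header = next(csv.reader(f), [])
            tables.append({"name": path.stem, "columns": header})
        return tables
```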

Full spec: docs/local-backend.md

Testing

No code changes in this PR — testing checklist is not applicable.

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable) — bumped to 1.52.0
  • Documentation updated (if applicable) — this PR IS the documentation

Defines the architecture and implementation plan for --local-backend flag:
three pillars (Docker component execution, TS/JS app generation, DuckDB-WASM
in-browser queries), CLI design, tool surface changes, and four implementation
phases. No code changes — design doc only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@linear

linear bot commented Apr 4, 2026

@Matovidlo Matovidlo marked this pull request as ready for review April 4, 2026 14:38
Contributor

Copilot AI left a comment


Pull request overview

Adds an RFC/design spec describing a proposed --local-backend mode for running the Keboola MCP server fully offline (no KBC_STORAGE_TOKEN), plus bumps the project version to 1.52.0.

Changes:

  • Added docs/local-backend.md RFC covering CLI flags, backend/tool-surface behavior, and phased implementation plan.
  • Bumped package version from 1.51.0 to 1.52.0 in pyproject.toml.
  • Updated uv.lock to reflect the new project version.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/local-backend.md | New RFC describing local backend architecture, tooling, and dependencies. |
| pyproject.toml | Version bump to 1.52.0. |
| uv.lock | Lockfile update aligning with the version bump. |


- docs/local-backend.md: replace ^1.29.0 with >=1.30.0 in all three
  occurrences (recommended stack table, dependency summary, security note)
  — ^1.29.0 resolves to >=1.29.0 <2.0.0 in npm semver, which includes
  the compromised 1.29.2 release
- docs/local-backend.md: fix query_local sketch — use CREATE OR REPLACE
  TABLE (not IF NOT EXISTS) so stale tables are always refreshed; use
  parameterized read_csv_auto(?) to eliminate file-path SQL injection;
  sanitize table name (strip double-quotes); remove fetchdf()/pandas
  dependency in favor of cursor-based markdown formatting
- pyproject.toml: correct version bump to 1.51.1 (Patch) — this PR adds
  only documentation, not new features (CLAUDE.md: docs → Patch)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- __init__: eagerly create tables/ subdirectory so Path.glob() never raises OSError on Python 3.13+ before CSV files exist
- query_local: substitute '"' to '_' in table name (not deletion) to prevent both empty-identifier crash and silent name-collision
- query_local: guard cursor.description is None before iterating - DB-API 2.0 sets it to None for DDL/DML (CREATE, DROP, INSERT, UPDATE, DELETE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
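Pulled together, the fixes listed in the commit messages above could yield a `query_local` along these lines; `conn` is assumed to be a native `duckdb` connection, and the helper names are illustrative:

```python
def sanitize_table_name(name: str) -> str:
    # Substitute '"' with '_' rather than deleting it, avoiding both an
    # empty-identifier crash and silent name collisions.
    return name.replace('"', "_")


def rows_to_markdown(description, rows) -> str:
    # Cursor-based markdown formatting, no fetchdf()/pandas dependency.
    headers = [col[0] for col in description]
    lines = [" | ".join(headers), " | ".join("---" for _ in headers)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


def query_local(conn, csv_path: str, table_name: str, sql: str) -> str:
    safe = sanitize_table_name(table_name)
    # CREATE OR REPLACE refreshes stale tables on every call; the file path
    # is bound as a parameter, never interpolated into the SQL text.
    conn.execute(
        f'CREATE OR REPLACE TABLE "{safe}" AS SELECT * FROM read_csv_auto(?)',
        [csv_path],
    )
    cursor = conn.execute(sql)
    # DB-API 2.0 sets description to None for DDL/DML statements.
    if cursor.description is None:
        return "(statement executed, no result set)"
    return rows_to_markdown(cursor.description, cursor.fetchall())
```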

### Platform-only tools (not registered in local mode)

`run_job`, `get_job`, `list_jobs`, `deploy_data_app`, `create_flow`, `update_flow`, `create_conditional_flow`, `create_sql_transformation`, `update_sql_transformation`, `create_oauth_url`
Contributor Author


We could also simulate jobs in some form, but I would skip that. Flows could certainly be simulated on localhost, at least by creating cron jobs that trigger a component or duckdb, or by creating a hook when new data are added.


1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
2. Convert `configs/*.json` to Keboola component configurations via Components API
3. Map local table references in component parameters to Storage table IDs
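Step 3 of that migration could be a pure mapping function along these lines; the `storage.input.tables[].source` shape mirrors real component configs, but treating local CSV paths as sources is an assumption of this sketch:

```python
def map_table_refs(config: dict, table_ids: dict[str, str]) -> dict:
    # Rewrite local CSV references in a component config's input mapping
    # to Keboola Storage table IDs, leaving the original config untouched.
    mapped = {**config}
    tables = mapped.get("storage", {}).get("input", {}).get("tables", [])
    mapped_tables = [
        {**table, "source": table_ids.get(table.get("source", ""), table.get("source", ""))}
        for table in tables
    ]
    if tables:
        storage = {**mapped["storage"]}
        storage["input"] = {**storage["input"], "tables": mapped_tables}
        mapped["storage"] = storage
    return mapped
```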
Contributor Author


The local variant should create configurations for the components and for SQL transformations in DuckDB so they are migratable (components via the create configuration tool, duckdb via the create transformation tool, and data apps via the create_data_app tool).

Comment on lines +130 to +143
```python
# src/keboola_mcp_server/server.py
def create_server(args) -> FastMCP:
    mcp = FastMCP("Keboola MCP Server")

    register_common_tools(mcp)  # shared tools

    if args.local_backend:
        local_backend = LocalBackend(data_dir=args.data_dir)
        register_local_tools(mcp, local_backend)
    else:
        platform_backend = PlatformBackend(...)
        register_platform_tools(mcp, platform_backend)

    return mcp
```
Contributor


I am thinking about how this design will work with tools and other parts of the server. I couldn't find the intended architecture described, so here is how I understood it: the LocalBackend class is explained as providing the logic for tools that run on the local (file-based) backend, but what is the PlatformBackend class? Its role is not explained; I assume it would somehow encapsulate the current platform backend tools?

Regarding the design, I would opt for the option to have two separate MCP servers -- one file-based backend, and one platform backend. The main reason is that, in the platform backend setting, we use middleware that requires a token, and the tools also depend on a client that is provided through that middleware - which would not be required by the local backend server.

I think we could have, for instance, a create_platform_server that continues to work as it does today, and a create_local_server for the LocalBackend, where middleware, filters, context handling, and session handling could be implemented differently, because the two approaches are not fundamentally the same. In the local variant we can also keep session state and take advantage of it, which we cannot do in the platform backend due to multiple instances. As for the shared tools that expose the same interface - the same input and output - with different implementations underneath, it still seems better for maintainability to keep those tools separate, even if they share the same name and input/output.

If this was the intended design, please clarify what the PlatformBackend is and how the shared tools will be treated in the two different backends.

| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
| `create_data_app` | Generates and deploys Streamlit app | Generates Vite+React+DuckDB-WASM app |
Contributor


In the platform backend, modify_data_app handles create and update, while deploy_data_app handles deploy and stop. There is no create_data_app tool.

| `get_buckets` | Lists Storage buckets | Returns single virtual bucket |
| `query_data` | Executes SQL on Snowflake/BQ workspace | Executes SQL via native `duckdb` on CSV files |
| `search` | Semantic search via AI service | Searches filenames and CSV headers |
| `get_project_info` | Returns Keboola project metadata | Returns local project metadata |
Contributor


We should also consider which instructions we want to return based on the setting.


A `migrate_to_keboola` tool helps users move local workflows to the Keboola platform:

1. Upload CSVs to Keboola Storage via Storage API (token required at this point)
Contributor


How will the token be provided? Would it not be simpler to use some keboola-cli command that handles this separately, outside of the local backend server?

```python
@mcp.tool()
def run_component(
    component_image: str,
    parameters: dict,
```
Contributor


One question we should consider. Currently, we instruct the agent to retrieve the component config schemas and examples for the components it wants to run or create, and we fetch those from the AI Service API. How will the agent know this schema now?

@mariankrotil
Contributor

The RFC direction is clear and describes the implementation mainly as conditional registration in server.py with a LocalBackend dependency. However, platform logic in the current codebase is spread across tools, middleware, ProjectLinksManager, WorkspaceManager, preview routes, and prompt/tool filtering. Many tools obtain platform clients from ctx.session.state, not from a backend interface.

jordanrburger
jordanrburger approved these changes Apr 13, 2026