
MPA: Multi-Project Architecture support#451

Open
davidesner wants to merge 9 commits into main from feature/mpa-support

Conversation

@davidesner
Contributor

Description

Linear: N/A (feature/exploration branch)

Change Type

  • Minor (new features, enhancements, backward compatible)

Summary

Adds Multi-Project Architecture (MPA) support, allowing users to work with multiple Keboola projects in a single MCP session.

Key changes:

  • Numbered env vars: KBC_STORAGE_TOKEN_1, KBC_STORAGE_TOKEN_2, ... to configure multiple projects in standard .mcp.json format
  • ProjectResolutionMiddleware: Injects project_id and branch_id parameters into tool schemas via middleware — zero changes to the 31 existing tool functions
  • Branch tools: list_branches and create_branch for per-project branch management
  • Write protection: KBC_FORBID_MAIN_BRANCH_WRITES=true prevents writes to main branch
  • Multi-project info: get_project_info returns all projects in MPA mode with shared llm_instruction
  • init CLI command: Generates .mcp.json from a Manage API token (interactive, --project-ids, or --all)
  • Fully backward compatible: Single KBC_STORAGE_TOKEN (no number) = legacy mode, unchanged behavior
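The numbered-env-var convention above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name discover_project_tokens is hypothetical, and the real implementation presumably builds ProjectConfig objects rather than a plain dict.

```python
import re

# Hypothetical sketch of numbered-token discovery for MPA mode.
_TOKEN_RE = re.compile(r'^KBC_STORAGE_TOKEN_(\d+)$')

def discover_project_tokens(env: dict[str, str]) -> dict[int, str]:
    """Return {index: token} for KBC_STORAGE_TOKEN_<n> vars.

    An empty dict means no numbered tokens were found, i.e. legacy
    single-project mode (plain KBC_STORAGE_TOKEN, if set, applies).
    """
    tokens = {}
    for key, value in env.items():
        m = _TOKEN_RE.match(key)
        if m:
            tokens[int(m.group(1))] = value
    return dict(sorted(tokens.items()))

# Legacy mode: only the un-numbered variable is set.
assert discover_project_tokens({'KBC_STORAGE_TOKEN': 'tok'}) == {}

# MPA mode: two numbered tokens configure two projects.
env = {'KBC_STORAGE_TOKEN_1': 'tok1', 'KBC_STORAGE_TOKEN_2': 'tok2'}
assert discover_project_tokens(env) == {1: 'tok1', 2: 'tok2'}
```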

Param visibility rules:

| Config | project_id visible | branch_id visible |
|---|---|---|
| Legacy (single token) | No | No |
| 1 numbered token, branch fixed | No | No |
| 1 numbered token, no branch | No | Yes |
| 2+ tokens | Yes | Yes (if any branch unfixed) |
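The visibility rules can be condensed into one predicate. This is a hypothetical standalone sketch; in the PR the flags are presumably derived from the config object (show_project_id_param / show_branch_id_param), not from a free function like this.

```python
def param_visibility(n_projects: int, n_fixed_branches: int, legacy: bool) -> tuple[bool, bool]:
    """Return (show_project_id, show_branch_id) per the visibility table.

    Sketch only: n_fixed_branches counts projects whose branch is fixed
    in config; legacy means a single un-numbered KBC_STORAGE_TOKEN.
    """
    if legacy:
        return (False, False)
    show_project_id = n_projects >= 2
    # branch_id shows whenever at least one project's branch is not fixed
    show_branch_id = n_fixed_branches < n_projects
    return (show_project_id, show_branch_id)

assert param_visibility(1, 0, legacy=True) == (False, False)   # legacy
assert param_visibility(1, 1, legacy=False) == (False, False)  # 1 token, branch fixed
assert param_visibility(1, 0, legacy=False) == (False, True)   # 1 token, no branch
assert param_visibility(2, 0, legacy=False) == (True, True)    # 2+ tokens, branches unfixed
assert param_visibility(2, 2, legacy=False) == (True, False)   # 2+ tokens, all fixed
```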

Testing

  • Tested with Cursor AI desktop (Streamable-HTTP transport)

What was tested

  • All 1145 unit tests pass
  • tox (py312, black, flake8) clean
  • Integration tested against real projects (3049 Chat Exploration, 3047 Chat Data Engineer Demo)
  • Verified: project resolution by ID/alias, branch listing, branch creation (async job polling), multi-project info, write protection

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable)
  • Documentation updated (if applicable)

🤖 Generated with Claude Code

davidesner and others added 9 commits March 31, 2026 16:14
Introduce multi-project architecture (MPA) foundation:
- ProjectConfig dataclass for per-project token/branch config
- ProjectRegistry for resolving projects by ID or alias at runtime
- Numbered env var detection (KBC_STORAGE_TOKEN_1, KBC_STORAGE_TOKEN_2, ...)
- Project IDs and names derived from verify_token() at session init
- Full backward compatibility: single KBC_STORAGE_TOKEN = legacy mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ProjectResolutionMiddleware injects project_id/branch_id params into tool
  schemas and resolves them per tool call (zero changes to tool functions)
- SessionStateMiddleware extended with _create_mpa_session_state for
  concurrent multi-project initialization
- forbid_main_branch_writes enforcement in middleware
- Param visibility rules: project_id only with 2+ projects, branch_id only
  when not all branches are fixed in config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- list_branches and create_branch tools for per-project branch management
- dev_branch_create polls async storage job to return actual branch ID
- get_project_info returns MultiProjectInfo with all projects in MPA mode
- llm_instruction excluded from per-project entries (only at root level)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- init subcommand generates standard .mcp.json from Manage API token
  (supports --project-ids, --all, and interactive selection)
- ManageClient for token verification and Storage token creation
- README updated with multi-project mode documentation (Option E)
- TOOLS.md regenerated with new branch tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
create_branch must be allowed on main even when
KBC_FORBID_MAIN_BRANCH_WRITES is set, otherwise the agent
cannot create a dev branch to unblock itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix org listing: use GET /manage/organizations + GET /manage/organizations/{id}
  instead of expecting orgs from verify_token response
- Add interactive project selector: arrow keys + space to toggle, no extra deps
- Two-step selection: first pick organization, then pick projects within it
- Document full init CLI reference in README with uvx examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
get_project_info handles multi-project internally by returning
MultiProjectInfo with all projects, so injecting project_id is redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davidesner davidesner marked this pull request as ready for review April 1, 2026 09:35
@linear

linear bot commented Apr 1, 2026

Comment on lines 168 to +195
    LOG.info('Returning unified project info.')
    return project_info


async def _get_multi_project_info(registry: ProjectRegistry) -> MultiProjectInfo:
    """Fetch project info for all projects in the registry."""

    async def fetch_for_project(project_ctx: ProjectContext) -> ProjectInfo:
        return await _fetch_single_project_info(
            project_ctx.client,
            project_ctx.workspace_manager,
            include_llm_instruction=False,
        )

    results = await process_concurrently(
        registry.list_projects(),
        fetch_for_project,
    )

    project_infos = []
    for result in results:
        if isinstance(result, BaseException):
            LOG.error(f'Failed to fetch project info: {result}')
            continue
        project_infos.append(result)

    LOG.info(f'Returning multi-project info for {len(project_infos)} projects.')
    return MultiProjectInfo(

🔴 In _get_multi_project_info, if every concurrent project fetch fails (e.g., expired tokens, network outage), all exceptions are silently logged and skipped, causing the function to return MultiProjectInfo(projects=[], llm_instruction=...) -- a successful response with zero projects. The AI agent receives no error signal and cannot distinguish a total fetch failure from a legitimately empty project list; a guard like if not project_infos: raise ToolError(...) after the loop should be added.

Extended reasoning...

What the bug is and how it manifests

_get_multi_project_info (project.py lines 168-195) uses process_concurrently to concurrently fetch per-project info, then iterates over the results, logging and skipping any BaseException instances. If every project fetch raises an exception, project_infos remains empty and the function returns MultiProjectInfo(projects=[], llm_instruction=get_project_system_prompt()). From the caller's perspective this is a normal, successful response with zero projects.

The specific code path that triggers it

The loop at the end of _get_multi_project_info:

project_infos = []
for result in results:
    if isinstance(result, BaseException):
        LOG.error('Failed to fetch project info: ...')
        continue
    project_infos.append(result)
# No guard here -- project_infos may be empty
return MultiProjectInfo(projects=project_infos, llm_instruction=get_project_system_prompt())

The @tool_errors() decorator on get_project_info only converts raised exceptions to tool errors. Since no exception is raised when all fetches fail, the decorator has nothing to intercept and the silent empty response propagates to the AI agent.

Why existing code does not prevent it

The design intentionally tolerates partial failure: if some projects succeed and others fail, the successful results are returned. This is a reasonable pattern for partial outages. However, the edge case of total failure is not distinguished from partial success -- there is no lower bound check on project_infos. The ProjectRegistry constructor requires at least one project, so an empty registry is impossible at session-init time; but subsequent API failures (token expiry, network partition) after session start can cause all concurrent fetches to fail simultaneously.

Impact

The AI agent receives projects: [] with a success status and no error message. It has no way to know whether the configuration legitimately contains zero projects (impossible by design -- session startup would have failed) or whether a transient failure caused all fetches to be skipped. The agent may proceed as if no projects exist, silently losing all project context for the remainder of the conversation.

How to fix it

Add a guard immediately after the loop:

if not project_infos:
    raise ToolError(
        'Failed to fetch project information for all configured projects. '
        'This may be caused by expired tokens or a network issue.'
    )

This surfaces total failure as an actionable error while preserving the existing partial-failure behaviour (some projects succeed -> return what we have).

Step-by-step proof

  1. Two projects configured: KBC_STORAGE_TOKEN_1=tok1, KBC_STORAGE_TOKEN_2=tok2.
  2. Both tokens expire simultaneously (or a network outage occurs).
  3. get_project_info is called; session state has a valid ProjectRegistry with 2 projects -> MPA branch taken.
  4. _get_multi_project_info calls process_concurrently([proj1, proj2], fetch_for_project).
  5. Both _fetch_single_project_info calls raise (e.g., httpx.HTTPStatusError: 401 Unauthorized).
  6. results = [HTTPStatusError(...), HTTPStatusError(...)].
  7. Loop: both are BaseException instances -> logged, skipped. project_infos = [].
  8. No guard -> return MultiProjectInfo(projects=[], llm_instruction=...).
  9. The AI agent receives a successful tool result with an empty projects list and no indication of failure.

Comment on lines 153 to +172


def _interactive_checkbox(
    items: list[dict],
    label_fn: callable,
    single_select: bool = False,
    prompt: str | None = None,
) -> list[dict]:
    """Interactive selector using arrow keys, space to toggle, enter to confirm.

    Controls:
        ↑/↓    Move cursor
        Space  Toggle selection (or select in single-select mode)
        a      Select/deselect all (multi-select only)
        Enter  Confirm
    """
    import termios
    import tty

    selected = [False] * len(items)

🔴 The _interactive_checkbox function imports termios and tty (Unix-only stdlib modules) inside its body; on Windows these modules do not exist, so any user running uvx keboola_mcp_server init without --project-ids or --all will receive ModuleNotFoundError: No module named termios instead of a helpful error. The fix should detect the platform early in interactive mode and emit a clear message directing the user to use --project-ids or --all instead.

Extended reasoning...

What the bug is and how it manifests

_interactive_checkbox in src/keboola_mcp_server/cli.py (lines 169-170) contains two deferred imports at the very top of the function body: import termios and import tty. These modules are part of Python's standard library but are only available on POSIX/Unix systems (Linux, macOS). They are absent from CPython on Windows. Placing the imports inside the function body rather than at module level defers — but does not prevent — the error: the ImportError is simply raised at the first call site instead of at module import time.

The specific code path that triggers it

_interactive_checkbox is called from run_init only when neither --project-ids nor --all is supplied (the interactive mode else branch in run_init). When there is more than one organisation the function is called once to select an organisation, and then again to select projects. Even when there is only one organisation it is still called for the project selection step. On Windows, the very first call raises ModuleNotFoundError: No module named 'termios'.

Why existing code does not prevent it

The deferred import pattern is intended to keep the module importable on all platforms while deferring the failure until the feature is actually used. That intent is correct, but the implementation is incomplete: there is no sys.platform check and no user-friendly error message before the import is attempted. The README documents the interactive init invocation without any Windows caveat (see the 'Interactive mode' example in the new MPA section), and the Windows WSL setup section elsewhere in the README makes clear Windows users are an expected audience.

What the impact would be

Any Windows user who runs uvx keboola_mcp_server init without specifying --project-ids or --all will receive a raw Python traceback ending in ModuleNotFoundError: No module named 'termios'. This is confusing and provides no actionable guidance. The non-interactive paths (--project-ids and --all) are unaffected because they never call _interactive_checkbox.

How to fix it

Add a platform guard at the top of _interactive_checkbox (or at the start of the interactive else branch in run_init):

import sys
if sys.platform == 'win32':
    print('Error: Interactive mode is not supported on Windows. '
          'Use --project-ids or --all instead.', file=sys.stderr)
    sys.exit(1)

The README should also note that interactive mode requires a Unix-like terminal.

Step-by-step proof

  1. Windows user runs: uvx keboola_mcp_server init --manage-token TOKEN --api-url https://connection.keboola.com
  2. run_server dispatches to run_init because parsed_args.command == 'init'.
  3. run_init verifies the token, fetches organisations, and enters the else branch (no --project-ids, no --all).
  4. _interactive_checkbox is called to let the user select projects (or first an org if multiple exist).
  5. Python executes import termios at line 169 of cli.py — this raises ModuleNotFoundError: No module named 'termios' on Windows.
  6. The exception propagates up and is caught by the outer handler in run_server, which logs it and calls sys.exit(1) — the user sees only a confusing traceback with no guidance.
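Beyond the sys.platform check the review proposes, a slightly broader capability probe would also catch the non-TTY case (e.g. piped stdin). This is a hedged sketch; the function name and messages are illustrative, not code from the PR.

```python
import sys

def interactive_mode_supported() -> tuple[bool, str]:
    """Return (supported, reason). Sketch of an early guard for init's interactive mode."""
    if sys.platform == 'win32':
        # termios/tty do not exist on Windows, so the selector cannot run.
        return (False, 'Interactive mode is not supported on Windows; '
                       'use --project-ids or --all instead.')
    if not sys.stdin.isatty():
        # Raw-mode key reading needs a real terminal.
        return (False, 'Interactive mode needs a terminal; '
                       'use --project-ids or --all instead.')
    return (True, '')

ok, msg = interactive_mode_supported()
# On an unsupported platform or without a TTY the message points at the
# non-interactive flags instead of leaking a ModuleNotFoundError.
if not ok:
    assert '--project-ids' in msg
```

Calling this at the start of run_init's interactive branch (and exiting with the message on failure) keeps the termios import from ever being reached on Windows.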

Comment on lines +546 to +575
    ) -> list[Tool]:
        tools = await call_next(context)
        config = self._get_config(context.fastmcp_context)

        if not config.is_mpa_mode:
            return tools

        show_project_id = config.show_project_id_param
        show_branch_id = config.show_branch_id_param

        if not show_project_id and not show_branch_id:
            return tools

        modified_tools = []
        for tool in tools:
            if tool.name in PROJECT_AGNOSTIC_TOOLS:
                modified_tools.append(tool)
                continue

            new_params = dict(tool.parameters)
            properties = dict(new_params.get('properties', {}))

            if show_project_id:
                registry = ProjectRegistry.from_state(context.fastmcp_context.session.state)
                available = registry._format_available_projects()
                properties['project_id'] = {
                    'type': 'string',
                    'description': f'Project ID or alias to operate on. Available: {available}',
                }


🔴 branch_id is injected into list_branches and create_branch tool schemas by ProjectResolutionMiddleware.on_list_tools, but both tools call project-level endpoints (GET /dev-branches, POST /dev-branches) that are unaffected by branch context, making the parameter semantically meaningless and misleading to AI agents. These tools should be excluded from branch_id injection (while still receiving project_id injection in multi-project mode).

Extended reasoning...

What the bug is and how it manifests

ProjectResolutionMiddleware.on_list_tools (mcp.py:546-575) injects optional branch_id parameters into every tool schema that is not in PROJECT_AGNOSTIC_TOOLS. The only tools excluded from injection are docs_query and get_project_info. This means list_branches and create_branch both receive a branch_id schema property with the description: "Branch ID to operate on. If not specified, uses the branch configured for the project..."

The specific code path that triggers it

In branches.py, list_branches calls client.storage_client.branches_list() which executes GET dev-branches. In storage.py, dev_branch_create() executes POST dev-branches. Neither endpoint has a branch-scoped URL prefix (like branch/{id}/dev-branches). The branch context set on AsyncStorageClient via self._branch_id is completely irrelevant to these two endpoints -- they return/create branches at the project level regardless of which branch the client is scoped to.

When on_call_tool processes a call with branch_id set, it calls client.with_branch_id(branch_id) and stores the switched client in session state, but the subsequent branches_list() and dev_branch_create() calls are entirely unaffected by that switch.

Why existing code doesn't prevent it

The commit 4a870a9 MPA: exclude get_project_info from project_id/branch_id injection shows the team added PROJECT_AGNOSTIC_TOOLS to exclude tools that shouldn't receive injected params. However, list_branches and create_branch were not added to this set or any exclusion set. There is currently no BRANCH_ID_EXEMPT_TOOLS concept -- only PROJECT_AGNOSTIC_TOOLS, which also excludes project_id. But project_id IS meaningful for these tools (which project's branches to list/create), so they cannot simply be added to PROJECT_AGNOSTIC_TOOLS.

Impact

An AI agent seeing branch_id in the list_branches schema would reasonably assume it filters or scopes the branch listing -- but it returns all branches regardless. For create_branch, passing branch_id makes no semantic sense (you are creating a new branch; the existing branch context is irrelevant), and an agent might incorrectly believe it creates a branch off a specific parent. This corrupts agent reasoning in MPA sessions.

How to fix it

Add a new BRANCH_ID_EXEMPT_TOOLS frozenset containing list_branches and create_branch, and skip branch_id injection for tools in that set while still allowing project_id injection. In on_list_tools, conditionally skip branch_id injection for tools in this set.
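The proposed two-tier exclusion can be sketched as a pure function. PROJECT_AGNOSTIC_TOOLS mirrors the set named in the PR; BRANCH_ID_EXEMPT_TOOLS and injected_params are hypothetical names for the suggested fix, not existing code.

```python
# Existing exclusion set per the PR (no project_id and no branch_id).
PROJECT_AGNOSTIC_TOOLS = frozenset({'docs_query', 'get_project_info'})

# Proposed new set: tools that take project_id but for which branch_id
# is semantically meaningless (they hit project-level endpoints).
BRANCH_ID_EXEMPT_TOOLS = frozenset({'list_branches', 'create_branch'})

def injected_params(tool_name: str, show_project_id: bool, show_branch_id: bool) -> set[str]:
    """Which params the middleware would inject into a tool's schema."""
    if tool_name in PROJECT_AGNOSTIC_TOOLS:
        return set()
    params = set()
    if show_project_id:
        params.add('project_id')
    if show_branch_id and tool_name not in BRANCH_ID_EXEMPT_TOOLS:
        params.add('branch_id')
    return params

assert injected_params('get_project_info', True, True) == set()
assert injected_params('list_branches', True, True) == {'project_id'}
assert injected_params('create_branch', True, True) == {'project_id'}
assert injected_params('get_table', True, True) == {'project_id', 'branch_id'}
```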

Step-by-step proof

  1. Config has 2 projects, both without fixed branches -> show_branch_id=True
  2. Agent calls list_tools -> middleware injects branch_id into list_branches schema
  3. Agent calls list_branches(branch_id="123") believing it will filter or scope the result
  4. on_call_tool pops branch_id="123", calls client.with_branch_id("123") (switching context)
  5. list_branches calls client.storage_client.branches_list() -> GET /v2/storage/dev-branches
  6. This endpoint ignores the branch context on the client -- returns all branches regardless
  7. Agent receives the same response it would have gotten without branch_id, but formed incorrect beliefs about tool behavior

Comment on lines +358 to +365
import asyncio
import time

data = {'name': name, 'description': description}
job = cast(JsonDict, await self.post(endpoint='dev-branches', data=data))

job_id = job.get('id')
if job.get('status') == 'success':

🟡 In dev_branch_create, job_id = job.get('id') is not validated before the polling loop: if the POST response lacks an id field and status is not 'success', the code calls self.job_detail(None), constructing URL jobs/None and raising a confusing HTTP error rather than a clear diagnostic. Fix: add if not job_id: raise RuntimeError('Branch creation job ID missing from API response') immediately after the early-return check.

Extended reasoning...

What the bug is and how it manifests

In dev_branch_create (src/keboola_mcp_server/clients/storage.py, lines ~358-395), the code POSTs to dev-branches and reads job_id = job.get('id'). There is one early-return path: if job.get('status') == 'success': return .... If neither condition fires -- status is something other than 'success' and 'id' is absent -- execution falls through to the polling loop with job_id = None.

The specific code path that triggers it

job = cast(JsonDict, await self.post(endpoint='dev-branches', data=data))
job_id = job.get('id')   # None if key is absent
if job.get('status') == 'success':
    return cast(JsonDict, job.get('results', {}))
# No guard here -- job_id may be None
while time.monotonic() < deadline:
    await asyncio.sleep(poll_interval)
    job = await self.job_detail(job_id)   # constructs 'jobs/None'

job_detail(None) builds endpoint jobs/None -- an invalid API path that returns an HTTP error referencing a non-existent job, giving the caller no indication that the real problem was a missing id in the branch-creation response.

Why existing code does not prevent it

The only guard before the polling loop checks status == 'success'. There is no check that job_id is a valid non-None value. The early-return logic and the polling logic are decoupled, so a response with neither status=success nor an id field slips through both checks silently.

Impact

In practice the Keboola SAPI always returns an id for async jobs, so this path is unlikely during normal operation. However, it is a genuine defensive programming gap in new code introduced by this PR. If triggered -- e.g. by a malformed API response, proxy interference, or future API change -- the resulting error message would mislead developers by referencing jobs/None rather than pointing at the real cause: a missing id in the initial response.

How to fix it

Add a validation guard after the success early-return:

job_id = job.get('id')
if job.get('status') == 'success':
    return cast(JsonDict, job.get('results', {}))
if not job_id:
    raise RuntimeError(f'Branch creation job ID missing from API response: {job}')

Step-by-step proof

  1. API returns {"status": "processing"} with no "id" field (unusual but structurally valid JSON).
  2. job_id = job.get('id') evaluates to None.
  3. job.get('status') == 'success' is False; no early return.
  4. Polling loop executes: self.job_detail(None) calls self.get(endpoint='jobs/None').
  5. SAPI returns HTTP 404 with a message about jobs/None not existing.
  6. Caller sees an opaque HTTP error with no indication the real cause is a missing id in the branch-creation response.
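The guarded polling shape can be demonstrated synchronously with faked job payloads. This is a sketch of the proposed fix, not the PR's dev_branch_create: poll_branch_job and its job_detail callback parameter are illustrative stand-ins.

```python
import time

def poll_branch_job(initial: dict, job_detail, timeout: float = 5.0,
                    interval: float = 0.0) -> dict:
    """Poll an async job to completion, failing fast if the job ID is missing."""
    if initial.get('status') == 'success':
        return initial.get('results', {})
    job_id = initial.get('id')
    if not job_id:
        # The guard the review proposes: surface the real cause instead of
        # letting job_detail(None) request 'jobs/None' and 404.
        raise RuntimeError(f'Branch creation job ID missing from API response: {initial}')
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        job = job_detail(job_id)
        if job.get('status') == 'success':
            return job.get('results', {})
        if job.get('status') == 'error':
            raise RuntimeError(f'Branch creation failed: {job}')
    raise TimeoutError('Branch creation job did not finish in time')

# Missing id + non-success status now fails fast with a clear diagnostic.
try:
    poll_branch_job({'status': 'processing'}, job_detail=lambda jid: {})
    raised = False
except RuntimeError as e:
    raised = 'missing' in str(e)
assert raised

# Normal async path: one poll round returns the created branch.
result = poll_branch_job({'id': 7, 'status': 'processing'},
                         job_detail=lambda jid: {'status': 'success', 'results': {'id': 42}})
assert result == {'id': 42}
```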

