Merged
54 changes: 27 additions & 27 deletions README.md
@@ -1,6 +1,6 @@
<h1 align="center">Model Context Shell</h1>

<p align="center"><b>Unix-style pipelines for MCP tools — compose complex tool workflows as single pipeline requests</b></p>
<p align="center"><b>Unix-style pipelines for MCP tools - compose complex tool workflows as single pipeline requests</b></p>

<p align="center">
<a href="#introduction">Introduction</a> &middot;
@@ -14,7 +14,7 @@

## Introduction

Model Context Shell is a system that lets AI agents compose [MCP](https://modelcontextprotocol.io/) tool calls similar to Unix shell scripting. Instead of the agent orchestrating each tool call individually (loading all intermediate data into context), agents can express complex workflows as pipelines that execute server-side.
Model Context Shell lets AI agents compose [MCP](https://modelcontextprotocol.io/) tool calls using something like Unix shell scripting. Instead of the agent orchestrating each tool call individually (loading all intermediate data into context), it can express a workflow as a pipeline that executes server-side.

For example, an agent can express a multi-step workflow as a single pipeline:

@@ -23,11 +23,11 @@ flowchart LR
A["Fetch users (MCP)"] --> B["Extract profile URLs (Shell)"] --> C["for_each (Shell)"] --> C1["Fetch profile (MCP)"] --> D["Filter and sort (Shell)"]
```

This pipeline fetches a list, extracts URLs, fetches each one, filters the results, and returns only the final output to the agent — no intermediate data in context.
This pipeline fetches a list, extracts URLs, fetches each one, filters the results, and returns only the final output to the agent. No intermediate data in context.

### Why this matters

[MCP](https://modelcontextprotocol.io/) is great — standardized interfaces, structured data, extensible ecosystem. But for complex workflows, the agent has to orchestrate each tool call individually, loading all intermediate results into context. Model Context Shell adds a pipeline layer — the agent sends a single pipeline, and the server coordinates the tools, returning only the final result:
[MCP](https://modelcontextprotocol.io/) is great, but for complex workflows the agent has to orchestrate each tool call individually, loading all intermediate results into context. Model Context Shell adds a pipeline layer: the agent sends a single pipeline, and the server coordinates the tools, returning only the final result:

```mermaid
flowchart LR
@@ -60,27 +60,27 @@ flowchart LR

Example query: "List all Pokemon over 50 kg that have the chlorophyll ability"

Instead of 7+ separate tool calls loading all Pokemon data into context, the agent constructed a single pipeline that:
Instead of 7+ separate tool calls loading all Pokemon data into context, the agent built a single pipeline that:
- Fetched the ability data
- Extracted Pokemon URLs
- Fetched each Pokemon's details (7 API calls)
- Filtered by weight and formatted the results

**Result**: Only the final answer is loaded into context — no intermediate API responses.
Only the final answer is loaded into context, not the intermediate API responses.

In practice, agents don't construct the perfect pipeline on the first try. They typically run a few exploratory queries first to understand the shape of the data before building the final pipeline. To keep this process fast and cheap, the server includes a preview stage powered by [headson](https://github.com/kantord/headson) that returns a compact structural summary of the data — enough for the agent to plan its transformations without loading the full dataset into context.
In practice, agents don't get the pipeline right on the first try. They typically run a few exploratory queries to understand the shape of the data before building the final pipeline. To keep this fast and cheap, the server includes a preview stage powered by [headson](https://github.com/kantord/headson) that returns a compact structural summary of the data, enough for the agent to plan its transformations without loading the full dataset into context.
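The preview idea can be sketched in a few lines of stdlib Python. This is not headson's algorithm or output format — just a minimal illustration of a structural summary: keep the shape, drop the bulk.

```python
import json


def structural_preview(value, max_items=3, max_str=40):
    """Recursively truncate a JSON-like value so an agent can see its shape.

    Long arrays keep only their first few items; long strings are clipped.
    (headson produces a far smarter summary; this only shows the idea.)
    """
    if isinstance(value, dict):
        return {k: structural_preview(v, max_items, max_str) for k, v in value.items()}
    if isinstance(value, list):
        head = [structural_preview(v, max_items, max_str) for v in value[:max_items]]
        if len(value) > max_items:
            head.append(f"... {len(value) - max_items} more items")
        return head
    if isinstance(value, str) and len(value) > max_str:
        return value[:max_str] + "..."
    return value


# A large response shrinks to something an agent can plan against.
data = {"results": [{"name": f"user{i}", "bio": "x" * 500} for i in range(100)]}
print(json.dumps(structural_preview(data)))
```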

### Design

Agents already have access to full shell environments and can call any CLI tool, which has significant overlap with what MCP tools provide. Rather than duplicating that, Model Context Shell explores whether similar workflows can be achieved in a safer, simpler MCP-native environment. Patterns like parallel map-reduce over tool call results are not common today because MCP doesn't natively support them, but they seem like a natural fit for coordinating tool calls — imagine fetching all console errors via a Chrome DevTools MCP server and creating a separate GitHub issue for each one. A system tailored to these patterns can make them first-class operations.
Agents already have access to full shell environments and can call any CLI tool, which overlaps a lot with what MCP tools provide. Rather than duplicating that, Model Context Shell tries to achieve similar workflows in a safer, simpler MCP-native environment. Patterns like parallel map-reduce over tool call results are uncommon today because MCP doesn't natively support them, but they're a natural fit for coordinating tool calls: imagine fetching all console errors via a Chrome DevTools MCP server and creating a separate GitHub issue for each one.

The execution engine works with JSON pipeline definitions directly — agents construct pipelines from the MCP tool schema alone, without needing shell syntax. Commands are never passed through a shell interpreter; each command and its arguments are passed as separate elements to the underlying process (`shell=False`), eliminating shell injection risks entirely. Data flows between stages as JSON, preserving types through the pipeline rather than reducing everything to strings. MCP tool arguments are validated against their JSON Schema by the receiving server, giving agents type-checked feedback when they construct pipelines incorrectly.
The execution engine works with JSON pipeline definitions directly. Agents construct pipelines from the MCP tool schema alone, without needing shell syntax. Commands are never passed through a shell interpreter; each command and its arguments are passed as separate elements to the underlying process (`shell=False`), so there's no shell injection. Data flows between stages as JSON, preserving types through the pipeline rather than reducing everything to strings.
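The `shell=False` guarantee is easy to demonstrate: a hostile-looking argument travels as a single argv element and is never interpreted by a shell. The sketch below uses `python -c` as a stand-in for an allowlisted command, purely for illustration.

```python
import subprocess
import sys

# Shell metacharacters (`;`, `$(...)`, quotes) have no effect because no
# shell ever parses this string; it arrives as one argv entry.
payload = "; echo pwned $(rm -rf /tmp/x)"

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", payload],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # the payload comes back verbatim, unexecuted
```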

The result is a more constrained system compared to a general-purpose shell — only a fixed set of data transformation commands is available, and all execution happens inside a container.
It's more constrained than a general-purpose shell: only a fixed set of data transformation commands is available, and all execution happens inside a container.

### How it works

Model Context Shell is packaged as an MCP server, which makes it easy to use with any agent that supports the protocol. It could also be packaged as a library built directly into an agent.
Model Context Shell is packaged as an MCP server, so any agent that supports the protocol can use it. It could also be packaged as a library built directly into an agent.

The server exposes four tools to the agent via MCP:

@@ -89,7 +89,7 @@ The server exposes four tools to the agent via MCP:
| `execute_pipeline` | Execute a pipeline of tool calls and shell commands |
| `list_all_tools` | Discover all tools available from MCP servers via [ToolHive](https://stacklok.com/download/) |
| `get_tool_details` | Get the full schema and description for a specific tool |
| `list_available_shell_commands` | Show the whitelist of allowed CLI commands |
| `list_available_shell_commands` | Show the allowlist of CLI commands |

The agent constructs pipelines as JSON arrays of stages. Data flows from one stage to the next, similar to Unix pipes. There are three stage types:

@@ -98,7 +98,7 @@ The agent constructs pipelines as JSON arrays of stages. Data flows from one sta
{"type": "tool", "name": "fetch", "server": "fetch", "args": {"url": "https://..."}}
```

**Command stages** transform data using whitelisted shell commands:
**Command stages** transform data using allowed shell commands:
```json
{"type": "command", "command": "jq", "args": ["-c", ".results[] | {id, name}"]}
```
@@ -108,9 +108,9 @@ The agent constructs pipelines as JSON arrays of stages. Data flows from one sta
{"type": "preview", "chars": 3000}
```

Any tool stage can set `"for_each": true` to process items one-by-one. The preceding stage must output JSONL (one JSON object per line), and the tool is called once per line. Results are collected into an array. This enables patterns like "fetch a list of URLs, then fetch each one" in a single pipeline call, using a single reused connection for efficiency.
Any tool stage can set `"for_each": true` to process items one-by-one. The preceding stage must output JSONL (one JSON object per line), and the tool is called once per line. Results are collected into an array. So "fetch a list of URLs, then fetch each one" is a single pipeline call, using a single reused connection.
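The `for_each` semantics can be sketched in a few lines, with a stand-in function in place of a real MCP tool call (the names here are illustrative, not the server's implementation):

```python
import json


def run_for_each(jsonl_text, call_tool):
    """Apply `call_tool` once per JSONL line, collecting results into an array.

    `call_tool` stands in for an MCP tool invocation; the real server reuses
    a single connection across all the calls.
    """
    results = []
    for line in jsonl_text.strip().splitlines():
        item = json.loads(line)  # each line must be one JSON object
        results.append(call_tool(item))
    return results


# Stand-in tool: pretend to fetch a URL and return a status record.
def fake_fetch(item):
    return {"url": item["url"], "status": 200}


jsonl = '{"url": "https://a.example"}\n{"url": "https://b.example"}'
print(run_for_each(jsonl, fake_fetch))
```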

Here is a full example — a pipeline that fetches users, extracts their profile URLs, fetches each profile, and filters for active users:
Full example - fetch users, extract their profile URLs, fetch each profile, filter for active users:

```json
[
@@ -125,7 +125,7 @@

### Prerequisites

- [ToolHive](https://stacklok.com/download/) (`thv`) — a runtime for managing MCP servers
- [ToolHive](https://stacklok.com/download/) (`thv`) - a runtime for managing MCP servers

### Quick start

@@ -139,7 +139,7 @@ thv run ghcr.io/stackloklabs/model-context-shell:latest --network host --foregro
thv run ghcr.io/stackloklabs/model-context-shell:latest --foreground --transport streamable-http
```

Once running, you can find the server's address with `thv list`, which shows the URL and port for each running server. If you've registered your AI client with `thv client setup`, ToolHive configures it to discover running servers automatically — see the [CLI quickstart](https://docs.stacklok.com/toolhive/tutorials/quickstart-cli) for details.
Once running, `thv list` shows the URL and port for each running server. If you've registered your AI client with `thv client setup`, ToolHive configures it to discover running servers automatically. See the [CLI quickstart](https://docs.stacklok.com/toolhive/tutorials/quickstart-cli) for details.

Model Context Shell works with any existing MCP servers running through ToolHive, and relies on ToolHive's authentication model for connected servers.

@@ -174,15 +174,15 @@ See the [ToolHive documentation](https://docs.stacklok.com/toolhive) for the ful

### Tips

**Connect only Model Context Shell to your agent** — For best results, don't connect individual MCP servers directly to the agent alongside Model Context Shell. When agents have direct access to tools, they may call them individually instead of composing efficient pipelines. The server can access all your MCP servers through ToolHive automatically.
**Connect only Model Context Shell to your agent.** Don't connect individual MCP servers directly to the agent alongside Model Context Shell. When agents have direct access to tools, they tend to call them individually instead of composing pipelines. The server can access all your MCP servers through ToolHive automatically.

**Some agents need encouragement** — Most agents will use the shell naturally for complex tasks, but some may need a hint in their system prompt (e.g., "Use Model Context Shell pipelines to combine multiple tool calls efficiently").
**Some agents need encouragement.** Most agents will use the shell naturally for complex tasks, but some may need a hint in their system prompt (e.g., "Use Model Context Shell pipelines to combine multiple tool calls efficiently").

## Security

ToolHive runs Model Context Shell in an isolated container, so shell commands have no access to the host filesystem or network. The MCP servers it coordinates also run in their own separate containers, managed by ToolHive.

- **Allowed commands only**: A fixed whitelist of safe, read-only data transformation commands (`jq`, `grep`, `sed`, `awk`, `sort`, `uniq`, `cut`, `wc`, `head`, `tail`, `tr`, `date`, `bc`, `paste`, `shuf`, `join`, `sleep`)
- **Allowed commands only**: A fixed allowlist of safe, read-only data transformation commands (`jq`, `grep`, `sed`, `awk`, `sort`, `uniq`, `cut`, `wc`, `head`, `tail`, `tr`, `date`, `bc`, `paste`, `shuf`, `join`, `sleep`)
- **No shell injection**: Commands are executed with `shell=False`, arguments passed separately
- **MCP tools only**: All external operations go through approved MCP servers
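The first two properties combine naturally: check the allowlist, then execute with an argument vector and no shell. The function below is an illustrative sketch, not the project's actual code.

```python
import shutil
import subprocess

# Mirrors the allowlist from the README above.
ALLOWED_COMMANDS = {
    "jq", "grep", "sed", "awk", "sort", "uniq", "cut", "wc", "head",
    "tail", "tr", "date", "bc", "paste", "shuf", "join", "sleep",
}


def run_allowed(command: str, args: list[str], stdin_text: str = "") -> str:
    """Reject anything not on the allowlist, then exec without a shell."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command}")
    if shutil.which(command) is None:
        raise FileNotFoundError(command)
    result = subprocess.run(
        [command, *args],  # argv list, never a shell string
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(run_allowed("tr", ["a-z", "A-Z"], "hello"))
```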

@@ -215,23 +215,23 @@ uv run pyright

## Specification

For now, this project serves as a living specification — the implementation _is_ the spec. As the idea matures, a more formal specification may be extracted from it.
For now, this project serves as a living specification: the implementation _is_ the spec. A more formal specification may be extracted later.

**Execution model.** The current execution model is a scriptable map-reduce pipeline. Stages run sequentially, with `for_each` providing the map step over tool calls. This could be extended with a more generic mini-interpreter for evaluating more complex pipelines, but the current thinking is that it would never grow into a full-blown programming language. After a certain level of complexity, it makes more sense for agents to write a larger piece of code directly, or combine written code with the shell approach. That said, the built-in access to tools like `jq` and `awk` already makes the pipeline model surprisingly capable for most data transformation tasks.
**Execution model.** The current execution model is a scriptable map-reduce pipeline. Stages run sequentially, with `for_each` providing the map step over tool calls. This could be extended with a more generic mini-interpreter, but it probably shouldn't grow into a full programming language. Past a certain complexity, it makes more sense for agents to write code directly, or combine written code with the shell approach. That said, built-in access to tools like `jq` and `awk` already makes the pipeline model pretty capable for most data transformation tasks.
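The sequential model is essentially a fold over the stage list. A toy sketch, with plain callables standing in for tool, command, and preview stages:

```python
from functools import reduce


def run_pipeline(stages, data=None):
    """Fold the stage list over the data: stage N's output feeds stage N+1."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Toy stand-ins for the three stage types.
pipeline = [
    lambda _: list(range(10)),            # "tool": produce data
    lambda xs: [x for x in xs if x % 2],  # "command": filter (the map step)
    lambda xs: xs[:3],                    # "preview": truncate for inspection
]
print(run_pipeline(pipeline))  # → [1, 3, 5]
```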

**Pipeline schema.** The pipeline format is defined by the `execute_pipeline` tool in [`main.py`](https://github.com/StacklokLabs/model-context-shell/blob/main/main.py). Since FastMCP generates the JSON Schema from the function signature and docstring, this serves as the canonical schema definition.
**Pipeline schema.** The pipeline stages are defined as typed Pydantic models in [`models.py`](https://github.com/StacklokLabs/model-context-shell/blob/main/models.py). FastMCP generates a discriminated-union JSON Schema from these models, so MCP clients can validate pipelines before sending them.

**ToolHive and security.** The reliance on ToolHive and container isolation is a practical choice — it was the simplest way to get a working, secure system. ToolHive handles tool discovery, container management, and networking, which let this project focus on the pipeline execution model itself. A different deployment model could be used in the future without changing the core concept.
**ToolHive and security.** The reliance on ToolHive and container isolation is a practical choice: it was the simplest way to get a working, secure system. ToolHive handles tool discovery, container management, and networking, which lets this project focus on the pipeline execution model itself. A different deployment model could be used without changing the core concept.

## RFC

This project is both a working tech demo and an early-stage RFC for the concept of composable MCP tool pipelines. Rather than writing a detailed specification upfront, the goal is to gather feedback on the idea by providing something concrete to try.
This is both a working tech demo and an early-stage RFC for composable MCP tool pipelines. Rather than writing a detailed spec upfront, the goal is to gather feedback by providing something concrete to try.

If you have thoughts on the approach, ideas for improvements, or use cases we haven't considered, please share them in the [Discussions](https://github.com/StacklokLabs/model-context-shell/discussions) section.
If you have thoughts, ideas, or use cases we haven't considered, share them in the [Discussions](https://github.com/StacklokLabs/model-context-shell/discussions) section.

## Contributing

Contributions, ideas, and feedback are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines, including our DCO sign-off requirement.
Contributions and feedback welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines, including the DCO sign-off requirement.

## License

5 changes: 3 additions & 2 deletions main.py
@@ -4,6 +4,7 @@

import mcp_client
import toolhive_client
from models import PipelineStage
from shell_engine import ShellEngine

mcp = FastMCP(
@@ -50,7 +51,7 @@ def list_available_shell_commands() -> list[str]:


@mcp.tool()
async def execute_pipeline(pipeline: list[dict]) -> str:
async def execute_pipeline(pipeline: list[PipelineStage]) -> str:
    """
    Execute a pipeline of tool calls and shell commands to coordinate multiple operations.

@@ -79,7 +80,7 @@ async def execute_pipeline(pipeline: list[PipelineStage]) -> str:

    Command Stage:
        {"type": "command", "command": "jq", "args": ["-c", ".field"]}
        - Runs whitelisted shell commands (see list_available_shell_commands)
        - Runs allowed shell commands (see list_available_shell_commands)
        - Command and args MUST be separate (security requirement)

    Preview Stage:
41 changes: 41 additions & 0 deletions models.py
@@ -0,0 +1,41 @@
"""Pydantic models for pipeline stages.

These models generate a discriminated-union JSON Schema so that MCP clients
and agents can validate pipelines before sending them.
"""

from typing import Annotated, Literal

from pydantic import BaseModel, Field


class ToolStage(BaseModel):
    """Call an external tool from an MCP server."""

    type: Literal["tool"]
    name: str = Field(min_length=1)
    server: str = Field(min_length=1)
    args: dict = Field(default_factory=dict)
    for_each: bool = False


class CommandStage(BaseModel):
    """Run an allowed shell command."""

    type: Literal["command"]
    command: str = Field(min_length=1)
    args: list[str] = Field(default_factory=list)
    for_each: bool = False
    timeout: float | None = None


class PreviewStage(BaseModel):
    """Summarize upstream data for inspection."""

    type: Literal["preview"]
    chars: int = Field(default=3000, gt=0)


PipelineStage = Annotated[
    ToolStage | CommandStage | PreviewStage, Field(discriminator="type")
]
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -42,14 +42,14 @@ ignore = [
quote-style = "double"

[tool.coverage.run]
source = ["main", "shell_engine", "mcp_client", "toolhive_client"]
source = ["main", "shell_engine", "mcp_client", "toolhive_client", "models"]
branch = true

[tool.coverage.report]
show_missing = true
skip_covered = false

[tool.pyright]
include = ["main.py", "mcp_client.py", "shell_engine.py", "toolhive_client.py", "tests"]
include = ["main.py", "mcp_client.py", "shell_engine.py", "toolhive_client.py", "models.py", "tests"]
pythonVersion = "3.13"
typeCheckingMode = "standard"