Secure, cloud-sandboxed Recursive Language Models (RLM) with DSPy and Modal.
fleet-rlm gives AI agents a secure cloud sandbox for long-context code and document work, with a Web UI-first experience, recursive delegation, and DSPy-aligned tooling.
Paper | Docs | Contributing
Install and launch the Web UI in under a minute:
```bash
# Option 1: install as a runnable tool
uv tool install fleet-rlm
fleet web
```

Or in your active environment:

```bash
# Option 2: regular environment install
uv pip install fleet-rlm
fleet web
```

Open http://localhost:8000 in your browser.
fleet web is the primary interactive interface. The published package already includes the built frontend assets, so end users do not need bun or a separate frontend toolchain.
- Browser-first RLM chat (`fleet web`)
- A focused Web UI with `RLM Workspace`, `Volumes`, and `Settings`
- Secure Modal-backed long-context execution for code/doc workflows
- WS-first runtime streaming for chat and execution events
- `GET /api/v1/auth/me` as the canonical frontend identity/bootstrap surface
- Multitenant Entra auth with Neon-backed tenant admission when `AUTH_MODE=entra`
- Runtime configuration and diagnostics from the Web UI settings
- Optional MCP server surface (`fleet-rlm serve-mcp`)
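The WS-first streaming surface carries typed events over a single connection. A minimal client-side dispatcher sketch follows; the event names (`token`, `execution`) and field layout here are illustrative assumptions, not fleet-rlm's actual wire schema:

```python
import json

def dispatch(raw: str, handlers: dict) -> str:
    """Route one JSON-encoded stream event to its handler by `type`."""
    event = json.loads(raw)
    kind = event.get("type", "unknown")
    handler = handlers.get(kind, lambda e: f"unhandled:{kind}")
    return handler(event)

# Hypothetical event kinds: incremental chat tokens and execution status.
handlers = {
    "token": lambda e: e["text"],
    "execution": lambda e: f"[exec] {e['status']}",
}

print(dispatch('{"type": "token", "text": "hi"}', handlers))  # → hi
```

The same dispatch shape works whether the payload arrives from the chat stream or the execution stream, which is why a single WS-first transport can serve both.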
```bash
# Standalone terminal chat
fleet-rlm chat --trace-mode compact

# Explicit API server
fleet-rlm serve-api --port 8000

# MCP server
fleet-rlm serve-mcp --transport stdio

# Scaffold assets for Claude Code
fleet-rlm init --list
```

- The current Web UI shell supports `RLM Workspace`, `Volumes`, and `Settings`.
- Legacy `taxonomy`, `skills`, `memory`, and `analytics` browser routes redirect to the supported surfaces.
- Product chat transport is WS-first (`/api/v1/ws/chat`).
- Frontend identity/bootstrap is `GET /api/v1/auth/me`.
- Runtime model updates from Settings are hot-applied in-process (`/api/v1/runtime/settings`) and reflected on `/api/v1/runtime/status`.
- Secret inputs in Runtime Settings are write-only.
- In `AUTH_MODE=entra`, bearer tokens are validated against Entra JWKS and admitted only for active Neon tenants.
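The Entra admission flow above has two gates: signature validation against Entra JWKS, then a tenant-activity check against Neon. A sketch of the second gate's decision logic follows; the `tid` claim name matches Entra conventions, but the function, types, and the in-memory `active_tenants` set are hypothetical stand-ins for the real Neon-backed lookup:

```python
from dataclasses import dataclass

@dataclass
class AdmissionResult:
    admitted: bool
    reason: str

def admit(claims: dict, active_tenants: set[str]) -> AdmissionResult:
    """Admit an already-signature-validated token only for an active tenant.

    `claims` is the decoded JWT payload; JWKS signature verification is
    assumed to have happened before this check.
    """
    tid = claims.get("tid")
    if not tid:
        return AdmissionResult(False, "missing tenant claim")
    if tid not in active_tenants:
        return AdmissionResult(False, f"tenant {tid} not active")
    return AdmissionResult(True, "ok")
```

Keeping the admission decision separate from signature validation means a cryptographically valid token from an off-boarded tenant is still rejected.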
```bash
# from repo root
uv sync --extra dev --extra server
uv run fleet web
uv run fastapi dev
```

For release/packaging workflows, `uv build` now runs the frontend build sync automatically (requires `bun` in repo checkouts that include `src/frontend`).
Follow the full contributor setup and quality gates in AGENTS.md and CONTRIBUTING.md.
Read this after the quick start if you want the full system picture (entry points, ReAct orchestration, tools, Modal execution, persistent storage).
```mermaid
graph TB
    subgraph entry ["Entry Points"]
        CLI["fleet / fleet-rlm CLI"]
        WebUI["Web UI<br/>(React SPA)"]
        API["FastAPI<br/>(WS/REST)"]
        TUI["Ink TUI<br/>(standalone runtime)"]
        MCP["MCP Server"]
    end
    subgraph orchestration ["Orchestration Layer"]
        Agent["RLMReActChatAgent<br/>(dspy.Module)"]
        LMs["Planner / Delegate LMs"]
        History["Chat History"]
        Memory["Core Memory<br/>(Persona/Human/Scratchpad)"]
        DocCache["Document Cache"]
    end
    subgraph tools ["ReAct Tools"]
        DocTools["load_document<br/>read_file_slice<br/>chunk_by_*"]
        RecursiveTools["rlm_query<br/>llm_query<br/>(recursive delegation)"]
        ExecTools["execute_code<br/>edit_file<br/>search_code"]
    end
    subgraph execution ["Execution Layer"]
        Interpreter["ModalInterpreter<br/>(JSON protocol)"]
        Profiles["Execution Profiles:<br/>ROOT | DELEGATE | MAINTENANCE"]
    end
    subgraph cloud ["Cloud & Persistence"]
        Sandbox["Modal Sandbox<br/>(Python REPL + Driver)"]
        Volume[("Modal Volume<br/>/data/<br/>• workspaces<br/>• docs/metadata")]
        Neon[("Neon Postgres<br/>• runs / steps<br/>• artifacts<br/>• tenants")]
        PostHog["PostHog<br/>(LLM Observability)"]
    end
    WebUI -->|"WS / REST"| API
    CLI --> Agent
    API --> Agent
    TUI --> Agent
    MCP --> Agent
    Agent --> LMs
    Agent --> History
    Agent --> Memory
    Agent --> DocCache
    Agent --> DocTools
    Agent --> RecursiveTools
    Agent --> ExecTools
    API -.->|"Persistence"| Neon
    Agent -.->|"Traces"| PostHog
    DocTools --> Interpreter
    RecursiveTools --> Interpreter
    ExecTools --> Interpreter
    Interpreter --> Profiles
    Interpreter -->|"stdin/stdout<br/>JSON commands"| Sandbox
    Sandbox -->|"read/write"| Volume
    style entry fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style orchestration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style tools fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style execution fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style cloud fill:#fce4ec,stroke:#c2185b,stroke-width:2px
```
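The `Interpreter → Sandbox` edge in the diagram is a line-delimited JSON request/reply loop over the sandbox process's stdin/stdout. A toy sketch of that shape follows; the command fields (`op`, `code`) and the echo driver are hypothetical stand-ins for the real ModalInterpreter protocol and Modal-hosted driver:

```python
import json
import subprocess
import sys

# Toy stand-in for the sandbox-side REPL driver: echo each command back.
DRIVER = r"""
import json, sys
for line in sys.stdin:
    cmd = json.loads(line)
    print(json.dumps({"status": "ok", "echo": cmd.get("code")}), flush=True)
"""

def send_command(proc: subprocess.Popen, cmd: dict) -> dict:
    """Write one JSON command line, then read one JSON reply line."""
    proc.stdin.write(json.dumps(cmd) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

proc = subprocess.Popen(
    [sys.executable, "-c", DRIVER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
reply = send_command(proc, {"op": "exec", "code": "print(1+1)"})
proc.stdin.close()
proc.wait()
print(reply["status"])  # → ok
```

Line-delimited JSON keeps the transport trivially debuggable: each request and reply is one self-contained line, so the interpreter can correlate commands with results without a binary framing layer.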
- Documentation index
- Explanation index
- Quick install + setup
- Configure Modal
- Runtime settings (LM/Modal diagnostics)
- Deploying the server
- Using the MCP server
- Frontend β Backend integration
- CLI reference
- HTTP API reference
- Auth modes
- Database architecture
- Source layout
fleet-rlm also supports runtime diagnostics endpoints, WebSocket execution streams (/api/v1/ws/execution), multi-tenant Neon-backed persistence, and opt-in PostHog LLM analytics. Those workflows are documented in the guides/reference docs rather than front-loaded here.
Contributions are welcome. Start with CONTRIBUTING.md, then use AGENTS.md for repo-specific commands and quality gates.
MIT License β see LICENSE.
Based on Recursive Language Modeling research by Alex L. Zhang (MIT CSAIL), Omar Khattab (Stanford), and Tim Kraska (MIT).