Give your AI coding agents memory, judgment, and the discipline to finish the job.
v1.40.0 · 168 modules · 117 test suites · 9 agent harnesses
Quickstart · Why UAP? · uap deliver · Architecture · Benchmarks · Docs
AI coding agents are capable but undisciplined. They forget everything between sessions, burn tokens echoing huge tool outputs, repeat the same mistakes, declare victory on work that doesn't compile, and trip over each other in shared repos. UAP is a production-tested layer that sits underneath your agent harness (Claude Code, Factory, Cursor, OpenCode, and more) and fixes these problems at the protocol level — no model change required.
| The problem | What UAP does | Measured impact |
|---|---|---|
| Agents forget past sessions | 4-tier memory with semantic recall + write-gates | 49.7% fewer tokens |
| Tool output floods the context | MCP Router — tool-hiding + FTS5 output compression | up to ~98% on large tool calls |
| Agents declare done on broken work | uap deliver: convergence loop against real gates | +33pp task success (25% → 58%) |
| Repetitive mistakes | 23 Terminal-Bench patterns + learning loop | 68% fewer errors |
| Wrong model for the job | Multi-model router, 7 profiles | optimal cost/perf per task |
| Agents step on each other | Worktree isolation + coordination service | conflict-free parallel work |
| "Guidelines" get ignored | Policy gates as executable hooks, not prose | violations are blocked, not suggested |
Benchmarks below are from Terminal-Bench 2.0 (12 representative tasks). See docs/benchmarks/ for the full methodology and raw data.
# Install globally
npm install -g @miller-tech/uap
# One-command setup in your project (memory, patterns, hooks, policies)
cd your-project
uap setup
That's it. Your agent now has persistent memory, battle-tested patterns, policy gates, and multi-agent coordination wired into every session.
uap memory query "how did we handle auth last time?" # semantic recall
uap deliver "add rate limiting to the API" # drive a model to verified completion
uap dashboard overview # live task / agent / memory state
uap deliver is the headline of the v1.27–v1.40 line: a convergence loop that iterates a model against your project's real completion gates until the work is actually delivered (build passes, tests pass, lint is clean), not until the model thinks it's done.
uap deliver "implement the password reset flow"
What happens under the hood:
- Explore → plan → apply — the model proposes changes; the applier writes them safely (pre-existing tests and gate configs are protected from being overwritten).
- Verify against real gates — a verifier ladder runs your build, tests, and lint. Nothing is "done" until they're green.
- Critique & iterate: failures feed back as structured guidance; the loop continues, persisting until delivered (it extends past --max-turns to a ceiling and stops on genuine stagnation).
- Auto-optimization: every task is classified by complexity and the matching aids (HALO trace analysis, divergent ideation, coordination, deploy batching) activate automatically.
- Autonomy with a guidance channel — runs the full mission without stopping to ask, while still accepting operator guidance mid-flight.
It works with frontier models and local models (llama.cpp / Qwen) served over the Anthropic Messages API. See docs/guides/DELIVER.md.
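The loop described above can be sketched in a few lines. This is a minimal illustration, not UAP's implementation: `deliverLoop`, `GateResult`, `LoopOptions`, and the rule that stagnation is only checked once the soft budget is spent are all assumptions made for the example.

```typescript
// All names here are hypothetical; they are not taken from UAP's source.
type GateResult = { gate: string; passed: boolean; output: string };

interface LoopOptions {
  maxTurns: number;        // soft budget, in the spirit of --max-turns
  ceiling: number;         // hard cap on turns once the loop extends past maxTurns
  stagnationLimit: number; // turns without improvement that count as stagnation
}

type Outcome = { status: "delivered" | "stagnated" | "ceiling"; turns: number };

function deliverLoop(
  proposeAndApply: (feedback: string[]) => void, // model proposes, applier writes
  runGates: () => GateResult[],                  // build, tests, lint
  opts: LoopOptions
): Outcome {
  const hardCap = Math.max(opts.maxTurns, opts.ceiling);
  let feedback: string[] = [];
  let fewestFailures = Number.POSITIVE_INFINITY;
  let staleTurns = 0;

  for (let turn = 1; turn <= hardCap; turn++) {
    proposeAndApply(feedback);
    const failures = runGates().filter((r) => !r.passed);
    if (failures.length === 0) return { status: "delivered", turns: turn };

    // Progress means fewer failing gates than the best turn so far.
    if (failures.length < fewestFailures) {
      fewestFailures = failures.length;
      staleTurns = 0;
    } else {
      staleTurns++;
    }

    // Assumption: past the soft budget, keep going only while still improving.
    if (turn >= opts.maxTurns && staleTurns >= opts.stagnationLimit) {
      return { status: "stagnated", turns: turn };
    }
    // Failing gate output becomes structured guidance for the next turn.
    feedback = failures.map((f) => f.gate + ": " + f.output);
  }
  return { status: "ceiling", turns: hardCap };
}

// Toy project: each turn fixes one more gate (build, then tests, then lint).
let fixed = 0;
const gateNames = ["build", "tests", "lint"];
const outcome = deliverLoop(
  () => { fixed++; },
  () => gateNames.map((gate, i) => ({ gate, passed: i < fixed, output: i < fixed ? "ok" : "failed" })),
  { maxTurns: 2, ceiling: 6, stagnationLimit: 2 }
);
console.log(outcome); // { status: 'delivered', turns: 3 }
```

In the toy run the soft budget is 2 turns, but the loop keeps going because each turn reduces the number of failing gates, and it reports delivery only once all three gates pass.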
- 🧠 4-tier memory — daily log → working cache → semantic (Qdrant) → long-term archive, with write-gates that block low-quality/duplicate memories and corrections that cascade across tiers.
- 🗜️ MCP Router — a token-optimizing tool proxy; large outputs are compressed via FTS5 intent search instead of dumped into context.
- 🎯 uap deliver: the convergence/delivery harness (above).
- 🌳 Worktree workflow: isolated branch-per-feature, auto-PR, safe cleanup; enforced so agents never edit the project root.
- 🛡️ Policy gates — 20 executable enforcers (worktree, test, schema-diff, expert-review, memory-before-plan, delivery-enforcement…) that block non-compliant tool calls.
- 🤖 Expert droids & skills — 38 specialized droids and 32 skills, with an expert-router that recommends a droid chain per task.
- 🧭 Multi-model routing — 7 profiles (Claude Opus/Sonnet/Haiku, GPT, Qwen, generic); the router picks by complexity, cost, and performance.
- 🚦 Deploy batching & coordination — batched git/deploy actions and overlap detection keep multi-agent work conflict-free.
- 📊 Dashboard — rich TUI/web views of tasks, agents, memory, benchmarks, and policy status.
- 🔌 9 harnesses — Claude Code, Factory, Cursor, VSCode, OpenCode, Codex, ForgeCode, Oh-My-Pi, Hermes.
Full list with code-level detail: docs/reference/FEATURES.md.
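To make the idea of a memory write-gate concrete, here is a minimal sketch. It is an assumption-laden stand-in: `writeGate`, the token-count threshold, and the use of Jaccard similarity over word tokens are all chosen for illustration, and the real gate may well use embeddings or other signals instead.

```typescript
// Hypothetical write-gate: names and thresholds are illustrative assumptions.
type GateDecision = { accepted: boolean; reason: string };

// Lowercased alphanumeric tokens, stored as keys of a plain object.
function tokenSet(text: string): Record<string, boolean> {
  const set: Record<string, boolean> = {};
  text.toLowerCase().split(/[^a-z0-9]+/).forEach((t) => {
    if (t.length > 0) set[t] = true;
  });
  return set;
}

// Jaccard similarity: shared tokens divided by the size of the union.
function jaccard(a: Record<string, boolean>, b: Record<string, boolean>): number {
  const keysA = Object.keys(a);
  const shared = keysA.filter((t) => b[t] === true).length;
  const union = keysA.length + Object.keys(b).length - shared;
  return union === 0 ? 1 : shared / union;
}

function writeGate(
  candidate: string,
  stored: string[],
  minTokens = 3,
  duplicateThreshold = 0.8
): GateDecision {
  const tokens = tokenSet(candidate);
  if (Object.keys(tokens).length < minTokens) {
    return { accepted: false, reason: "low-quality: too few tokens" };
  }
  const isDuplicate = stored.some((m) => jaccard(tokens, tokenSet(m)) >= duplicateThreshold);
  if (isDuplicate) return { accepted: false, reason: "duplicate of an existing memory" };
  return { accepted: true, reason: "ok" };
}

const storedMemories = ["Auth uses JWT refresh tokens stored in httpOnly cookies"];
console.log(writeGate("ok", storedMemories).reason);
console.log(writeGate("auth uses JWT refresh tokens, stored in httpOnly cookies", storedMemories).reason);
console.log(writeGate("Rate limiting is enforced per API key with a sliding window", storedMemories).reason);
```

The three calls exercise the three paths: a too-short note is rejected, a reworded copy of an existing memory is rejected as a duplicate, and a genuinely new fact is accepted.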
UAP installs hooks into your agent harness, then mediates every tool call through memory, policy, and token-optimization layers.
┌─────────────────────────────────────────────────────────────┐
│ Agent harnesses │
│ Claude Code · Factory · Cursor · VSCode · OpenCode · … │
└───────────────────────────┬─────────────────────────────────┘
│ hooks (PreToolUse / tool.execute.before)
▼
┌─────────────────────────────────────────────────────────────┐
│ UAP CLI (uap) │
│ setup · memory · deliver · worktree · policy · deploy │
│ task · droids · model · mcp-router · harness · ideate … │
└──┬─────────┬──────────┬──────────┬──────────┬───────────────┘
▼ ▼ ▼ ▼ ▼
Memory Policy MCP Router Delivery Coordination
4 tiers 20 gates FTS5 compr. harness + deploy batch
- 30+ CLI commands across 18 source subsystems (168 TypeScript modules).
- Deep dive: docs/architecture/OVERVIEW.md · protocol spec: docs/architecture/PROTOCOL.md.
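As a rough picture of what "mediating a tool call" through a policy gate could look like, the sketch below runs a call through a list of gates before it executes and blocks on the first violation. Everything in it is hypothetical: the `ToolCall` and `PolicyGate` shapes, the `preToolUse` function, the `Edit`/`Write` tool names, and the `.worktrees/` path prefix are assumptions for illustration, not UAP's actual hook payload or layout.

```typescript
// Hypothetical shapes; the real hook payload will differ per harness.
type ToolCall = { tool: string; path?: string };
type Verdict = { allow: boolean; gate?: string; reason?: string };
// A gate returns null to pass, or a human-readable reason to block.
type PolicyGate = { name: string; check: (call: ToolCall) => string | null };

// Example gate: file writes must target a worktree, never the project root.
// The ".worktrees/" prefix is an assumed convention for this sketch.
const worktreeGate: PolicyGate = {
  name: "worktree",
  check: (call) => {
    const isWrite = call.tool === "Edit" || call.tool === "Write";
    if (!isWrite || call.path === undefined) return null;
    return call.path.indexOf(".worktrees/") === 0
      ? null
      : "edits must happen inside a worktree, not the project root";
  },
};

// Runs before the tool executes; the first failing gate blocks the call.
function preToolUse(call: ToolCall, gates: PolicyGate[]): Verdict {
  for (const gate of gates) {
    const reason = gate.check(call);
    if (reason !== null) return { allow: false, gate: gate.name, reason };
  }
  return { allow: true };
}

const blocked = preToolUse({ tool: "Edit", path: "src/index.ts" }, [worktreeGate]);
const allowed = preToolUse({ tool: "Edit", path: ".worktrees/feat-x/src/index.ts" }, [worktreeGate]);
console.log(blocked.allow, blocked.gate); // false worktree
console.log(allowed.allow);               // true
```

The point of the design is that a gate is code with a verdict rather than a guideline in a prompt: a non-compliant call never reaches the tool.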
The honest, controlled result (paired A/B — same model, tasks, and seeds, toggling only UAP, with confidence intervals): UAP's accuracy lift depends on whether the base agent already self-verifies.
| Baseline | UAP accuracy lift | Interpretation |
|---|---|---|
| Agentic harness (self-tests) | ~0pp (CI spans 0) | overhead only — value is efficiency/coordination |
| Non-agentic single-shot model | +20pp (78%→98%, 95% CI [+8,+32], p=0.008) | gate loop repairs edge-case bugs |
Run it yourself: uap bench paired --adapter raw --suite benchmarks/suites/real-gate-gated.
Full analysis: docs/benchmarks/PAIRED_FINDINGS.md.
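For readers unfamiliar with paired comparisons, the sketch below shows one common way to score them: count the tasks where only one arm passed, report the lift in percentage points, and compute an exact two-sided sign test (the exact form of McNemar's test) on those discordant pairs. This is an illustrative assumption, not necessarily the statistic `uap bench paired` reports, and the ten task outcomes are made-up toy data, not benchmark results.

```typescript
// Hypothetical record of one task run under both arms (same model, task, seed).
type PairedRun = { task: string; baselinePassed: boolean; uapPassed: boolean };

// Binomial coefficient C(n, k), computed iteratively.
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) result = (result * (n - k + i)) / i;
  return result;
}

function pairedLift(runs: PairedRun[]): { liftPp: number; pValue: number } {
  // Only discordant pairs (one arm passed, the other failed) carry information.
  const uapOnly = runs.filter((r) => r.uapPassed && !r.baselinePassed).length;
  const baselineOnly = runs.filter((r) => r.baselinePassed && !r.uapPassed).length;
  const liftPp = (100 * (uapOnly - baselineOnly)) / runs.length;

  // Under the null hypothesis each discordant pair favors either arm with p = 0.5.
  const discordant = uapOnly + baselineOnly;
  const smaller = Math.min(uapOnly, baselineOnly);
  let tail = 0;
  for (let i = 0; i <= smaller; i++) tail += binomial(discordant, i);
  tail /= Math.pow(2, discordant);
  return { liftPp, pValue: Math.min(1, 2 * tail) };
}

// Toy data: baseline passes tasks 1-5, the UAP arm passes all ten.
const toyRuns: PairedRun[] = [];
for (let i = 1; i <= 10; i++) {
  toyRuns.push({ task: "t" + i, baselinePassed: i <= 5, uapPassed: true });
}
const summary = pairedLift(toyRuns);
console.log(summary); // { liftPp: 50, pValue: 0.0625 }
```

Pairing matters because tasks that both arms pass, or both fail, say nothing about the treatment; only the disagreements do, which is why a large lift on a small suite can still fall short of significance.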
Earlier uncontrolled Terminal-Bench numbers (confounded — see TBench Investigation)
| Metric | Baseline | With UAP | Δ |
|---|---|---|---|
| Tokens consumed | 558,000 | 280,438 | −49.7% |
| Task success rate | 25% | 58% | +33pp |
| Errors per task | 1.17 | 0.42 | −68% |
| Wall-clock (total) | 618s | 266s | −57% |
Methodology, raw runs, and cost analysis: docs/benchmarks/.
| Harness | Hooks | MCP Router | Policy gates |
|---|---|---|---|
| Claude Code | ✅ | ✅ | ✅ |
| Factory | ✅ | ✅ | ✅ |
| Cursor | ✅ | ✅ | ✅ |
| VSCode | ✅ | ✅ | ✅ |
| OpenCode | ✅ | ✅ | ✅ |
| Codex | ✅ | ✅ | ✅ |
| ForgeCode | ✅ | ✅ | ✅ |
| Oh-My-Pi | ✅ | ✅ | ✅ |
| Hermes (global) | ✅ | ✅ | ✅ |
Install into all detected harnesses with uap hooks install; audit coverage with uap hooks doctor. Matrix: docs/reference/PLATFORMS.md.
| Section | Contents |
|---|---|
| Getting Started | Installation, quickstart, configuration |
| Guides | deliver, memory, MCP router, worktrees, policies, multi-model, local models |
| Architecture | System overview + the UAP protocol |
| Reference | CLI, API, patterns, database schema, platforms |
| Benchmarks | Methodology and results |
| Contributing | Dev setup, gates, conventions |
Start at the documentation index.
npm install
npm run build # TypeScript compile
npm test # vitest — 117 suites
npm run bench # benchmark suite
MIT © Miller Tech. See LICENSE.