Universal Agent Protocol (UAP)

Give your AI coding agents memory, judgment, and the discipline to finish the job.

v1.40.0 · 168 modules · 117 test suites · 9 agent harnesses

Quickstart · Why UAP? · uap deliver · Architecture · Benchmarks · Docs

Why UAP?

AI coding agents are capable but undisciplined. They forget everything between sessions, burn tokens echoing huge tool outputs, repeat the same mistakes, declare victory on work that doesn't compile, and trip over each other in shared repos. UAP is a production-tested layer that sits underneath your agent harness (Claude Code, Factory, Cursor, OpenCode, and more) and fixes these problems at the protocol level — no model change required.

| The problem | What UAP does | Measured impact |
| --- | --- | --- |
| Agents forget past sessions | 4-tier memory with semantic recall + write-gates | 49.7% fewer tokens |
| Tool output floods the context | MCP Router: tool-hiding + FTS5 output compression | up to ~98% on large tool calls |
| Agents declare done on broken work | `uap deliver`: convergence loop against real gates | +33pp task success (25% → 58%) |
| Repetitive mistakes | 23 Terminal-Bench patterns + learning loop | 68% fewer errors |
| Wrong model for the job | Multi-model router, 7 profiles | optimal cost/perf per task |
| Agents step on each other | Worktree isolation + coordination service | conflict-free parallel work |
| "Guidelines" get ignored | Policy gates as executable hooks, not prose | violations are blocked, not suggested |

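The token, task-success, and error figures above are from Terminal-Bench 2.0 (12 representative tasks) and are uncontrolled; the Benchmarks section reports the paired A/B results. See docs/benchmarks/ for the full methodology and raw data.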

Quickstart

# Install globally
npm install -g @miller-tech/uap

# One-command setup in your project (memory, patterns, hooks, policies)
cd your-project
uap setup

That's it. Your agent now has persistent memory, battle-tested patterns, policy gates, and multi-agent coordination wired into every session.

uap memory query "how did we handle auth last time?"   # semantic recall
uap deliver "add rate limiting to the API"             # drive a model to verified completion
uap dashboard overview                                  # live task / agent / memory state

The `deliver` harness

uap deliver is the headline of the v1.27–v1.40 line: a convergence loop that iterates a model against your project's real completion gates until the work is actually delivered — build passes, tests pass, lint is clean — not until the model thinks it's done.

uap deliver "implement the password reset flow"

What happens under the hood:

1. Explore → plan → apply: the model proposes changes and the applier writes them safely (pre-existing tests and gate configs are protected from being overwritten).
2. Verify against real gates: a verifier ladder runs your build, tests, and lint. Nothing is "done" until they're green.
3. Critique & iterate: failures feed back as structured guidance and the loop continues until the work is delivered. It can extend past --max-turns up to a ceiling, and it stops on genuine stagnation.
4. Auto-optimization: every task is classified by complexity, and the matching aids (HALO trace analysis, divergent ideation, coordination, deploy batching) activate automatically.
5. Autonomy with a guidance channel: the loop runs the full mission without stopping to ask, while still accepting operator guidance mid-flight.

It works with frontier models and local models (llama.cpp / Qwen) served over the Anthropic Messages API. See docs/guides/DELIVER.md.
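
The convergence loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not UAP's implementation: the types, the `converge` function, and the `stagnationLimit` option are hypothetical names, and "stagnation" is assumed here to mean the same gate failure repeating on consecutive turns.

```typescript
// Illustrative sketch only: these types and names are hypothetical, not UAP's API.
type GateResult = { passed: boolean; detail: string };
type Gate = { name: string; run: () => GateResult };

interface LoopOptions {
  maxTurns: number;        // soft budget, in the spirit of --max-turns
  ceiling: number;         // hard upper bound on turns
  stagnationLimit: number; // consecutive repeats of the same failure before giving up
}

type Outcome = {
  status: "delivered" | "stagnated" | "ceiling";
  turns: number;
  overBudget: boolean; // true when the loop ran past maxTurns
};

// Verifier ladder: run gates in order and report the first one that fails.
function firstFailure(gates: Gate[]): string | null {
  for (const gate of gates) {
    const result = gate.run();
    if (!result.passed) return `${gate.name}: ${result.detail}`;
  }
  return null;
}

function converge(
  attempt: (feedback: string | null) => void, // stand-in for "ask the model to change the code"
  gates: Gate[],
  opts: LoopOptions,
): Outcome {
  let feedback: string | null = null;
  let repeats = 0;

  for (let turn = 1; turn <= opts.ceiling; turn++) {
    attempt(feedback);
    const failure = firstFailure(gates);
    const overBudget = turn > opts.maxTurns;
    if (failure === null) return { status: "delivered", turns: turn, overBudget };

    // The same failure as on the previous turn counts as no progress.
    repeats = failure === feedback ? repeats + 1 : 0;
    feedback = failure;
    if (repeats >= opts.stagnationLimit) return { status: "stagnated", turns: turn, overBudget };
  }
  return { status: "ceiling", turns: opts.ceiling, overBudget: opts.ceiling > opts.maxTurns };
}

// Usage with a fake "model" that fixes one thing per turn.
let fixes = 0;
const gates: Gate[] = [
  { name: "build", run: () => ({ passed: fixes >= 2, detail: "compile error" }) },
  { name: "test", run: () => ({ passed: fixes >= 4, detail: "failing test" }) },
];
const outcome = converge(() => { fixes++; }, gates, { maxTurns: 3, ceiling: 10, stagnationLimit: 3 });
```

The point of the design is that only the gates decide when the loop ends in success; the model's own opinion of its work never appears in the exit condition.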

Features

🧠 4-tier memory — daily log → working cache → semantic (Qdrant) → long-term archive, with write-gates that block low-quality/duplicate memories and corrections that cascade across tiers.
🗜️ MCP Router — a token-optimizing tool proxy; large outputs are compressed via FTS5 intent search instead of dumped into context.
🎯 uap deliver — the convergence/delivery harness (above).
🌳 Worktree workflow — isolated branch-per-feature, auto-PR, safe cleanup; enforced so agents never edit the project root.
🛡️ Policy gates — 20 executable enforcers (worktree, test, schema-diff, expert-review, memory-before-plan, delivery-enforcement…) that block non-compliant tool calls.
🤖 Expert droids & skills — 38 specialized droids and 32 skills, with an expert-router that recommends a droid chain per task.
🧭 Multi-model routing — 7 profiles (Claude Opus/Sonnet/Haiku, GPT, Qwen, generic); the router picks by complexity, cost, and performance.
🚦 Deploy batching & coordination — batched git/deploy actions and overlap detection keep multi-agent work conflict-free.
📊 Dashboard — rich TUI/web views of tasks, agents, memory, benchmarks, and policy status.
🔌 9 harnesses — Claude Code, Factory, Cursor, VSCode, OpenCode, Codex, ForgeCode, Oh-My-Pi, Hermes.

Full list with code-level detail: docs/reference/FEATURES.md.
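
To make the write-gate idea from the memory feature concrete, here is a rough sketch of a gate that rejects low-quality and duplicate memories. It is an assumption-laden illustration: the function names and thresholds are hypothetical, and it uses plain token overlap (Jaccard similarity) as a stand-in for the semantic comparison that UAP's Qdrant tier would provide.

```typescript
// Illustrative sketch only: names and thresholds are hypothetical, not UAP's API.
type GateDecision = { accepted: boolean; reason: string };

// Lowercase the text and return its unique alphanumeric tokens.
function tokenize(text: string): string[] {
  const unique: string[] = [];
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token.length > 0 && unique.indexOf(token) === -1) unique.push(token);
  }
  return unique;
}

// Jaccard similarity: shared tokens divided by tokens in either set.
function jaccard(a: string[], b: string[]): number {
  if (a.length === 0 && b.length === 0) return 1;
  const shared = a.filter((token) => b.indexOf(token) !== -1).length;
  return shared / (a.length + b.length - shared);
}

function writeGate(
  candidate: string,
  existing: string[],
  minTokens = 4,       // assumed quality floor: very short notes carry little signal
  maxSimilarity = 0.8, // assumed duplicate threshold
): GateDecision {
  const candidateTokens = tokenize(candidate);
  if (candidateTokens.length < minTokens) {
    return { accepted: false, reason: "too short to be useful" };
  }
  for (const memory of existing) {
    if (jaccard(candidateTokens, tokenize(memory)) >= maxSimilarity) {
      return { accepted: false, reason: "near-duplicate of an existing memory" };
    }
  }
  return { accepted: true, reason: "ok" };
}

// Usage: one stored memory, three candidate writes.
const stored = ["auth uses jwt refresh tokens stored in httponly cookies."];
const duplicate = writeGate("Auth uses JWT refresh tokens stored in httpOnly cookies", stored);
const tooShort = writeGate("fixed it", stored);
const fresh = writeGate("Rate limiter uses a sliding window keyed by API token", stored);
```

A real gate would likely compare embeddings rather than tokens, but the shape is the same: every write passes through a check that can refuse it with a reason.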

Architecture

UAP installs hooks into your agent harness, then mediates every tool call through memory, policy, and token-optimization layers.

┌─────────────────────────────────────────────────────────────┐
│  Agent harnesses                                            │
│  Claude Code · Factory · Cursor · VSCode · OpenCode · …     │
└───────────────────────────┬─────────────────────────────────┘
                            │ hooks (PreToolUse / tool.execute.before)
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                       UAP CLI (uap)                         │
│  setup · memory · deliver · worktree · policy · deploy      │
│  task · droids · model · mcp-router · harness · ideate …    │
└──┬─────────┬──────────┬──────────┬──────────┬───────────────┘
   ▼         ▼          ▼          ▼          ▼
 Memory   Policy    MCP Router   Delivery   Coordination
 4 tiers  20 gates  FTS5 compr.  harness    + deploy batch

30+ CLI commands across 18 source subsystems (168 TypeScript modules).
Deep dive: docs/architecture/OVERVIEW.md · protocol spec: docs/architecture/PROTOCOL.md.
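
The policy layer in the diagram can be pictured as a chain of small checks that run before a tool call executes. The sketch below is a hedged illustration of that shape, not UAP's code: the `ToolCall`, `Enforcer`, and `runGates` names are hypothetical, the `Edit` and `Write` tool names are assumed for the example, and the worktree directory is a made-up path.

```typescript
// Illustrative sketch only: types, names, and paths are hypothetical, not UAP's API.
type ToolCall = { tool: string; filePath?: string };
type Verdict = { allow: boolean; reason?: string };
type Enforcer = { name: string; check: (call: ToolCall) => Verdict };

// Assumed worktree rule: file edits must target a path under the worktree root.
// Note: this sketch does no path normalization (for example, ".." segments).
function worktreeEnforcer(worktreeRoot: string): Enforcer {
  const endsWithSlash = worktreeRoot.charAt(worktreeRoot.length - 1) === "/";
  const prefix = endsWithSlash ? worktreeRoot : worktreeRoot + "/";
  return {
    name: "worktree",
    check: (call) => {
      const isEdit = call.tool === "Edit" || call.tool === "Write";
      if (!isEdit || call.filePath === undefined) return { allow: true };
      if (call.filePath.indexOf(prefix) === 0) return { allow: true };
      return { allow: false, reason: `edit outside worktree: ${call.filePath}` };
    },
  };
}

// Run enforcers in order; the first one that objects blocks the call.
function runGates(call: ToolCall, enforcers: Enforcer[]): Verdict {
  for (const enforcer of enforcers) {
    const verdict = enforcer.check(call);
    if (!verdict.allow) {
      return { allow: false, reason: `[${enforcer.name}] ${verdict.reason}` };
    }
  }
  return { allow: true };
}

// Usage: one enforcer, three tool calls.
const enforcers = [worktreeEnforcer("/repo/.worktrees")];
const insideEdit = runGates({ tool: "Edit", filePath: "/repo/.worktrees/feat/src/a.ts" }, enforcers);
const rootEdit = runGates({ tool: "Edit", filePath: "/repo/src/a.ts" }, enforcers);
const readCall = runGates({ tool: "Read", filePath: "/repo/src/a.ts" }, enforcers);
```

Because the verdict is returned to the harness before the tool runs, a blocked call never touches the filesystem, which is the difference between an enforced gate and a written guideline.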

Benchmarks

The honest, controlled result (paired A/B — same model, tasks, and seeds, toggling only UAP, with confidence intervals): UAP's accuracy lift depends on whether the base agent already self-verifies.

| Baseline | UAP accuracy lift | Interpretation |
| --- | --- | --- |
| Agentic harness (self-tests) | ~0pp (CI spans 0) | overhead only; value is efficiency/coordination |
| Non-agentic single-shot model | +20pp (78%→98%, 95% CI [+8,+32], p=0.008) | gate loop repairs edge-case bugs |

Run it yourself: uap bench paired --adapter raw --suite benchmarks/suites/real-gate-gated. Full analysis: docs/benchmarks/PAIRED_FINDINGS.md.
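
For readers who want to see what a paired lift with a confidence interval means in practice, here is a small sketch. It assumes each pair is the same task and seed run once without and once with UAP, and it uses a normal-approximation 95% interval on the per-pair differences. The statistical method behind the numbers above is documented in PAIRED_FINDINGS.md and may differ; the `PairedRun` and `pairedLift` names are hypothetical.

```typescript
// Illustrative sketch only: not the benchmark's actual analysis code.
type PairedRun = { baselinePassed: boolean; uapPassed: boolean };
type Lift = { liftPp: number; lowPp: number; highPp: number; spansZero: boolean };

function pairedLift(runs: PairedRun[]): Lift {
  const n = runs.length;
  if (n < 2) throw new Error("need at least two paired runs");

  // Per-pair difference in percentage points: +100, 0, or -100.
  const diffs = runs.map((r) => 100 * ((r.uapPassed ? 1 : 0) - (r.baselinePassed ? 1 : 0)));
  const mean = diffs.reduce((sum, d) => sum + d, 0) / n;

  // Sample variance of the differences, then a normal-approximation 95% interval.
  // This approximation is crude for small n; it is used here only for brevity.
  const variance = diffs.reduce((sum, d) => sum + (d - mean) * (d - mean), 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n);
  const lowPp = mean - halfWidth;
  const highPp = mean + halfWidth;

  return { liftPp: mean, lowPp, highPp, spansZero: lowPp <= 0 && highPp >= 0 };
}

// Usage with made-up data: 10 pairs, 3 of which UAP turns from fail to pass.
const sampleRuns: PairedRun[] = [];
for (let i = 0; i < 10; i++) {
  sampleRuns.push({ baselinePassed: i >= 3, uapPassed: true });
}
const sampleLift = pairedLift(sampleRuns);
```

When `spansZero` is true, the data cannot distinguish the lift from no effect, which is how the "~0pp (CI spans 0)" row in the table should be read.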

Earlier uncontrolled Terminal-Bench numbers (confounded — see TBench Investigation)

| Metric | Baseline | With UAP | Δ |
| --- | --- | --- | --- |
| Tokens consumed | 558,000 | 280,438 | −49.7% |
| Task success rate | 25% | 58% | +33pp |
| Errors per task | 1.17 | 0.42 | −68% |
| Wall-clock (total) | 618s | 266s | −57% |

Methodology, raw runs, and cost analysis: docs/benchmarks/.

Supported harnesses

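| Harness | Hooks | MCP Router | Policy gates |
| --- | --- | --- | --- |
| Claude Code | ✅ | ✅ | ✅ |
| Factory | ✅ | ✅ | ✅ |
| Cursor | ✅ | ✅ | ✅ |
| VSCode | ✅ | ✅ | ✅ |
| OpenCode | ✅ | ✅ | ✅ |
| Codex | ✅ | ✅ | ✅ |
| ForgeCode | ✅ | ✅ | ✅ |
| Oh-My-Pi | ✅ | ✅ | ✅ |
| Hermes (global) | ✅ | ✅ | ✅ |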

Install into all detected harnesses with uap hooks install; audit coverage with uap hooks doctor. Matrix: docs/reference/PLATFORMS.md.

Documentation


| Section | Contents |
| --- | --- |
| Getting Started | Installation, quickstart, configuration |
| Guides | deliver, memory, MCP router, worktrees, policies, multi-model, local models |
| Architecture | System overview + the UAP protocol |
| Reference | CLI, API, patterns, database schema, platforms |
| Benchmarks | Methodology and results |
| Contributing | Dev setup, gates, conventions |

Start at the documentation index.

Testing

npm install
npm run build      # TypeScript compile
npm test           # vitest — 117 suites
npm run bench      # benchmark suite

