Open-Source Agentic PR Reviewer Built on AgentField
Output • How It Works • Comparison • Quick Start • Architecture
Other tools run a single LLM pass over the diff with a fixed checklist. PR-AF builds a custom review strategy for every PR: it examines the change, reasons about what could go wrong, spawns parallel reviewer agents with runtime-crafted prompts, challenges its own findings adversarially, and posts specific inline comments. Free, open source, one API call. A deep review of a 500-line PR costs about $0.80 in LLM calls.
```bash
curl -X POST http://localhost:8080/api/v1/execute/async/pr-af.review \
  -H "Content-Type: application/json" \
  -d '{"input": {"pr_url": "https://github.com/owner/repo/pull/123"}}'
```

Posts inline GitHub review comments with evidence-grounded findings:
Custom review strategy per PR. Evidence-grounded. Near-zero false positives. ~$0.80 for a 500-line PR.
PR-AF does not execute a static script. It restructures its own execution graph around the topology of the incoming Pull Request.

When a PR arrives, the system dynamically compiles review dimensions, evaluating the diff through semantic, mechanical, and systemic lenses. It then uses these dimensions to spawn specialized, ephemeral reviewer agents tailored to the exact context of the current PR.
Full architecture deep-dive:
docs/ARCHITECTURE.md
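Conceptually, the dimension-compilation step can be sketched as follows. This is a minimal illustration; the generator functions and dimension shapes are hypothetical, not PR-AF's actual internals:

```python
def compile_dimensions(diff_summary, generators):
    """Run each lens generator (semantic, mechanical, systemic) over a PR
    summary, then dedupe overlapping dimensions by name. Each surviving
    dimension seeds one ephemeral reviewer agent."""
    seen = {}
    for generate in generators:
        for dim in generate(diff_summary):
            # First generator to propose a dimension name wins; later
            # duplicates are merged away.
            seen.setdefault(dim["name"].lower(), dim)
    return list(seen.values())

# Hypothetical lens generators for a PR touching an API handler:
semantic = lambda s: [{"name": "State Mutation", "prompt": f"Audit state writes in: {s}"}]
mechanical = lambda s: [{"name": "API Boundaries", "prompt": f"Check input validation in: {s}"}]
systemic = lambda s: [{"name": "State Mutation", "prompt": f"Trace cross-module state in: {s}"}]

dims = compile_dimensions("users.py handler change", [semantic, mechanical, systemic])
# Two unique dimensions survive deduplication: State Mutation, API Boundaries
```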
Pipeline flow (Mermaid)
```mermaid
graph TD
    classDef intake fill:#f3f4f6,stroke:#4b5563,stroke-width:2px;
    classDef dynamic fill:#dbeafe,stroke:#3b82f6,stroke-width:2px;
    classDef verify fill:#fef3c7,stroke:#d97706,stroke-width:2px;
    classDef synthesize fill:#ede9fe,stroke:#8b5cf6,stroke-width:2px;
    classDef output fill:#ecfdf5,stroke:#059669,stroke-width:2px;

    PR[Incoming Pull Request] --> I1[Intake Triage]:::intake
    I1 --> A1[Topological Anatomy Mapping]:::intake
    A1 --> M1[Semantic Lens Generator]:::dynamic
    A1 --> M2[Mechanical Lens Generator]:::dynamic
    A1 --> M3[Systemic Lens Generator]:::dynamic
    M1 --> D[Dimension Deduplication & Compilation]:::dynamic
    M2 --> D
    M3 --> D
    D -->|Dynamically spawns N dimensions| R1(Thread 1: State Mutation)
    D --> R2(Thread 2: API Boundaries)
    D --> R3(Thread N: Dynamic Context...)
    R1 --> E[Programmatic AST Extraction Engine]:::verify
    R2 --> E
    R3 --> E
    E -->|Ground truth caller snippets| V[Evidence Verification Layer]:::verify
    V -->|Unsubstantiated claims pruned| F[Falsifiability Gate]:::verify
    F --> C1(Compound Cluster: File Topology)
    F --> C2(Compound Cluster: Shared Imports)
    F --> C3(Compound Cluster: Tag Overlap)
    C1 --> S[Compound Vulnerability Synthesis]:::synthesize
    C2 --> S
    C3 --> S
    S --> L{Coverage Depth Gate}
    L -->|Blind spots detected| I1
    L -->|Full coverage achieved| O[Synthesized GitHub Annotations]:::output
```
PR-AF uses this multi-phase cognitive pipeline to ensure rigorous, high-fidelity reviews:
Language models operate on probability, which makes them prone to assumption-based false positives. If the system flags a missing validation check, PR-AF does not immediately accept the claim. Instead, it uses programmatic AST (Abstract Syntax Tree) extraction to pull the exact caller snippets and import contexts from the broader repository. This raw data is then evaluated through an isolated verification layer: if the initial claim cannot be irrefutably grounded in the extracted code, it is silently pruned.
Standard tools analyze code linearly. PR-AF looks at the entire board to identify cross-correlated risks. It clusters isolated, seemingly minor anomalies across different files and evaluates them concurrently to detect whether they coalesce into a larger systemic exploit. For example, an unprotected API key in one module and a database merge vulnerability in another are synthesized into a single, high-severity "Coordinated Injection" finding.
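The clustering step can be illustrated with a toy sketch; the finding shape and correlation keys here are hypothetical stand-ins for the pipeline's file-topology, shared-import, and tag-overlap clusters:

```python
from collections import defaultdict

def compound_clusters(findings, keys_of):
    """Bucket findings by shared correlation keys; any bucket with two or
    more members becomes a candidate for compound-vulnerability synthesis."""
    buckets = defaultdict(list)
    for finding in findings:
        for key in keys_of(finding):
            buckets[key].append(finding)
    return {k: fs for k, fs in buckets.items() if len(fs) > 1}

# Hypothetical findings: the first two share a 'db' import.
findings = [
    {"title": "API key logged in plaintext", "imports": {"db", "logging"}},
    {"title": "Unparameterized merge query", "imports": {"db"}},
    {"title": "Loose CORS policy", "imports": {"http"}},
]
clusters = compound_clusters(findings, lambda f: f["imports"])
# The 'db' key correlates the first two findings into one compound-risk candidate.
```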
Before any finding is compiled into the final GitHub comment, it must pass through a strict falsifiability framework. The system actively attempts to invalidate its own findings—searching for reasons why the reported anomaly might be safe, intended behavior, or securely mitigated elsewhere in the codebase structure. Only findings that survive this aggressive auto-invalidation process are surfaced to the developer.
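A minimal sketch of the gate's control flow, where the counter-checks are illustrative stand-ins for the model's adversarial invalidation passes:

```python
def falsifiability_gate(finding, counter_checks):
    """Try to invalidate a finding before surfacing it. Each counter-check
    returns a reason the anomaly is safe (or None); any non-None reason
    prunes the finding, so only claims that survive every attempted
    invalidation reach the final review."""
    for attempt_to_invalidate in counter_checks:
        if attempt_to_invalidate(finding) is not None:
            return None  # pruned silently, as in the pipeline
    return finding

# Illustrative counter-check: a claim with no grounded evidence cannot survive.
def no_evidence(finding):
    return "no extracted caller snippet backs this claim" if not finding.get("evidence") else None

grounded = {"title": "SQL injection", "evidence": ["users.py:42"]}
speculative = {"title": "possible race condition", "evidence": []}
```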
There are excellent AI code review tools on the market. PR-AF is not designed to replace fast, interactive tools; it is designed for comprehensive CI/CD gating where accuracy and architectural depth matter more than execution speed.
| Feature | PR-AF (AgentField) | Claude Code CLI | Commercial SaaS (e.g. Codex, CodeRabbit) |
|---|---|---|---|
| Best For | Deep CI/CD architectural audits | Fast, iterative inner-loop development | Clean GitHub UX and chat-based reviews |
| Cost | Free / Open Source (BYOK API costs only) | Pay-per-token (BYOK) | ~$20 - $25 / user / month |
| Architecture | Massively parallel cognitive pipeline | Single-thread interactive loop | Context retrieval + LLM review |
| Execution Time | ~35-50 minutes | Seconds to minutes | ~2-5 minutes |
| False Positives | Extremely low (Evidence Grounding) | Moderate (relies on context window) | Low-to-Moderate (heuristic filtering) |
| Compound Risks | Yes (Dedicated Compound Synthesizer) | Unlikely (diff-focused) | Partial (depends on retrieval accuracy) |
We highly recommend using Claude Code for your local development and running PR-AF as your final GitHub Actions gatekeeper.
```bash
git clone https://github.com/Agent-Field/pr-af.git && cd pr-af
cp .env.example .env   # Add OPENROUTER_API_KEY, GH_TOKEN
docker compose up --build
```

Starts the AgentField control plane (http://localhost:8080) and the PR-AF agent.
```bash
curl -X POST http://localhost:8080/api/v1/execute/async/pr-af.review \
  -H "Content-Type: application/json" \
  -d '{"input": {"pr_url": "https://github.com/owner/repo/pull/123"}}'
```

Poll for results:

```bash
curl http://localhost:8080/api/v1/executions/<execution_id>
```

The easiest way to use PR-AF is to drop it into your GitHub Actions. It requires zero configuration and runs securely using GitHub's built-in GITHUB_TOKEN.
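Outside of Actions, the execute-and-poll steps can be scripted. A minimal Python sketch; the `status` values and response shape are assumptions about the AgentField API, so adjust to the real schema:

```python
import json
import time
import urllib.request

def fetch_execution(url):
    """GET the execution record and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def wait_for_review(execution_id, base="http://localhost:8080",
                    timeout=3600, poll_every=30, fetch=fetch_execution):
    """Poll the executions endpoint until the review reaches a terminal
    state; deep reviews run for tens of minutes, so poll slowly."""
    url = f"{base}/api/v1/executions/{execution_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        payload = fetch(url)
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(poll_every)
    raise TimeoutError(f"review {execution_id} still running after {timeout}s")
```

The injectable `fetch` parameter keeps the loop testable without a running server.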
Add this workflow to your repository at .github/workflows/pr-af-review.yml. It triggers automatically whenever you add the pr-af label to a Pull Request.
```yaml
name: AgentField PR Review

on:
  pull_request:
    types: [labeled]

jobs:
  pr-af-review:
    if: github.event.label.name == 'pr-af'
    runs-on: ubuntu-latest
    # Needs permissions to post comments and read code
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Checkout PR-AF
        uses: actions/checkout@v4
        with:
          repository: Agent-Field/pr-af
          path: pr-af
      - name: Start AgentField & PR-AF
        working-directory: ./pr-af
        env:
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          docker compose up -d
          sleep 15  # Wait for services to be healthy
      - name: Execute Deep Architectural Audit
        working-directory: ./pr-af
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
        run: |
          python3 scripts/ci_runner.py
```

Note: PR-AF runs a comprehensive parallel pipeline. Reviews typically take 35-50 minutes depending on PR complexity.


Example result payload:

```json
{
  "total_findings": 5,
  "by_severity": {"critical": 1, "important": 2, "suggestion": 2},
  "findings": [
    {
      "severity": "critical",
      "title": "SQL injection in user input handling",
      "file": "src/api/users.py",
      "line": 42,
      "body": "Raw query parameter interpolated directly into SQL. Tracer confirms no parameterization between input and cursor.execute().",
      "suggestion": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))",
      "evidence": "AST extraction confirms f-string SQL at users.py:42, no sanitization in call chain",
      "compound_risk": "Combined with missing auth middleware (finding #2), this is exploitable by unauthenticated users"
    }
  ],
  "review_dimensions": 4,
  "cost_usd": 0.83
}
```
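A CI consumer can reduce this payload to a merge-gate decision. A hypothetical sketch against the schema above; the severity levels to block on are a policy choice, not part of PR-AF:

```python
import json

def gate(review_json, block_on=("critical",)):
    """Decide whether to block a merge from PR-AF's result payload.
    Returns (ok, summary): ok is False if any blocking-severity
    findings are present."""
    data = json.loads(review_json)
    severities = data.get("by_severity", {})
    blocking = sum(severities.get(level, 0) for level in block_on)
    summary = (f"{data['total_findings']} findings, "
               f"{blocking} blocking, ${data['cost_usd']:.2f} spent")
    return blocking == 0, summary
```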