docs: Add FAQ section for common questions #18
Open
meichuanyi wants to merge 318 commits into
…tests See CHANGELOG for the full list of fixes shipped in this release. The big ones:
- Templates packaging fix means HTML and PDF reports finally work for everyone who installed via pip
- LLM gate replaced with first-run setup prompt so customers using Claude Code MCP no longer need to set a fake API key
- Agents fall back to deterministic mode when LLM is unreachable instead of returning silent zero-finding results
- 4 previously-fictional agents (api_security, credential_tester, vuln_scanner, privesc) now exist as real BaseAgent classes with both LLM-driven and deterministic implementations
- 8 specialist agents wired up as MCP tools: total 33 -> 41
- Recon results capped per phase to prevent 27,680-finding explosions
- Stale 'running' engagements get reconciled to 'interrupted' on the next ptai start
Also adds 4 integration tests for the reconciler that hit a real SQLite DB. The original reconciler shipped with started_at instead of created_at in the WHERE clause; mocked CLI tests didn't catch it because they don't talk to actual SQL. Fixed in 7fcb503; the new test_reconcile_uses_real_columns guards against that class of schema drift in CI.
Code (9 fixes):
- Item 7: subdomain cap configurable via PENTEST_AI_MAX_FINDINGS_<PHASE> env vars
- Item 9: orchestrator catches asyncio.CancelledError, marks engagement as 'interrupted' on SIGINT/SIGTERM instead of leaving 'running' forever
- Item 11: ptai resume on already-completed engagement now says "already completed; nothing to resume" with a hint to use ptai retest, instead of the misleading "resumed and completed"
CI / packaging (2 new):
- Item 10: .github/workflows/docker.yml builds + pushes ghcr.io image on every v*.*.* tag. Smoke-tests the published image before declaring success.
- Item 1: docs/release-pypi.md walks through the one-time PyPI Trusted Publishing setup that the v0.10.2 + v0.10.3 release runs were missing.
Marketing site:
- Item 14: pricing/trust-bar counts updated 191/12/33 -> 194/17/41 to match the actual product (committed in pentest-ai-preview-v4 separately).
Docs (4 drafts ready for legal/ops review):
- Item 3: docs/legal/PRIVACY.md and docs/legal/TERMS.md
- Item 5: docs/status-page-runbook.md
- Item 17: docs/launch-playbook.md (T-14 through T+7 plan)
655 tests pass.
WebAgent's deterministic _run_tool_phase ran tools sequentially. nuclei + nikto + skipfish in series could push a single phase past 5 minutes; the end-to-end test in 0.10.3 hit a 20s timeout on test_web_app for that reason. Fix: each phase now schedules its installed tools with asyncio.gather and caps each tool at 30s wall-clock. ReconAgent, ADAgent, APISecurityAgent, and WirelessAgent had the same pattern and got the same fix. ReconAgent preserves its phase finding-cap; cap is applied after collection rather than mid-stream so we can run tools concurrently. Verified by tests/test_agents_parallel.py: 5 mock tools that each sleep 0.5s complete in ~0.5s wall-clock instead of 2.5s. 658 tests pass.
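A minimal sketch of the concurrency pattern described above, assuming each tool is exposed as a zero-argument coroutine function. `run_phase` and `_mock_tool` are hypothetical names, and the real agents may structure this differently; the sketch also omits ReconAgent's post-collection finding cap.

```python
import asyncio
import time


async def run_phase(tools: dict, per_tool_timeout: float = 30.0) -> dict:
    """Run every tool concurrently, capping each one at per_tool_timeout seconds."""

    async def run_one(name, tool):
        try:
            return name, await asyncio.wait_for(tool(), timeout=per_tool_timeout)
        except asyncio.TimeoutError:
            # A slow tool yields no findings instead of stalling the phase.
            return name, None

    pairs = await asyncio.gather(*(run_one(n, t) for n, t in tools.items()))
    return dict(pairs)


async def _mock_tool(delay: float, findings: list) -> list:
    await asyncio.sleep(delay)
    return findings


def demo() -> tuple:
    tools = {
        f"tool{i}": (lambda i=i: _mock_tool(0.1, [f"finding-{i}"]))
        for i in range(5)
    }
    tools["slow"] = lambda: _mock_tool(5.0, ["never returned"])
    start = time.monotonic()
    results = asyncio.run(run_phase(tools, per_tool_timeout=0.5))
    return results, time.monotonic() - start
```

In this sketch the wall-clock time of a phase is bounded by the slowest tool or the timeout, whichever is smaller, rather than by the sum of all tool runtimes.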
Pre-launch security sweep with bandit found 8 HIGH severity issues. Triage:
REAL FIXES:
- engine/tool_installer.py: install_tool and install_tier had subprocess.run with shell=True and an f-string that interpolated sudo_password. CWE-78 command-injection if a password ever contained shell metachars. Switched to argv form with the password piped via stdin.
- cli/menu.py: os.system(candidate + " --quiet") replaced with subprocess.run([candidate, "--quiet"], check=False). The path is currently a fixed location but defense-in-depth keeps a future user-driven path safe.
ANNOTATED AS INTENTIONAL:
- engine/scanners.py x5: httpx.AsyncClient(verify=False, ...) marked with # nosec B501 + a comment. Built-in scanners deliberately scan targets with potentially-broken SSL; cert validity is part of what we report on, not an abort condition.
ALSO:
- engine/sarif.py: tool.driver.version was hardcoded "0.8.0" (lying to GitHub Code Scanning about which ptai produced the SARIF). Now resolves from importlib.metadata at export time.
- .gitleaks.toml: allowlist tests/ and engine/scanners.py so the test fixture JWT and the secret-pattern regexes don't fail a launch-blocking scan. gitleaks runs clean against full git history (94 commits) with this config.
Verified:
- bandit -r ... --severity-level high: 0 issues (was 8)
- pip-audit on project venv: 0 known CVEs
- gitleaks detect: 0 leaks
- 658 tests pass.
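The argv-plus-stdin approach from the tool_installer fix can be illustrated like this. The helper name `run_with_secret_on_stdin` is hypothetical, and the child process here is a stand-in that echoes its stdin; in the installer the argv would presumably be a `sudo -S ...` invocation, which reads the password from stdin.

```python
import subprocess
import sys


def run_with_secret_on_stdin(argv: list, secret: str) -> subprocess.CompletedProcess:
    """Run argv without a shell; the secret goes to stdin, never the command line."""
    return subprocess.run(
        argv,  # list form: no shell parsing, so metacharacters stay literal
        input=secret + "\n",
        capture_output=True,
        text=True,
        check=False,
    )


# Stand-in child process that prints the first line it reads from stdin.
_ECHO = [sys.executable, "-c", "import sys; print(sys.stdin.readline().strip())"]
result = run_with_secret_on_stdin(_ECHO, "p@ss; rm -rf / $(whoami)")
```

Because no shell is involved, a password containing `;` or `$(...)` is passed through as plain data.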
See CHANGELOG for details. Three real bandit HIGH severity fixes: - tool_installer: shell=True + f-string sudo password (CWE-78) - cli/menu: os.system replaced with subprocess.run - SARIF tool version was hardcoded 0.8.0; now dynamic via importlib.metadata bandit HIGH: 8 -> 0. pip-audit: clean. gitleaks: clean. 658 tests pass.
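Resolving the SARIF tool version at export time could look something like the sketch below. The function names and the "unknown" fallback are assumptions for illustration; only the use of importlib.metadata comes from the commit message.

```python
from importlib import metadata


def resolve_tool_version(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of dist_name, or fallback if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return fallback


def sarif_driver(dist_name: str) -> dict:
    """Build a minimal SARIF tool.driver object with a dynamically resolved version."""
    return {"name": dist_name, "version": resolve_tool_version(dist_name)}
```

A fallback keeps SARIF export working in source checkouts where the package metadata is not installed.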
Adds a new security workflow that runs on every push/PR with: - bandit at -lll -ii (HIGH severity, medium+ confidence) matching project baseline - pip-audit --strict against editable install - gitleaks with .gitleaks.toml config Also fixes em dash in release.yml comment.
Adds 61 tests across 41 MCP tools, hitting all error paths, profile resolution flows, agent delegation, browser action dispatch, evidence filesystem walks, campaign creation, and intensity changes. Full suite: 719 passed, 19 skipped (no regression).
Adds 25 tests for detect_os, has_go/has_pip, audit_tools tier filter, print_audit table rendering, install_tool across apt/go/pip/snap/manual methods (success and failure), and install_tier with sudo, apt-update, and skip-tools paths. All subprocess calls mocked. Full suite: 744 passed, 19 skipped.
Adds 44 tests for SecurityTool._build_command argument-filter branches (blocked keys, allowed_args, regex, bool, shell injection), execute() with cache hit, cache miss + write, parse_output dispatch, FileNotFoundError, generic Exception, and configure_cache state plus _persist_tool_result paths. Full suite: 814 passed, 19 skipped.
Adds 40 tests for BaseAgent setters, _check_scope branches, the _execute_tool_call dispatcher (analyze_findings, store_finding success + dedup + invalid severity, builtin/security routing, unknown), the _run_builtin_scanner success/timeout/exception/scope-violation paths, the _run_security_tool registry-miss/not-installed/scope/timeout/retry paths and auth-arg injection, run_tool_loop deterministic fallback and LLM-driven termination, think() first-call vs mid-loop failure, and _truncate edge cases. Specialist agents inherit; this lifts the coverage floor for the whole agent family. Full suite: 854 passed, 19 skipped.
Two tests gated on ANTHROPIC_API_KEY + PTAI_E2E_LIVE=1: - direct provider completion (proves API reachable + auth works) - BaseAgent.think round-trip (proves the LLM tool-loop wiring is alive) Skipped in default test runs and in CI. Run before recording the demo video and before each release: this is the only proof we have that the flagship LLM-driven path actually works against a live API. Documented in docs/launch-playbook.md T-14 checklist.
Splits test and lint jobs: - test runs across 6 cells (ubuntu/macos/windows × py3.10/py3.12) with fail-fast disabled so a single OS regression does not mask others - lint + mypy run once on ubuntu/py3.12 (no value in repeating across OSes) Switched from uv venv + activate (bash-only) to plain pip + actions/setup-python caching, which works identically on all three OSes. Local: 854 passed, 21 skipped.
Adds a global cap that fans out across every PENTEST_AI_MAX_FINDINGS_* env var the recon agent honors. Default 0 keeps the per-phase defaults (200 for subdomain enum, 500 for port scan, etc.). Per-var env settings still win when set explicitly via setdefault. Closes Phase 13.3 from the public-launch plan. Full suite: 856 passed, 21 skipped.
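One way to read the setdefault-based fan-out is sketched below. The phase variable names in `PHASE_VARS` and the function `apply_global_cap` are hypothetical; the commit message only confirms the `PENTEST_AI_MAX_FINDINGS_*` prefix, the default of 0, and that explicit per-variable settings win.

```python
# Hypothetical per-phase variable names; only the prefix is confirmed by the source.
PHASE_VARS = (
    "PENTEST_AI_MAX_FINDINGS_SUBDOMAIN",
    "PENTEST_AI_MAX_FINDINGS_PORTSCAN",
)


def apply_global_cap(env: dict, global_cap: int, phase_vars=PHASE_VARS) -> dict:
    """Fan a global cap out to every per-phase variable that is not already set."""
    if global_cap <= 0:
        return env  # 0 keeps the built-in per-phase defaults
    for var in phase_vars:
        # setdefault leaves explicitly configured per-phase values untouched.
        env.setdefault(var, str(global_cap))
    return env
```

In real use `env` would be `os.environ`; a plain dict is used here so the behavior is easy to check.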
Adds 17 tests for APISecurityAgent, CredentialTesterAgent, VulnScannerAgent, PrivescAdvisorAgent, SocialEngineerAgent. Each has the same shape: LLM unavailability falls back to a deterministic tool-driven path. Tests exercise both branches plus tool-execute exceptions. Coverage: api_security 27→95%, credential_tester 31→100%, vuln_scanner 31→100%, privesc 29→100%, social_engineer 50→100%.
Adds 31 tests for _same_host, _normalize_url, _FormLinkParser link/form extraction, _extract_endpoints param parsing + destructive-param skips, _finding truncation, _send_probe GET/POST/exception, _probe_sqli error-marker + no-marker + probe-failure paths, _probe_xss reflection detection, _probe_cmdi id-marker detection, run_authenticated_scan session vs authenticator dispatch, and _crawl non-html / skip-substring / http-error handling. All httpx calls mocked, no live network.
- engine/auth_handler.py: AuthCredentials.from_dict / from_cli_args branches and build_auth_args across nuclei, sqlmap, ffuf flag mapping - engine/llm/providers/anthropic.py: complete() with simple message, with system+tools, and round-trip translating tool_calls and tool_results - agents/report/renderer.py: risk_level CRITICAL/HIGH/MEDIUM/LOW branches plus render_pdf weasyprint dispatch Global coverage: 77%.
cli/auth.py: 33 tests for api_base override, store/load_api_key with env-priority and OSError handling, key_source resolution, validate_key_remote success/invalid/HTTP-error/non-200 paths, ingest_engagement no-key/success/ 402 quota/unparseable-body/error/network-error paths, and mask_key. cli/mcp_setup.py: platform detection, command/entry shape, generate_config_snippet, detect_installed_clients no-config edge case, inject_config dry-run/write/missing.
- Ollama: complete with simple/tools+JSON-arg variants, base_url normalization - OpenAI: complete simple, tool_calls JSON parse + bad-JSON fallback, _format_message with tool_calls and tool_call_id - PoCAgent: validate_finding not-found, static-poc injection/unknown, validate_all severity filter - AD/Wireless/Cloud: deterministic and LLM-unavailable fallback paths Global coverage: 78%.
Tests SIGINT handler install/uninstall idempotency and graceful non-main-thread degradation, all REPL commands (resume, step, abort, skip, inject, help, blank, unknown, EOF, KeyboardInterrupt), and _inspect findings/chain/summary branches with truncation. Global coverage: 79%.
- cli/credential_resolvers/aws_sm: empty ref, plain string secret, JSON-field extraction, missing field, binary secret, empty secret, GetSecretValue failure, missing-boto3 paths - agents/recon._env_int: default, valid override, invalid value, zero/negative - agents/report/renderer.write_report: html-only, html+pdf with RuntimeError fallback, html+pdf success Global coverage: 80% (target hit). Test count: 1016 passed, 21 skipped.
The pre-push hook runs against a venv without ptai[tracing] extras, so the OTLP exporter and console-exporter tests (which import unconditionally inside Tracer._init_if_needed) need importorskip guards. Result: 26 pass with [tracing] installed, 20 pass + 6 skip without.
Pre-push secret scanner flagged AKIA[...]EXAMPLE in test fixtures. The string is AWS's documented example (not a real credential), but the regex match is reasonable signal. Replace with a clearly-fake placeholder; the parsers match on the surrounding 'secret:' / 'rule:' phrases, not the key format itself.
security.yml: gitleaks-action SHA cb71... no longer resolves on GitHub. Replace with the latest pinned tag v2.3.9 (ff98106e).
Windows CI matrix surfaced pre-existing platform issues (the Linux-only CI never ran them). Three fixes:
1. cli/auth_profiles._check_perms now skips the 0o600 enforcement on Windows. Windows uses ACLs, not Unix mode bits, so st_mode reads back 0o666 regardless. Hardening on Windows requires DACLs out of band; the code now documents that.
2. Tests that assert specific Unix mode bits (test_save_creates_0600_file, test_load_refuses_world_readable_file, test_store_and_load_api_key) are skipped on Windows.
3. test_expanduser now sets USERPROFILE alongside HOME so '~' resolves correctly on Windows.
Local suite: 1016 passed, 21 skipped.
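A rough sketch of a permission check that skips enforcement on Windows, under the assumption that the check rejects any group or other access bits. `check_perms` is a hypothetical stand-in, not the body of `_check_perms`.

```python
import os
import stat
import sys
import tempfile


def check_perms(path: str) -> None:
    """Refuse files accessible to group/other; skip the check on Windows."""
    if sys.platform == "win32":
        # Windows uses ACLs; st_mode does not reflect real access control.
        return
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")


def demo() -> tuple:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, 0o600)
        check_perms(path)  # owner-only: passes silently
        os.chmod(path, 0o644)
        try:
            check_perms(path)
            rejected = False
        except PermissionError:
            rejected = True
        return True, rejected
    finally:
        os.remove(path)
```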
- test_load_refuses_group_readable_file is the second perm-mode test that needs the Windows skip mark (paired with the world-readable one) - pip-audit failed on CVE-2026-3219 in the runner's pre-installed pip; upgrade pip in the venv before audit so the project's own deps are what gets audited
mcp_server/server.py:758 has called agent.get_cookies(url) since the browser_inspect tool was wired in 0.10.x, but the method was never implemented. The 'cookies' action would have raised AttributeError if anyone selected it. mypy --ignore-missing-imports caught this once the new CI matrix included a type-check step on Python 3.12. Implementation mirrors extract_forms: open a page, navigate, read cookies via the Playwright BrowserContext API.
engine/tracing.py:163-178 wrapped both span-start and the user-yield
in a single try/except. When user code raised inside the span:
1. Inner except marked status=error and re-raised
2. Outer except caught the same exception
3. Outer except yielded a NoopSpan as a 'fallback'
4. The contextmanager generator had already yielded once, so the
second yield raised 'generator didn't stop after throw()'
Refactor splits the two concerns:
- Outer try wraps only start_as_current_span() init (real failure
case where NoopSpan fallback is correct)
- Once we yield the wrapper, user exceptions propagate cleanly to
the caller through the with-statement exit machinery
Adds test_span_user_exception_propagates_cleanly to lock this in.
Removes the apologetic NOTE that documented the original bug.
Full suite: 1017 passed, 21 skipped.
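The double-yield failure and the refactor can be reproduced in miniature. The sketch below is a simplification under stated assumptions: `Span`, `NoopSpan`, `buggy_span`, and `fixed_span` are hypothetical stand-ins, and span creation is modeled as a plain callable rather than OpenTelemetry's start_as_current_span().

```python
from contextlib import contextmanager


class Span:
    status = "ok"


class NoopSpan:
    status = "noop"


@contextmanager
def buggy_span(start_span):
    # One try/except around both span creation and the user's with-body.
    try:
        span = start_span()
        try:
            yield span
        except Exception:
            span.status = "error"
            raise
    except Exception:
        # Also catches the re-raised user exception, then yields a second time.
        yield NoopSpan()


@contextmanager
def fixed_span(start_span):
    # Only span creation is guarded; a failing tracer falls back to NoopSpan.
    try:
        span = start_span()
    except Exception:
        span = NoopSpan()
    # Exactly one yield: exceptions from the with-body propagate to the caller.
    try:
        yield span
    except Exception:
        span.status = "error"
        raise


def outcome(cm) -> str:
    """Raise inside the span and report which exception type reaches the caller."""
    try:
        with cm(Span):
            raise ValueError("boom")
    except Exception as exc:
        return type(exc).__name__
    return "no exception"
```

With the buggy version the caller sees a RuntimeError from contextlib instead of the original exception; with the fixed version the ValueError arrives unchanged.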
- README: add Community section linking to GitHub Discussions - docs/launch/launch-checklist.md: go/no-go gate document - docs/launch/soc2-kickoff.md: vendor matrix + SOC2 Type I kickoff plan - docs/launch/community-channel.md: GitHub Discussions decision record - docs/launch/demo-script.md: 90s demo video beat sheet + toolchain - docs/launch/testimonial-outreach.md: outreach templates + tracker - docs/launch/install-matrix.md: cross-platform install verification matrix
Phase 4. Two more probes gain OOB code paths gated on engagement_id in session extras: engine/probes/web/xxe_upload.py — after the existing three-step in-band probe (baseline, file-disclosure, billion-laughs DoS), an external-DTD + SVG-DTD payload fires per candidate path under both shapes (raw application/xml + multipart/form-data). pending_oob row carries the critical CWE-611 finding template. engine/probes/web/stored_xss.py — after the existing POST-then-GET echo confirmation, three curated stored-XSS OAST payloads (img onerror fetch, sendBeacon, attribute-break) fire as the first COMMENT_FIELDS field value under both JSON + form-urlencoded shapes. pending_oob row carries the high CWE-79 finding template. Confirms when a victim's browser later renders the stored comment and the payload calls back to the collaborator. Blind-RCE wiring deliberately deferred — ptai has no general command-injection probe today (the closest, deserialization + nextjs_rsc_rce, are CVE-specific). The Phase-4 payload library already ships rce_oob_payloads() so a future command-injection probe gets OOB for free. 86 / 86 in the cross-probe + OOB sweep (SSRF + SQLi + XXE + stored-XSS + OOB client/registry/payloads + poll_oob).
Phase 4. CLI surface for the OOB collaborator. Three flags set the env vars the OOB registry reads:
--oast-server URL -> PTAI_OAST_SERVER (default: https://oast.fun)
--oast-token TOKEN -> PTAI_OAST_TOKEN (for self-hosted Interactsh)
--no-oast -> PTAI_NO_OAST=1 (disable OAST entirely)
Pentesters running on programs that forbid third-party collaborator infra (the PortSwigger Burp Collaborator policy is canonical) can now point ptai at a self-hosted Interactsh server in one command:
ptai start http://target --oast-server https://oast.example.com --oast-token <T>
Or turn OAST off where outbound DNS/HTTP to a collaborator isn't permitted at all:
ptai start http://target --no-oast
Flags carry through to MCP run_probe / poll_oob via the env-var seam; CLI agent-mode probes pick them up the same way once task 0xSteph#11 wires that path. 67 / 67 CLI tests still green.
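The flag-to-env-var mapping can be sketched with argparse. This is an assumption-heavy illustration: ptai's real CLI framework is not stated in the commit message, `oast_env_from_args` is a hypothetical name, and treating --no-oast as overriding the other two flags is a guess.

```python
import argparse


def oast_env_from_args(argv: list) -> dict:
    """Translate the three OAST flags into the env vars the OOB registry reads."""
    parser = argparse.ArgumentParser(prog="ptai-start-sketch")
    parser.add_argument("target")
    parser.add_argument("--oast-server", default=None)
    parser.add_argument("--oast-token", default=None)
    parser.add_argument("--no-oast", action="store_true")
    args = parser.parse_args(argv)

    env: dict = {}
    if args.no_oast:
        # Assumption: disabling OAST takes priority over server/token settings.
        env["PTAI_NO_OAST"] = "1"
        return env
    if args.oast_server:
        env["PTAI_OAST_SERVER"] = args.oast_server
    if args.oast_token:
        env["PTAI_OAST_TOKEN"] = args.oast_token
    return env
```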
Phase 4. Documents the dual-mode collaborator story for blind-vuln detection: encrypted-at-rest payloads with server-side metadata visibility, the PortSwigger Collaborator policy as the canonical "self-host on paid engagements" rule, and the --oast-server / --no-oast escape hatches. Section sits under Responsible Use because OAST has the same kind of operational consequence as scope enforcement: the user owns the decision about where callback data lands.
Closes Phase 4. Walks the full OOB loop in one test against a mock
Interactsh server stood up on a loopback port:
register_oob_probe -> mock-interactsh /register (RSA pubkey stored)
-> pending_oob row persisted
mock.queue_interaction(...) — simulates the target firing the payload
MCP poll_oob -> mock-interactsh /poll (returns real wire-format
AES-CTR ciphertext + RSA-OAEP-wrapped AES key)
-> decrypt -> find_pending_oob_by_full_id
-> add_finding + mark_pending_oob_matched
-> on-disk oob_interaction evidence artifact
Exercises exactly the encryption code path the real oast.fun server
uses (re-uses tests/test_oob_client.py::_fake_poll_response). Confirms
the artifact file content carries the queued interaction's source IP.
Phase 4 deliverable scoreboard:
audit deal-breaker 0xSteph#2 (blind vulns undetectable) -> CLOSED
346 / 346 tests in the cross-phase sweep
CHANGELOG entry summarizes the 10 commits comprising Phase 4 (#21-#30
plus the carved-out RCE-wiring follow-up).
Phase 5 (Caido / Burp / ZAP plugin) prereqs. Both surfaced as BLOCKING
by the cross-stream gap audit:
get_findings now accepts:
- url=<substring> case-insensitive LIKE match on the target column.
Lets a proxy plugin scope the Findings tab to the
URL the user is currently inspecting.
- since=<iso-ts> created_at >= comparison. Lets a polling Findings
tab fetch only what's new since the last refresh
instead of re-downloading the full list each tick.
No schema migration needed — the existing target and created_at columns
carry the data. Both filters default to None and combine cleanly with
the existing severity / status filters; backward-compatible.
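The two filters might be composed along these lines. The table layout below is hypothetical apart from the target and created_at column names, and the query builder is a sketch rather than the FindingsDB implementation. Note that `%` or `_` in the url argument would act as LIKE wildcards in this simplified version.

```python
import sqlite3


def get_findings(conn, engagement_id, severity=None, url=None, since=None):
    """Build the WHERE clause incrementally; every filter is optional."""
    sql = "SELECT id FROM findings WHERE engagement_id = ?"
    params = [engagement_id]
    if severity is not None:
        sql += " AND severity = ?"
        params.append(severity)
    if url is not None:
        # Case-insensitive substring match on the target column.
        sql += " AND LOWER(target) LIKE ?"
        params.append(f"%{url.lower()}%")
    if since is not None:
        # ISO-8601 timestamps in one consistent format compare correctly as text.
        sql += " AND created_at >= ?"
        params.append(since)
    rows = conn.execute(sql + " ORDER BY id", params).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (id INTEGER PRIMARY KEY, engagement_id TEXT, "
    "target TEXT, severity TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO findings (engagement_id, target, severity, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("e1", "http://app.test/Login", "high", "2025-01-01T00:00:00"),
        ("e1", "http://app.test/search", "low", "2025-01-02T00:00:00"),
        ("e1", "http://app.test/login/reset", "high", "2025-01-03T00:00:00"),
    ],
)
```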
health() MCP tool:
Liveness probe for plugin status indicators. Returns
{status, version, timestamp, uptime_seconds, active_engagements}
with zero side effects. Never raises — degrades to active=0 when the
DB is unreachable so the status indicator can show a clear "MCP up,
DB degraded" state instead of just "MCP unreachable."
Server start time anchored at module import via time.monotonic() so
uptime_seconds is meaningful even after the DB reconnects. Active
engagement count is a single COUNT(*) WHERE status='running' query;
sub-millisecond on any sane row count.
9 new tests (4 url/since combos, 4 health behaviors, 1 backward-compat).
314 / 314 across the MCP + findings_db + OOB + evidence + CLI sweep.
Phase 5 bridge. ptai's MCP server normally speaks JSON-RPC over SSE,
which is the right shape for Claude Code but finicky from a JVM/JS
HTTP client. The new mcp_server/rest.py module mounts four Starlette
routes via FastMCP's @mcp.custom_route() decorator that delegate
straight to the existing tool functions:
GET /v1/health -> health()
GET /v1/findings -> get_findings(engagement_id=, severity=,
status=, url=, since=)
POST /v1/http_request -> http_request({engagement_id, method, url,
headers?, body?, json_body?,
auth_profile?,
allow_destructive?})
GET /v1/evidence -> get_evidence(engagement_id=, finding_id=,
include_content=, as_curl=)
No duplicated logic — each route is just unpack-args, call-tool,
JSONResponse-wrap. LocalAuthMiddleware enforces Bearer + Host: header
identically on /v1/* as it does on /sse, so REST consumers (the
Caido / Burp / ZAP plugins shipping out of pentest-ai-extensions) get
the same DNS-rebinding-defense + token-auth posture as MCP consumers.
The plugin client at pentest-ai-extensions/caido/packages/frontend/
src/ptai/client.ts already targets these exact paths — v0.0.1 of the
plugin starts functioning end-to-end the moment this ships.
10 dedicated tests covering auth, host allowlist, all four routes,
JSON body parsing, error shapes. 324/324 across the full MCP +
findings_db + evidence + OOB + CLI regression sweep.
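From a consumer's point of view, a request to the findings route might be assembled as in this sketch. The base URL, port, and helper name are hypothetical; only the /v1/findings path, the query parameter names, and the Bearer scheme come from the commit message. The request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_findings_request(base_url: str, token: str, engagement_id: str, **filters):
    """Build a GET /v1/findings request carrying the per-install bearer token."""
    query = {"engagement_id": engagement_id}
    # Drop filters left at None so they are not sent as literal "None" strings.
    query.update({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url.rstrip('/')}/v1/findings?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical local address; the real host and port depend on how ptai is run.
req = build_findings_request(
    "http://127.0.0.1:8000/", "example-token", "e1", url="login", since=None
)
```

The Host header is derived from the URL when the request is sent, so it must match an entry the server's allowlist accepts.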
Task 0xSteph#11 — closes the carry-forward from Phase 1. ptai's standalone "ptai start" agent-mode path (cli/main.py:start, agent_mode=True) drives probes via engine.agents.handlers.registry_bridge, which builds its own aiohttp.ClientSession deep in the call chain. That session never carried the _ptai_extras shape the HTTP primitives chokepoint looks for, so every Phase 1+4 capability was MCP-path-only: - no evidence_artifacts on findings from the CLI path - --upstream-proxy / PTAI_UPSTREAM_PROXY had no effect on CLI runs - OOB-aware probes silently skipped their OAST code path Now the bridge populates session._ptai_extras with: - engagement_id (from WorkingMemory) — unlocks the OOB probes' register_oob_probe() call path - evidence_collector (rooted at PENTEST_EVIDENCE_DIR) — every HTTP call through the primitives chokepoint persists request + response bytes - proxy (from PTAI_UPSTREAM_PROXY if set) — Phase-2 stealth proxy passthrough now works for ptai start --upstream-proxy <url> After the probe returns, _attach_pending_evidence_to_findings(session, findings) drains the pending-evidence buffer and attaches the artifact summaries to every finding the probe emitted — same orchestrator-side auto-attach the MCP run_probe path does. Intensity-derived stealth knobs (ua_rotation, jitter_ms) deferred — WorkingMemory doesn't carry intensity today; small refactor follows when needed. The big-ticket items (evidence + OOB + proxy) all work on the CLI path now. Legacy AgentOrchestrator (--legacy-pipeline) path still TODO; that goes through agents/web/web_agent.py + agent-specific session creation. Not on the critical path since agent_mode is the default (--agent-mode/--legacy-pipeline). Tracked as follow-up. One new test in tests/test_registry_bridge.py asserts the extras attach + buffer-drain wiring works end-to-end. 58/58 in the registry_bridge + agent_loop + working_memory + handler sweep.
Full-suite run revealed two flakes in test_mcp_rest_adapter.py that passed in isolation but failed when another async-test file ran first in the session. Cause: the tests used the deprecated asyncio.get_event_loop().run_until_complete() pattern to seed the FindingsDB before driving TestClient. Under Python 3.13 + pytest-asyncio mode=AUTO, get_event_loop() raises 'no current event loop in thread' when a prior test closed the loop. Fix: convert both tests to @pytest.mark.asyncio + plain `await` for the seeding, then drive TestClient inside the same async test (TestClient spins its own loop internally — no conflict with the outer pytest-asyncio loop because we never wait on TestClient's response stream from inside the running coroutine; the calls are synchronous from Python's perspective). Pure test-hygiene change, no production code touched. 21/21 across test_mcp_rest_adapter + test_findings_db_get_findings_filters + test_oob_end_to_end + test_mcp_health + test_evidence_integration_e2e (the cross-file async-DB load that previously triggered the flake).
Headline release closing 3 of 4 deal-breakers from the pentester audit:
Phase 1 — Evidence bundle. Every finding carries verbatim request
+ response bytes, SHA-256 integrity hash, and a copy-pasteable
curl one-liner. SARIF gains DAST webRequest/webResponse so
GitHub Code Scanning renders the exchange inline.
Phase 2 — Real intensity=stealth. UA rotation across 7 curated modern
browser UAs, per-request jitter, upstream proxy passthrough
via --upstream-proxy / PTAI_UPSTREAM_PROXY.
Phase 4 — OOB collaborator. Interactsh client (RSA-OAEP + AES-CTR
wire format verified against upstream), curated per-DBMS /
per-engine payload library (SSRF, blind SQLi, XXE, RCE,
stored XSS, SSTI, Log4Shell), poll_oob MCP tool, OAST
payloads wired into ssrf_cloud_metadata + sqli_fuzz +
xxe_upload + stored_xss.
Plus the plugin-client foundation:
- MCP auth: per-install token file (~/.pentest-ai/mcp-token, 0600)
+ Host: allowlist for DNS-rebinding defense
- REST adapter at /v1/health, /v1/findings, /v1/http_request,
/v1/evidence — lets JVM/JS proxy plugins consume ptai without
SSE+JSON-RPC
- get_findings(url=, since=) + new health() MCP tool
- CLI agent-mode parity: ptai start probes now carry evidence the
same way MCP-driven probes do
39 commits since 0.15.3. Smoke verified against TaskFlow honeypot:
8/8 findings carry evidence_artifacts, 406/406 on-disk artifacts emit
valid curl reproducers, SARIF webRequest+webResponse populated, REST
/v1/* returns 200, CLI parity confirmed.
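The Phase 1 evidence idea (verbatim bytes, a SHA-256 integrity hash, and a curl reproducer) can be illustrated with a small sketch. The record layout and `evidence_record` are hypothetical; ptai's actual artifact format is not described in this log.

```python
import hashlib
import shlex


def evidence_record(method: str, url: str, headers: dict, body: bytes, response: bytes) -> dict:
    """Bundle a response hash with a shell-safe curl command for the request."""
    parts = ["curl", "-X", method]
    for name, value in headers.items():
        parts += ["-H", f"{name}: {value}"]
    if body:
        parts += ["--data-binary", body.decode("utf-8", "replace")]
    parts.append(url)
    return {
        "response_sha256": hashlib.sha256(response).hexdigest(),
        # shlex.join quotes each argument so the line can be pasted into a shell.
        "curl": shlex.join(parts),
    }


record = evidence_record(
    "POST",
    "http://t.test/a?x=1&y=2",
    {"Content-Type": "application/json"},
    b'{"q": "it\'s"}',
    b"HTTP/1.1 200 OK\r\n\r\nok",
)
```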
…ovider (0xSteph#12) Issue 0xSteph#12 reporters were hitting "agent_mode: NNN action handlers registered" then silent exit. The Ollama OLLAMA_HOST fix in 73fef36 addressed a real bug (factory.py read OLLAMA_BASE_URL instead of the canonical OLLAMA_HOST env var), but it didn't help these users — the CLI agent-mode driver (engine.agents.anthropic_agent.AnthropicAgent) hardcodes the Anthropic Messages API surface (client.messages.create) and is constructed unconditionally via `client = AsyncAnthropic()` at cli/main.py:729. So a user setting PENTEST_AI_LLM_PROVIDER=ollama and OLLAMA_HOST was: 1. Past _llm_key_present() (which honors ollama) 2. Hitting AsyncAnthropic() — no ANTHROPIC_API_KEY in env 3. First LLM call failing inside AnthropicAgent.decide_next_action, which catches the exception and returns Action(name="finish"), terminating the loop cleanly — under the spinner — with no surfaced error. Fix: validate before the Progress spinner starts. When agent_mode is active and PENTEST_AI_LLM_PROVIDER is set to something other than "anthropic", exit 4 with a clear message pointing the user at: - Running ptai over MCP (Path 1/2) — every provider works there - --no-llm for the deterministic wrapped-tools path - PENTEST_AI_LLM_PROVIDER=anthropic + ANTHROPIC_API_KEY Same gate also fires when PENTEST_AI_LLM_PROVIDER=anthropic is explicit but ANTHROPIC_API_KEY is missing (the existing _llm_key_present gate only catches the "no provider configured at all" case; this catches the "provider chosen, key forgotten" case). Two dedicated regression tests in tests/test_cli.py. 69/69 in the CLI sweep. Native multi-provider CLI agent-mode (so ollama/openai/litellm actually work on Path 3) tracked as a separate follow-up.
Supersedes the loud-failure guards from 6fc6d11. The previous fix told non-Anthropic users their setup was unsupported; this fix actually makes their setup work. Three changes wire ptai's CLI agent-mode through the existing provider-agnostic LLMClient factory so OpenAI / Ollama / LiteLLM users hit a functioning agent loop instead of a silent hang: cli/main.py Replace the hard-coded `AsyncAnthropic()` construction with `create_llm_client()`. The factory honours PENTEST_AI_LLM_PROVIDER, auto-detects the provider from whichever API key is set, and returns a unified LLMClient. Removed the now-redundant loud-failure guards added in 6fc6d11. engine/agents/anthropic_agent.py Duck-type the client in decide_next_action. If the client exposes .complete() it's the unified LLMClient — call it with LLMMessage objects and read resp.content directly. Otherwise fall back to the legacy client.messages.create() path so the 8 tests that pass MagicMock with messages.create stubbed keep working. The fallback is the only thing keeping the class name accurate for now; the rename to MultiProviderAgent (or similar) is a later cleanup. engine/llm/factory.py Auto-detect provider from available keys when PENTEST_AI_LLM_PROVIDER isn't set: ANTHROPIC_API_KEY -> anthropic, OPENAI_API_KEY -> openai, neither -> openai (fallback to current default). Closes the "I set OPENAI_API_KEY and got nothing" foot-gun poeylizn hit on the original issue thread. Verified end-to-end against a real local Ollama instance running qwen2.5-coder:7b: - Factory routes to OllamaProvider (correct base_url + model) - create_llm_client() wraps with cost-tracking; .complete() exposed - LLMClient.complete(...) round-trips through Ollama; got real reply - AnthropicAgent.decide_next_action uses the new branch, returns a real Action(name='probe.test', ...) — not finish-due-to-failure Tests updated: two replaced (test_start_agent_mode_uses_factory_for_*) to assert the factory routing for OpenAI and Ollama users. 
104/104 across tests/test_cli + test_anthropic_agent + test_llm + test_agent_loop in scope. Native Anthropic SDK fallback path stays so existing tests + users with bare AsyncAnthropic clients keep working without changes.
Patch release with the real fix for issue 0xSteph#12 (silent exit in CLI agent-mode when PENTEST_AI_LLM_PROVIDER is non-Anthropic). 0.16.0 shipped an adjacent OLLAMA_HOST factory fix but missed the actual root cause; 0.16.1 wires CLI agent-mode through the unified LLM factory so OPENAI_API_KEY / OLLAMA_HOST / PENTEST_AI_LLM_PROVIDER=ollama users actually work on Path 3 instead of being silently ignored. Verified end-to-end against a live local Ollama before tagging.
…docs Issue 0xSteph#12 follow-up. poeylizn was on 0.16.1 pointing ptai at DeepSeek in the cloud and got a 404 because the openai-path factory still asked for gpt-4o regardless of what their endpoint served. The PENTEST_AI_MODEL env var existed but only the LiteLLM path honored it; the openai / anthropic / ollama paths used their own hardcoded defaults. Now all four provider paths honor PENTEST_AI_MODEL. Same recipe works for DeepSeek, Groq, Together AI, local llama.cpp / vLLM / LM Studio, and any other OpenAI-compatible endpoint. Also adds docs/llm-providers.md with concrete env-var recipes per provider (Anthropic, OpenAI + compatible, Ollama, LiteLLM with Azure / Bedrock / Vertex / OpenRouter examples), troubleshooting, and the --no-llm escape hatch. Linked from README Path 3. 5 new tests in tests/test_llm.py covering PENTEST_AI_MODEL across all four providers + explicit-arg-beats-env precedence. 108/108 across the touched paths. Verified live against running Ollama: PENTEST_AI_MODEL=qwen2.5-coder:7b routes correctly through the factory + the unified LLMClient + a real completion call.
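The precedence the tests describe (explicit argument, then PENTEST_AI_MODEL, then the provider default) can be sketched as below. `resolve_model` is a hypothetical helper; gpt-4o is the only default named in the commit message, so the defaults table is passed in rather than guessed.

```python
def resolve_model(provider: str, defaults: dict, explicit=None, env=None) -> str:
    """Explicit argument beats PENTEST_AI_MODEL, which beats the provider default."""
    env = env if env is not None else {}
    if explicit:
        return explicit
    if env.get("PENTEST_AI_MODEL"):
        return env["PENTEST_AI_MODEL"]
    return defaults[provider]


# Only the OpenAI default is confirmed by the source.
DEFAULTS = {"openai": "gpt-4o"}
```

For an OpenAI-compatible endpoint such as DeepSeek, setting PENTEST_AI_MODEL to a model name the endpoint actually serves is what avoids the 404 described above.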
…bodies Issue 0xSteph#12 follow-up. poeylizn reported a 400 from DeepSeek-cloud that read "Client error '400 Bad Request' for url '...'" with no upstream reason. Turned out his actual problem was the model-name gap that 0.16.2 fixed (he resolved it by upgrading), but the unreadable error exposed a real engineering miss: all four providers were calling httpx response.raise_for_status() which throws away the response body. Custom-base-URL users (DeepSeek cloud, Groq, Together AI, vLLM, etc.) got no diagnostic when their endpoint rejected a request, even when the upstream's JSON error body would have told them exactly what to fix. 0.16.3 changes: engine/llm/providers/openai.py — new LLMHTTPError class. When response.status_code >= 400, raise LLMHTTPError with endpoint URL, model name, status code, and the upstream response body (truncated to 2 KB). engine/llm/providers/anthropic.py — same fix, reuses LLMHTTPError. engine/llm/providers/ollama.py — same. docs/llm-providers.md — troubleshooting section updated to walk through the new error format, common DeepSeek 400 causes (model rename, missing /v1 suffix, key whitespace), and a no-network factory-config preflight (python -c "from engine.llm.factory ..."). End-to-end verified live: PENTEST_AI_LLM_PROVIDER=openai pointed at Ollama's /v1/ with a nonexistent model now raises LLMHTTPError: OpenAI-compatible endpoint at http://localhost:11434/v1 returned HTTP 404 for model='this-model-does-not-exist': {"error":{"message":"model 'this-model-does-not-exist' not found", "type":"not_found_error",...}} instead of the old opaque "Client error '404' for url '...'". Same endpoint with a valid model name completes normally — the openai provider's custom-base-URL path was never broken, just unhelpful when the upstream said no.
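An error type carrying the upstream body might be shaped like this sketch. The class name and the message layout follow the example in the commit message; the constructor signature, the `raise_for_llm_status` helper, and truncating by characters rather than bytes are assumptions.

```python
MAX_BODY_CHARS = 2048  # "truncated to 2 KB"; character-based here as an assumption


class LLMHTTPError(RuntimeError):
    """HTTP failure from an LLM endpoint, keeping the upstream body for diagnosis."""

    def __init__(self, label: str, endpoint: str, model: str, status_code: int, body: str):
        self.endpoint = endpoint
        self.model = model
        self.status_code = status_code
        self.body = body[:MAX_BODY_CHARS]
        super().__init__(
            f"{label} endpoint at {endpoint} returned HTTP {status_code} "
            f"for model={model!r}: {self.body}"
        )


def raise_for_llm_status(label, endpoint, model, status_code, body) -> None:
    """Replacement for raise_for_status() that preserves the response body."""
    if status_code >= 400:
        raise LLMHTTPError(label, endpoint, model, status_code, body)
```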
Two related cleanups in one commit. Both have been red on every main
push for the past day or two; the per-commit pentest-ai workflow's
lint job is catching real quality drift.
1. Ruff (34 errors -> 0):
- 26 auto-fixed by `ruff check --fix`: unused imports (F401),
unsorted import blocks (I001), quoted annotations (UP037).
- 5 manual SIM105 suppressible-exception sites in probe-edge
code (sqli_fuzz, stored_xss, xxe_upload) intentionally use
try/except/pass to keep probe edges resilient against arbitrary
network failures; tagged with `# noqa: SIM105`.
- 2 auth_local.py OSError suppressors converted to
contextlib.suppress (clean fit, narrow exception).
- 1 F811 shadowing in mcp_server/server.py (local `rest` var
collided with `from mcp_server import rest`); renamed local
to `body`.
- 2 F841 unused-variable cases in tests (`e2`, `old`); removed
the bindings.
2. AI-typography cleanup in user-facing docs:
- Replaced em-dashes (74 total) with hyphens across CHANGELOG.md,
README.md, docs/llm-providers.md.
- Same pass dropped Unicode arrows (->), ellipses (...), and
en-dashes from this session's additions. Older README emoji,
badges, and intentional legacy math symbols (>=, x) preserved.
Mypy was already clean. 100 tests across the touched paths still
green after the auto-fixes.
The v0.16.3 release.yml test job died on a TypeError: three existing Anthropic provider tests mocked `response` with bare MagicMocks that didn't set status_code, so the new `response.status_code >= 400` check raised against the mock instead of returning False. Result: the 0.16.3 tag exists in git, but the 0.16.3 wheel never reached PyPI.
This release re-cuts the same LLM-provider-error-body fix on top of the fixed test fixtures + the ruff cleanup + the em-dash sweep that landed in between. Six tests now set fake_response.status_code = 200 explicitly; 40 / 40 in the touched files green.
User-visible behavior is identical to what 0.16.3 was supposed to ship: LLMHTTPError raised on >=400 with endpoint, model, status code, and the upstream's response body (truncated to 2 KB) so DeepSeek / Groq / vLLM / Together-AI users can actually diagnose 4xx instead of seeing httpx's opaque one-liner.
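The fixture failure is easy to reproduce in isolation. The sketch below uses only `unittest.mock`; `is_error` is a hypothetical stand-in for the provider-side check, not the project's code. A bare MagicMock attribute is itself a MagicMock, and its ordering comparisons return NotImplemented by default, so comparing it to an int raises TypeError.

```python
from unittest.mock import MagicMock


def is_error(response):
    """Stand-in for the status check the providers gained in 0.16.3."""
    return response.status_code >= 400


# Bare mock: status_code is an auto-created MagicMock, not an int.
bare = MagicMock()
try:
    is_error(bare)
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# Fixed fixture: set the attribute explicitly, as the re-cut tests do.
fixed = MagicMock()
fixed.status_code = 200
```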
… loudly
v0.17.0 Change 1. Closes the silent-zero-findings failure mode from issue 0xSteph#12 where DeepSeek emitted free-form text or made-up handler names and the loop accepted the parser's fallback finish without warning.
- ResponseQuality enum (VALID / UNPARSEABLE / UNKNOWN_HANDLER) classified by AnthropicAgent._parse_action; agent stashes self.last_quality so the LLMAgent Protocol signature stays unchanged.
- WorkingMemory.consecutive_bad_responses counts non-VALID responses in a row; resets on a clean response.
- LoopConfig.bad_response_threshold (default 3, env-overridable via PTAI_AGENT_FALLBACK_THRESHOLD) trips the deterministic fallback with exit_reason="llm_non_cooperative".
- Early-finish detection: finish at iter < min_iterations_before_finish with 0 findings now runs the deterministic fallback (exit_reason "llm_finished_too_early"). Previously the loop silently continued, which is exactly how poeylizn ended with 0 findings.
- min_iterations_before_finish now env-overridable via PTAI_MIN_ITERATIONS_BEFORE_FINISH.
- CLI post-engagement summary prints a NOTE warning when exit_reason is one of the non-cooperative or fallback outcomes, so users know the LLM was the bottleneck (or the source) of any findings.
Tests: ResponseQuality classification for valid/unparseable/unknown handler/swallowed-exception paths; threshold trip + counter reset; early-finish-with-zero vs early-finish-with-findings divergence; env override behaviour.
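The streak counter described above might look roughly like this. The enum members and the `consecutive_bad_responses` field come from the commit message; `record_quality` is a hypothetical helper, and the real loop presumably does this inline.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseQuality(Enum):
    VALID = "valid"
    UNPARSEABLE = "unparseable"
    UNKNOWN_HANDLER = "unknown_handler"


@dataclass
class WorkingMemory:
    consecutive_bad_responses: int = 0


def record_quality(memory, quality, threshold=3):
    """Update the streak; return True when the fallback should trip."""
    if quality is ResponseQuality.VALID:
        memory.consecutive_bad_responses = 0  # a clean response resets
        return False
    memory.consecutive_bad_responses += 1
    return memory.consecutive_bad_responses >= threshold
```

Counting consecutive rather than total bad responses means a model that recovers after one malformed reply is not penalised.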
…nge 2)
Closes the second failure mode behind issue 0xSteph#12: factory returns a client that LATER fails mid-loop with no diagnostic. Now bad configs fail loud at startup with a concrete next-step block, and 'ptai doctor' lets users sanity-check their config without running a scan.
Provider.validate() (LLMClient Protocol addition):
- Anthropic: GET /v1/models with key
- OpenAI / OpenAI-compat: GET {base_url}/models; on 404 fall back to a 1-token complete() probe (covers vLLM, llama.cpp, private deployments that don't ship /models)
- Ollama: GET /api/tags + check the configured model is in the tag list, accepts both 'llama3.1' and 'llama3.1:latest' style names
- LiteLLM: 1-token complete() (only universal preflight for 300+ backends)
- CostTrackingLLMClient passes validate() through
Factory:
- Auto-detect chain now probes OLLAMA_HOST (or default localhost:11434) for /api/tags via a 500ms sync httpx call when no cloud API keys are set. Closes as8ASd3's report where OLLAMA_HOST was set but PENTEST_AI_LLM_PROVIDER wasn't, so the factory fell through to Anthropic with no key.
- New async validate_client(client) helper with hard timeout from PTAI_FACTORY_VALIDATE_TIMEOUT_MS (default 2000). On timeout: warn and continue (best-effort preflight). On LLMUnavailableError: propagate (auth failures always fail loud). Skippable via PTAI_FACTORY_VALIDATE=0 for one release.
CLI:
- 'ptai start --agent-mode' now awaits validate_client() after factory construction; exits 1 with the next-step block on auth failure.
- New 'ptai doctor' command prints resolved provider/model/endpoint, env-var surface (keys masked), storage paths, and a live preflight result. Exits 0 on success, 1 on validate failure. Distinct from cli/menu.py:_doctor (install-audit shell wrapper).
Tests:
- tests/test_llm_factory_validate.py: per-provider validate paths (200 / 401 / 404 fallback / connection-refused / model-missing); validate_client timeout-warns-and-continues + auth-propagates + env-skip behavior.
- tests/test_llm_factory_autodetect.py: OLLAMA_HOST probe drives the auto-detect path; explicit provider wins; cloud keys still win first.
- tests/test_cli_doctor.py: doctor exit codes, section presence, key masking, Ollama endpoint surfacing.
- tests/test_agent_mode_cli.py: now sets PTAI_FACTORY_VALIDATE=0 so the fake API key in the test doesn't 401 against real Anthropic.
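The timeout-warns, auth-propagates behaviour of validate_client can be sketched with `asyncio.wait_for`. This is an illustration under assumptions, not the shipped helper: the string return values and the stand-in client classes are hypothetical, and `LLMUnavailableError` is redefined locally so the block runs on its own.

```python
import asyncio
import logging
import os

log = logging.getLogger("preflight")


class LLMUnavailableError(Exception):
    """Local stand-in for the project's auth/availability error."""


async def validate_client(client, env=None):
    """Best-effort preflight: warn on timeout, let auth errors propagate."""
    env = os.environ if env is None else env
    if env.get("PTAI_FACTORY_VALIDATE") == "0":
        return "skipped"
    timeout_s = int(env.get("PTAI_FACTORY_VALIDATE_TIMEOUT_MS", "2000")) / 1000
    try:
        await asyncio.wait_for(client.validate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        log.warning("LLM preflight timed out after %.1fs; continuing", timeout_s)
        return "timeout"
    # LLMUnavailableError is deliberately not caught: it must fail loud.
    return "ok"


class GoodClient:
    async def validate(self):
        return None


class SlowClient:
    async def validate(self):
        await asyncio.sleep(1)


class BadKeyClient:
    async def validate(self):
        raise LLMUnavailableError("401 Unauthorized")
```

A caller would run it with something like `asyncio.run(validate_client(client))` right after factory construction.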
…ge 4a)
Adds scripts/diag_post_engagement_hang.py: a SIGUSR1-driven faulthandler
wrapper that lets us inspect which thread is keeping the interpreter
alive after "Engagement complete". No production code changes.
The script can either exec a ptai command directly (and forward the
PID + signal recipe to stderr) or print the recipe for an already-
running PID. When kill -USR1 hits the registered handler, every
thread's stack dumps to stderr; the named thread IS the bug.
Use:
python scripts/diag_post_engagement_hang.py --exec \
"ptai start http://localhost:3000 --no-llm --no-oast"
# in another terminal once the spinner stops moving:
kill -USR1 <pid>
Full investigation notes (which threads, ranked candidates, the
empirical isolated httpx reproducer, recommended fix shape for 4b)
live in the sibling notes repo at pentest-ai-notes/ per the
.gitignore rule excluding notes/ from this public tree.
Change 4b will add the targeted close() call(s) the harness pointed
at + a watchdog backstop, gated by PTAI_FORCE_EXIT_DISABLE /
PTAI_FORCE_EXIT_SECS.
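The core of such a harness is a single standard-library call. The sketch below is not the contents of scripts/diag_post_engagement_hang.py; `install_stack_dump_handler` and `signal_recipe` are hypothetical names, and SIGUSR1 makes it POSIX-only.

```python
import faulthandler
import os
import signal
import sys


def install_stack_dump_handler(stream=None):
    """Dump every thread's stack to `stream` when SIGUSR1 arrives."""
    stream = sys.stderr if stream is None else stream
    # faulthandler.register is not available on Windows.
    faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)


def signal_recipe(pid=None):
    """Return the shell command that triggers the dump for a PID."""
    pid = os.getpid() if pid is None else pid
    return f"kill -USR1 {pid}"
```

After installing the handler, printing `signal_recipe()` to stderr gives the operator the exact command to run from a second terminal once the process appears stuck.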
…ng (v0.17.0 Change 4b)
Closes the post-engagement hang reported by poeylizn on issue 0xSteph#12.
Root cause (per Change 4a investigation): The LLM client (an httpx.AsyncClient inside each provider) is created in cli/main.py:_run_engagement() but never aclose()'d. Once any request has been made through it, the connection pool keeps the interpreter blocked on threading._shutdown waiting for non-daemon transport machinery to wind down. That's the lock.acquire() traceback poeylizn posted.
Targeted fix:
- Hoist llm_client out to function scope so the finally block can see it.
- Add `await llm_client.close()` to the existing finally chain at cli/main.py:828, right next to the established `await db.close()` and cache.close() pattern. No new shutdown framework.
Watchdog backstop (env-gated):
- New _arm_force_exit_timer() helper: daemon thread that sleeps for PTAI_FORCE_EXIT_SECS (default 5) then calls os._exit(0) after logging any still-alive non-daemon threads. Catches future regressions where some new code path holds a resource we haven't seen yet.
- Armed at the end of `start` in both code paths (the --ci early-return and the panel-then-sync path).
- Skippable via PTAI_FORCE_EXIT_DISABLE=1 so we can verify the root-cause fix is what's doing the work (not the watchdog masking a regression).
Tests (tests/test_cli_start_clean_exit.py):
- _arm_force_exit_timer: spawns daemon thread; honours DISABLE=1; invalid PTAI_FORCE_EXIT_SECS doesn't crash; SECS=0 short-circuits; end-to-end subprocess test confirms os._exit(0) fires after the configured delay.
- Static-contract guards: the llm_client init + close() and the _arm_force_exit_timer call sites must stay in cli.main.
Subprocess-against-honeypot integration test is deferred to the Change 3 CI matrix (next commit) where it runs against a real Juice Shop sidecar with --no-llm. That cell will assert exit_reason + exit-within-N-seconds end-to-end.
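A minimal sketch of the watchdog backstop, assuming the env-var names from the commit message. `arm_force_exit_timer` here is an illustration rather than the shipped `_arm_force_exit_timer`: the injectable `exit_fn` exists only so the sketch can be exercised without killing the interpreter, and treating a delay of 0 as "do not arm" is an assumption about what "short-circuits" means.

```python
import os
import sys
import threading
import time


def arm_force_exit_timer(env=None, exit_fn=os._exit):
    """Start a daemon thread that force-exits after a delay, or None."""
    env = os.environ if env is None else env
    if env.get("PTAI_FORCE_EXIT_DISABLE") == "1":
        return None
    try:
        delay = float(env.get("PTAI_FORCE_EXIT_SECS", "5"))
    except ValueError:
        delay = 5.0  # invalid value: fall back to the default, don't crash
    if delay <= 0:
        return None  # assumption: zero means the watchdog is not armed

    def _watchdog():
        time.sleep(delay)
        stuck = [
            t.name
            for t in threading.enumerate()
            if not t.daemon and t is not threading.main_thread()
        ]
        if stuck:
            print(f"force-exit: still-alive threads: {stuck}", file=sys.stderr)
        exit_fn(0)

    thread = threading.Thread(target=_watchdog, name="force-exit-watchdog", daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, it never keeps the interpreter alive itself; it only matters when something else does.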
Closes the loop on future issue-12-shaped reports where the reporter was N versions behind a release that already fixed their bug. Every 'ptai start' and 'ptai doctor' run now nags if a newer stable is on PyPI, modelled on Claude Code's update-available banner.
cli/_version_check.py (new):
- maybe_nag(current, *, console, deadline_ms=200) - synchronous entry, never raises, never blocks more than the deadline.
- Daemon-thread worker hits the PyPI JSON endpoint with a 1s hard timeout. Foreground polls a queue with deadline_ms. If the worker doesn't beat the deadline, the nag is skipped THIS run; the worker still writes the cache so the NEXT run benefits. That's how we get "never blocks startup" while still making progress.
- Cache: ~/.pentest-ai/version-check.json with 24h TTL. Same dir as findings.db and the evidence dir (no new top-level state location).
- Skips on: PTAI_SKIP_VERSION_CHECK=1, cache fresh, PyPI unreachable, PyPI returns non-200, the current local version reads as 'unknown' (dev install without metadata).
- PTAI_VERSION_OVERRIDE - test-only knob in the same env-gate pattern as the rest of v0.17.0. Lets users manually trigger the nag for verification without rebuilding.
- Pre-releases (a/b/rc/dev/alpha/beta) and yanked releases excluded when picking 'latest stable'.
cli/main.py wires it into exactly two call sites:
- 'ptai start' before the AUP gate so the user sees Update-available ahead of the engagement banner.
- 'ptai doctor' right after the version header.
Tests (tests/test_version_check.py, 11 cases):
- happy path: newer stable -> nag with the pipx upgrade line
- same version -> no nag
- PyPI 500 / OSError -> no nag, no exception
- PTAI_SKIP_VERSION_CHECK=1 -> zero HTTP calls
- PTAI_VERSION_OVERRIDE wins over the importlib.metadata value
- fresh cache (<24h) -> zero HTTP calls
- stale cache + 5s-slow PyPI mock -> returns under 500ms, no nag, worker still completes in background
- pre-releases not picked as 'latest'
- yanked releases not picked as 'latest'
- unknown local version -> silent skip
All tests use mocks; the module never makes a real network call during pytest. The cache file is redirected into tmp_path per test so the user's real ~/.pentest-ai/version-check.json stays untouched.
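The "never blocks startup, still makes progress" pattern can be sketched with a daemon thread and a queue. `fetch_with_deadline` is a hypothetical helper, not the module's `maybe_nag`; `fetch` stands in for the PyPI lookup and `on_result` for the cache write.

```python
import queue
import threading


def fetch_with_deadline(fetch, deadline_s=0.2, on_result=None):
    """Run `fetch` on a daemon thread; return its value, or None if it
    is late or fails. `on_result` still fires when the worker finishes
    after the deadline, so a later run can benefit from the cache."""
    results = queue.Queue(maxsize=1)

    def _worker():
        try:
            value = fetch()
        except Exception:
            value = None  # network errors must never reach the caller
        if on_result is not None and value is not None:
            on_result(value)  # e.g. write the version cache
        results.put(value)

    threading.Thread(target=_worker, daemon=True).start()
    try:
        return results.get(timeout=deadline_s)
    except queue.Empty:
        return None  # skip the nag this run
```

The foreground only ever waits `deadline_s`; the worker's own network timeout bounds how long the background thread lives.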
…nge 3)
Closes the CI coverage hole that let issue 0xSteph#12 ship: the release gate was only testing Claude-driven Juice Shop, the easiest possible diagonal. The build can now only go green if both:
- ollama cell: the agent loop drives Juice Shop via a local qwen2.5-coder:7b sidecar and emits at least 1 finding. Catches the silent-zero-findings outcome directly.
- deterministic cell: PTAI_NO_LLM=1 path runs cleanly. Confirms the fallback Change 1 triggers actually produces a valid engagement.
Both cells share the same Juice Shop service container, same lifecycle test file, same exit-reason allowlist. fail-fast=false so a single-cell flake doesn't mask the other.
The plan-prescribed 4x2 provider x target matrix is intentionally NOT in scope. One non-Anthropic cell catches the regression class we have evidence of; expanding is a follow-up once we have baseline data. Anthropic is not added to the matrix - it would require a paid CI secret and the existing path is already exercised by maintainer test runs.
Ollama model layer cache (per Gap 4):
- actions/cache step caches ~/.ollama/models keyed ollama-models-qwen2.5-coder-7b-v1. -v1 is a manual cache-bust knob.
- Cache restore turns the 4.7 GB pull instant on subsequent runs; cold-cache runs still work, just slower.
- timeout-minutes up from 30 to 40 to absorb the first cold pull.
CLI:
- _ci_print's engagement_complete event now includes exit_reason from WorkingMemory so the matrix assertion can gate on it.
Tests:
- tests/test_engagement_lifecycle_e2e.py:test_matrix_cell_exits_cleanly_against_juiceshop is a matrix-only test (skips unless PTAI_E2E_MATRIX_CELL is set). Spawns ptai start --ci as a subprocess against Juice Shop, streams stdout, parses the engagement_complete JSON line, asserts:
   1. process exits within 30s of the JSON banner (Change 4b guarantee)
   2. exit_reason in {finished, coverage, deterministic, llm_non_cooperative, llm_finished_too_early}
   3. Ollama cell only: total_findings >= 1
  Deterministic cell is allowed 0 findings because some probes legitimately don't trigger against Juice Shop.
Local-skip semantics: the test self-skips when PTAI_E2E_MATRIX_CELL is unset, so unit-test pytest runs aren't affected. Set the env locally to run it against an Ollama instance + a Juice Shop container.
…rage gap)
Documents all five v0.17.0 Changes ahead of the version bump:
- Change 1: garbage + give-up LLM detection -> deterministic fallback
- Change 2: ptai doctor + factory-time validate() + Ollama auto-detect
- Change 3: 2-cell release-e2e matrix (ollama + deterministic) with
model layer cache
- Change 4a: SIGUSR1 hang-investigation harness
- Change 4b: LLM client close in engagement finally + watchdog backstop
- Change 5: PyPI version-check startup nag
Steve cuts the actual version bump + tag.
…ama3.1 default Live test surfaced this on the v0.17.0 branch: with OLLAMA_HOST set, no cloud key, no PENTEST_AI_MODEL set, the factory was picking ollama + llama3.1 (the static default) regardless of what the user actually had pulled. Validate then failed loud with 'ollama pull llama3.1', which is the wrong remediation for a user who has qwen2.5-coder:7b (or anything else) already pulled. The v0.17.0 plan called for 'first model in the tag list' in the auto-detect branch; the original implementation skipped that step. This commit adds _ollama_first_model() and uses it when both 'model' and 'env_model' are empty. Falls back to 'llama3.1' only if the probe fails. Explicit PENTEST_AI_MODEL still wins. Verified live: doctor against an Ollama with only qwen2.5-coder:7b pulled now reports 'Resolved provider: ollama, Model: qwen2.5-coder:7b' and validates OK.
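The resolution order described above might be sketched like this. It assumes the /api/tags payload has a top-level "models" list whose entries carry a "name" field; `first_pulled_model` and `resolve_ollama_model` are hypothetical names, not the project's `_ollama_first_model`.

```python
import os

FALLBACK_MODEL = "llama3.1"  # static default named in the commit message


def first_pulled_model(tags_payload):
    """Return the first model name in an /api/tags payload, or None."""
    models = (tags_payload or {}).get("models") or []
    for entry in models:
        name = entry.get("name")
        if name:
            return name
    return None


def resolve_ollama_model(explicit=None, env=None, tags_payload=None):
    """Explicit arg, then PENTEST_AI_MODEL, then first pulled, then fallback."""
    env = os.environ if env is None else env
    return (
        explicit
        or env.get("PENTEST_AI_MODEL")
        or first_pulled_model(tags_payload)
        or FALLBACK_MODEL
    )
```

Passing `tags_payload=None` models a failed probe, which is the only case where the static default is used.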
…strator escalation (v0.17.0 Change 1c+1d)
Closes the buyer-blocker that local Ollama tests against the honeypot
surfaced: qwen2.5-coder:7b emits perfectly valid JSON, picks real
registered handlers like meta.set_auth, but its action choices are
unhelpful and the engagement ends with zero findings against a target
that has 63 findings via the orchestrator path.
Two layers stacked:
Change 1c - agent_loop safety net:
When run_agent_loop ends max_iterations or coverage with zero
findings AND fallback_to_deterministic=True, run the existing
_run_deterministic_fallback (probe.* with {} args) and switch
exit_reason to the new "llm_unproductive". Catches per-loop
outcomes that the per-response quality check doesn't catch.
Change 1d - cli/main.py orchestrator escalation (the real fix):
After run_agent_loop returns, check db.get_findings for the
engagement. If still zero, escalate to AgentOrchestrator on the
SAME engagement_id. The orchestrator runs the proper phase
pipeline (recon -> discovery -> probes with real args -> chain
-> validate), which is what produces real findings against the
honeypot. agent_loop's internal fallback alone isn't enough -
it calls probe.* with {} args which produces nothing without
recon-discovered endpoints.
exit_reason is tagged "<original>+orchestrator_escalation" so the
user-facing summary tells them which paths ran. The CLI summary
surfaces a clear NOTE explaining the LLM was unproductive and the
orchestrator took over.
Live verification (qwen2.5-coder:7b against tests/honeypot/, 4 agent
iterations then escalation):
- exit=0 (clean), elapsed 193s
- 63 findings: 13 critical, 6 high, 2 medium, 4 low, 38 info
- exit_reason="llm_unproductive+orchestrator_escalation"
- escalation warning logged with the original exit_reason
Tests:
- tests/test_agent_loop.py: three new tests for Change 1c
(safety net fires; counter-case with pre-existing findings;
disabled when fallback_to_deterministic=False).
- tests/test_agent_mode_cli.py: db.get_findings now returns a
non-empty list so the escalation path doesn't fire in this
test (which is verifying agent-mode dispatch, not escalation).
- tests/test_engagement_lifecycle_e2e.py: allowed-exit-reason set
expanded with "llm_unproductive" (the matrix CI cell now
accepts this outcome).
- tests/test_cli_start_clean_exit.py: thread-name asserts switched
to count-based to fix order-sensitivity.
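The Change 1d decision can be summarised in a few lines. This is only a sketch of the control flow: `finalize_engagement` and the `run_orchestrator` callable are hypothetical, while the real code reads findings from the database and awaits AgentOrchestrator.

```python
ESCALATION_SUFFIX = "+orchestrator_escalation"


def finalize_engagement(loop_exit_reason, findings_after_loop, run_orchestrator):
    """Escalate to the orchestrator only when the loop left zero findings.

    Returns (exit_reason, findings). `run_orchestrator` is a zero-argument
    callable standing in for the phase pipeline on the same engagement.
    """
    if findings_after_loop:
        return loop_exit_reason, findings_after_loop
    findings = run_orchestrator()
    return loop_exit_reason + ESCALATION_SUFFIX, findings
```

Tagging rather than replacing the exit reason keeps the original loop outcome visible in the summary.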
Closes issue 0xSteph#12 (silent-exit + post-engagement hang + zero-findings against vulnerable targets). Five plan-Changes + the live-test additions (cooperative-but-unproductive safety net, orchestrator escalation, auto-detect-first-pulled-model fix). Live-verified end-to-end with Ollama qwen2.5-coder:7b against the local honeypot: 63 findings via the escalation path, clean exit in 193s. See CHANGELOG [0.17.0] for the full surface.
v0.17.0 tagged but blocked from PyPI publish by the new matrix CI test
itself, not by any production code regression. Two test bugs:
1. Deterministic cell uses --no-llm (legacy AgentOrchestrator path)
which doesn't set exit_reason; the assertion rejected None.
2. Ollama cell hit the 900s per-test timeout because qwen2.5-coder:7b
on the GitHub Actions runner (no GPU) is slow + the orchestrator
escalation adds another ~5min on top of the agent loop.
Fixes (test-only, no production code touched):
- Allow exit_reason=None for the deterministic cell.
- Strip '+orchestrator_escalation' suffix from exit_reason before
checking the allowlist (Change 1d adds the suffix when escalation
fires).
- Cap --agent-max-iter at 3 for the ollama cell - the escalation
produces findings anyway, so more iterations just burn time.
- Per-test timeout bumped to 1800s on the matrix test specifically;
the 900s file-wide default stays for the original lifecycle tests.
The v0.17.0 production fixes (silent-exit, hang, orchestrator
escalation, doctor, version nag) are unchanged. This release ships
the same code with a CI test that actually passes.
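The relaxed assertion could be sketched as below. The allowlist values come from the commit messages above; `exit_reason_ok` is a hypothetical helper, and the cell names "ollama" and "deterministic" are assumed values of the matrix variable.

```python
ALLOWED_EXIT_REASONS = {
    "finished",
    "coverage",
    "deterministic",
    "llm_non_cooperative",
    "llm_finished_too_early",
    "llm_unproductive",
}
ESCALATION_SUFFIX = "+orchestrator_escalation"


def exit_reason_ok(exit_reason, cell):
    """Check an engagement_complete exit_reason for a given matrix cell."""
    if exit_reason is None:
        # The legacy --no-llm path does not set exit_reason at all.
        return cell == "deterministic"
    if exit_reason.endswith(ESCALATION_SUFFIX):
        exit_reason = exit_reason[: -len(ESCALATION_SUFFIX)]
    return exit_reason in ALLOWED_EXIT_REASONS
```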
Adds FAQ covering features, paths, MCP tools, benchmarks, tiers, help.