
Add Codex CLI runtime support #72

Merged
AbirAbbas merged 13 commits into Agent-Field:main from ivasuy:codex-provider-support on May 11, 2026


ivasuy (Contributor) commented May 10, 2026

Summary

Adds conservative Codex CLI runtime support to SWE-AF without changing the existing default runtime or replacing Claude/OpenCode/HAX behavior.

This lets both the SWE-AF planner and the fast agents run through AgentField's Codex provider, using either:

  • ChatGPT subscription auth via local codex login credentials mounted from ~/.codex.
  • OpenAI API-platform billing via OPENAI_API_KEY.

What Changed

Runtime/provider support

  • Added a shared runtime-to-provider mapping for claude_code, open_code, and codex.
  • Extended planner config validation to accept runtime: "codex".
  • Extended fast-agent config validation to accept runtime: "codex".
  • Routed planner and fast harness calls through the shared provider adapter.
  • Preserved existing fast fallback behavior: unknown runtimes still fall back to opencode.
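To make the routing rules above concrete, here is a minimal sketch of a shared runtime-to-provider table. The three runtime names come from this PR. The provider identifiers, the function names validate_runtime and provider_for_runtime, and the choice to raise for unknown runtimes outside the fast path are all assumptions for illustration, not the contents of swe_af/runtime/providers.py.

```python
from typing import Dict, Tuple

# Runtime names come from this PR; the provider identifiers are assumptions.
SUPPORTED_RUNTIMES: Tuple[str, ...] = ("claude_code", "open_code", "codex")

RUNTIME_TO_PROVIDER: Dict[str, str] = {
    "claude_code": "claude-code",  # hypothetical provider id
    "open_code": "opencode",
    "codex": "codex",
}


def validate_runtime(runtime: str) -> str:
    """Config-validation step (hypothetical): accept only the documented runtimes."""
    if runtime not in SUPPORTED_RUNTIMES:
        raise ValueError(f"runtime must be one of {SUPPORTED_RUNTIMES}, got {runtime!r}")
    return runtime


def provider_for_runtime(runtime: str, *, fast: bool = False) -> str:
    """Map a runtime name to the provider a harness call should use."""
    provider = RUNTIME_TO_PROVIDER.get(runtime)
    if provider is not None:
        return provider
    if fast:
        # Fast agents keep their pre-existing behavior: unknown runtimes
        # fall back to opencode instead of raising.
        return RUNTIME_TO_PROVIDER["open_code"]
    raise ValueError(f"unsupported runtime: {runtime!r}")
```

Keeping the table in one module means the planner and fast paths cannot drift apart on which runtimes exist; only the fallback policy differs between them.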

Codex structured-output support

  • Added a small compatibility patch for AgentField's Codex provider.
  • Uses Codex CLI native structured output flags:
    • --output-schema
    • --output-last-message
  • Writes AgentField's schema file before invocation so Codex can validate final responses.
  • Replaces AgentField's generic Write-tool schema suffix with Codex-specific final-JSON instructions.
  • Converts Pydantic JSON schemas into Codex-compatible strict schemas.
  • Recurses into nested $defs / definitions so schemas with referenced nested models are accepted by Codex.
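The strict-schema conversion can be sketched as a recursive walk: every object with properties gets all of its properties listed in required, additionalProperties set to false, and defaults dropped, and the walk continues into items, anyOf/oneOf/allOf, and the $defs / definitions maps where Pydantic puts nested models. This is a minimal sketch of the general technique under those assumptions, not the code in codex_harness_patch.py; the function names are hypothetical.

```python
import copy
from typing import Any, Dict

_SUBSCHEMA_KEYS = ("items", "anyOf", "oneOf", "allOf")
_DEFINITION_KEYS = ("$defs", "definitions")


def to_strict_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a strict copy of a Pydantic-style JSON schema; the input is not mutated."""
    strict = copy.deepcopy(schema)
    _strictify(strict)
    return strict


def _strictify(node: Any) -> None:
    if isinstance(node, list):
        for item in node:
            _strictify(item)
        return
    if not isinstance(node, dict):
        return
    # Strict mode has no notion of defaults, so drop them.
    node.pop("default", None)
    props = node.get("properties")
    if isinstance(props, dict):
        # Every property becomes required and extra keys are rejected;
        # optional fields stay nullable through their own anyOf/null branch.
        node["required"] = list(props)
        node["additionalProperties"] = False
        for sub in props.values():
            _strictify(sub)
    for key in _SUBSCHEMA_KEYS:
        if key in node:
            _strictify(node[key])
    # Nested models referenced via $ref live here; skipping this step is
    # what leaves them non-strict.
    for key in _DEFINITION_KEYS:
        defs = node.get(key)
        if isinstance(defs, dict):
            for sub in defs.values():
                _strictify(sub)
```

Objects without a properties map (for example free-form dict fields) are left untouched in this sketch, since forcing additionalProperties to false on them would reject every key.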

Docker and compose support

  • Installs @openai/codex in the Docker image.
  • Adds a lightweight codex wrapper supporting SWE_CODEX_AUTH_MODE:
    • auto: use OPENAI_API_KEY when present, otherwise local Codex login.
    • chatgpt: force ChatGPT-login mode by unsetting OPENAI_API_KEY for Codex.
    • api_key: require OPENAI_API_KEY.
  • Mounts host ~/.codex into both swe-agent and swe-fast containers.
  • Propagates SWE_DEFAULT_RUNTIME, SWE_DEFAULT_MODEL, SWE_CODEX_AUTH_MODE, and OPENAI_API_KEY where needed.
  • Adds .env loading to swe-fast so it behaves consistently with swe-agent.
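The SWE_CODEX_AUTH_MODE rules above reduce to a small environment transform. The sketch below expresses that decision logic in Python; the shipped wrapper may well be a shell script, and the function name codex_env_for_mode is hypothetical. It also assumes Codex picks up OPENAI_API_KEY from the environment when present and otherwise uses the mounted ~/.codex login, as the PR describes.

```python
from typing import Dict, Mapping


def codex_env_for_mode(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the environment a wrapped codex invocation should run with."""
    mode = (env.get("SWE_CODEX_AUTH_MODE") or "auto").strip().lower()
    out = dict(env)
    has_key = bool(out.get("OPENAI_API_KEY"))
    if mode == "auto":
        if not has_key:
            # No usable key: drop any empty value so the ~/.codex login is used.
            out.pop("OPENAI_API_KEY", None)
        return out
    if mode == "chatgpt":
        # Force ChatGPT-login mode even if a key is present in the environment.
        out.pop("OPENAI_API_KEY", None)
        return out
    if mode == "api_key":
        if not has_key:
            raise RuntimeError("SWE_CODEX_AUTH_MODE=api_key requires OPENAI_API_KEY")
        return out
    raise ValueError(f"unknown SWE_CODEX_AUTH_MODE: {mode!r}")
```

A wrapper would then exec the real codex binary with the returned environment (for example via os.execvpe); how the actual wrapper locates that binary is not described in this PR.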

Documentation

  • Updates .env.example, README, deployment docs, architecture docs, contributing docs, and skill docs.
  • Documents supported runtimes: claude_code, open_code, codex.
  • Documents ChatGPT subscription setup:
    • Run codex login on the host.
    • Set SWE_CODEX_AUTH_MODE=chatgpt, or use auto with OPENAI_API_KEY unset.
  • Documents API-key setup:
    • Set SWE_CODEX_AUTH_MODE=api_key.
    • Set OPENAI_API_KEY.
  • Adds planner and fast examples using runtime: "codex" and models.default: "gpt-5.3-codex".

Files Changed

Runtime/config:

  • swe_af/runtime/providers.py
  • swe_af/runtime/__init__.py
  • swe_af/runtime/codex_harness_patch.py
  • swe_af/execution/schemas.py
  • swe_af/fast/schemas.py
  • swe_af/reasoners/execution_agents.py
  • swe_af/reasoners/pipeline.py
  • swe_af/execution/_replanner_compat.py
  • swe_af/fast/planner.py
  • swe_af/fast/app.py
  • swe_af/fast/__init__.py
  • swe_af/reasoners/__init__.py

Docker/config/docs:

  • Dockerfile
  • docker-compose.yml
  • docker-compose.local.yml
  • requirements-docker.txt
  • .env.example
  • README.md
  • docs/ARCHITECTURE.md
  • docs/CONTRIBUTING.md
  • docs/SKILL.md
  • docs/deployment.md

Tests:

  • tests/test_model_config.py
  • tests/test_runtime_provider_routing.py
  • tests/test_codex_harness_patch.py
  • tests/test_dockerfile.py
  • tests/fast/test_app.py
  • tests/fast/test_docker_config.py
  • tests/fast/test_fast_init_executor_planner_verifier_routing.py

Audit Trail / Bugs Found During Validation

1. Docker agents exited with code 132 on Linux/aarch64

Observed: both swe-agent and swe-fast started, then exited immediately with code 132.

Root cause:

  • cryptography==48.0.0 crashed with SIGILL (illegal instruction, signal 4; exit code 132 = 128 + 4) while AgentField imported Ed25519 for DID registration.

Fix:

  • Pinned the Docker runtime dependency to cryptography<46 in requirements-docker.txt.

Validation:

  • Direct Ed25519 import stopped crashing.
  • Both containers now stay up and report healthy.
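The "direct Ed25519 import" check can be reproduced with a child interpreter, so that a SIGILL kills only the probe process, plus a helper that decodes exit code 132 back into a signal name. This is a sketch: the probe snippet is an assumption about what DID registration exercises, and the helper names are hypothetical.

```python
import signal
import subprocess
import sys
from typing import Optional

# Assumption: generating an Ed25519 key exercises the same native code path
# that crashed during DID registration; AgentField's exact calls may differ.
ED25519_PROBE = (
    "from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey\n"
    "Ed25519PrivateKey.generate()\n"
)


def signal_name_for_exit(code: int) -> Optional[str]:
    """Decode a shell/Docker exit status (128 + N) or a subprocess returncode (-N)."""
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None


def run_probe(source: str = ED25519_PROBE) -> int:
    """Run `source` in a child interpreter so a crash cannot take down the caller."""
    return subprocess.run([sys.executable, "-c", source], capture_output=True).returncode
```

On an affected host, signal_name_for_exit(run_probe()) would be expected to report "SIGILL"; a plain exit code of 1 more likely means cryptography is not installed at all.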

2. swe-fast did not load .env

Observed: swe-agent had env_file: .env, but swe-fast only received shell-substituted environment values.

Risk:

  • Local users could set values in .env and have planner work while fast silently missed equivalent settings.

Fix:

  • Added env_file: .env to swe-fast.

3. Codex was invoked but structured output fell back

The initial smoke test confirmed that Codex was invoked, but the fast planner returned its deterministic fallback:

{
  "fallback_used": true,
  "rationale": "Fallback plan: LLM did not return a parseable result."
}

Root causes:

  • AgentField's generic schema suffix asked Codex to use a Write tool to create .agentfield_output.json, but Codex CLI was running read-only and did not have AgentField's Write tool.
  • Strict schema conversion did not recurse into $defs, so nested models still had optional/default fields omitted from required.
  • Codex CLI rejected the nested FastTask schema with invalid_json_schema.

Fixes:

  • Added Codex-specific final-JSON prompt suffix.
  • Used native Codex CLI --output-schema and --output-last-message.
  • Recursed strict schema normalization into $defs and definitions.
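The native structured-output path can be sketched as: write the schema file first, invoke codex exec with --output-schema and --output-last-message, then parse the final message from disk instead of scraping stdout. The two flags are the ones named in this PR; the --model flag, the file names, and the helper names are assumptions for illustration, and the sketch only builds the command rather than running it.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


def build_codex_exec_argv(
    prompt: str,
    schema: Dict[str, Any],
    workdir: Path,
    model: Optional[str] = None,
) -> Tuple[List[str], Path]:
    """Write the schema first, then build a `codex exec` command line."""
    schema_path = workdir / "codex_output_schema.json"  # hypothetical name
    last_message_path = workdir / "codex_last_message.json"  # hypothetical name
    # The schema file must exist before invocation so Codex can validate against it.
    schema_path.write_text(json.dumps(schema))
    argv = [
        "codex", "exec",
        "--output-schema", str(schema_path),
        "--output-last-message", str(last_message_path),
    ]
    if model:
        argv += ["--model", model]  # assumption: model selection flag
    argv.append(prompt)
    return argv, last_message_path


def read_final_json(last_message_path: Path) -> Any:
    """Parse the final agent message that Codex wrote to the last-message file."""
    return json.loads(last_message_path.read_text())
```

Reading the last-message file avoids depending on a Write tool the Codex sandbox does not have, which was the first root cause listed above.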

Validation:

  • Direct Codex CLI run with the real FastPlanResult schema now succeeds.
  • AgentField run through swe-fast.fast_plan_tasks now succeeds with fallback_used: false.

Local Validation

Docker stack:

control-plane: running
swe-agent: healthy
swe-fast: healthy

Codex auth inside both containers:

docker compose exec swe-agent codex login status
Logged in using ChatGPT

docker compose exec swe-fast codex login status
Logged in using ChatGPT

AgentField Codex smoke run:

target: swe-fast.fast_plan_tasks
ai_provider: codex
model: gpt-5.3-codex
status: succeeded
fallback_used: false

Smoke execution:

execution_id: exec_20260510_120451_aptvsu51
run_id: run_20260510_120451_fc2yk24h

Returned parsed task:

{
  "name": "inspect_startup_docs",
  "title": "Inspect Repository Startup Documentation"
}

Static checks:

python3 -m py_compile swe_af/runtime/codex_harness_patch.py tests/test_codex_harness_patch.py

Note: full pytest was not run locally because the host environment did not have pytest installed.

Compatibility Notes

  • Default runtime remains claude_code.
  • Existing Claude, OpenCode, OpenRouter, HAX, and GitHub PR behavior is preserved.
  • This PR adds Codex support as an additional runtime/provider path only.

ivasuy requested a review from AbirAbbas as a code owner on May 10, 2026 12:16
CLAassistant commented May 10, 2026

CLA assistant check
All committers have signed the CLA.

AbirAbbas and others added 3 commits May 11, 2026 12:02
The codex harness patch was replacing _schema.build_prompt_suffix and
_runner.build_prompt_suffix globally at import time, so claude_code and
open_code runs were also receiving the codex-specific instruction:
"Do not try to create .agentfield_output.json yourself; the Codex CLI
will persist your final JSON response for AgentField."

That instruction is wrong for those providers — Claude / OpenCode are
supposed to use their Write tool to create the output file (the fast
path the runner expects), and forcing them onto the stdout-parse
fallback costs latency, drops the inline schema for small schemas, and
sends a confusing instruction referencing a Codex CLI that isn't in
the loop.

Use a contextvars.ContextVar set by a wrapped Agent.harness so that
the suffix dispatcher returns the codex-native suffix only when the
active call is for codex, and falls back to the original AgentField
suffix for every other provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The codex strict-schema patch strips `default` from properties and
marks every field as required, so when FastPlanResult flows through
Codex the model has to invent a value for `fallback_used`. Despite
the prompt example showing `false`, Codex sometimes returns `true`
alongside a perfectly valid task list — making the flag meaningless
for any downstream consumer that gates on it.

`fallback_used` is planner-side state, not an LLM self-assessment:
it should be True iff the planner's `_fallback_plan(...)` path ran.
Override it back to False after a successful parse so the flag
reflects what actually happened, regardless of what the model wrote.
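The override amounts to treating fallback_used as something the planner sets, never the model. The sketch below uses a simplified dataclass stand-in for the real Pydantic FastPlanResult; plan_tasks and the call_llm callable are hypothetical, and only _fallback_plan and the fallback rationale text come from this PR.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class FastPlanResult:
    """Simplified stand-in for the real Pydantic model."""
    tasks: List[Dict[str, Any]] = field(default_factory=list)
    rationale: str = ""
    fallback_used: bool = False


def _fallback_plan(goal: str) -> FastPlanResult:
    # Deterministic plan used when the LLM output cannot be parsed.
    return FastPlanResult(
        rationale="Fallback plan: LLM did not return a parseable result.",
        fallback_used=True,
    )


def plan_tasks(goal: str, call_llm: Callable[[str], Optional[Dict[str, Any]]]) -> FastPlanResult:
    raw = call_llm(goal)
    if raw is None:
        return _fallback_plan(goal)
    try:
        parsed = FastPlanResult(**raw)
    except TypeError:
        # Unexpected keys: treat the response as unparseable.
        return _fallback_plan(goal)
    # fallback_used is planner-side state: whatever the model wrote, a
    # successful parse means the fallback path did not run.
    return replace(parsed, fallback_used=False)
```

The model still has to emit a value for the field, because the strict schema marks it required, but that value is simply discarded.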

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two gotchas surfaced when actually running a full main-mode build with
the codex runtime that weren't covered in the existing setup notes:

1. The Docker image bakes ENV HARNESS_MODEL=openrouter/moonshotai/kimi-k2.6
   as an OpenCode-side fallback, and SWE-AF's model-resolution env cascade
   reads HARNESS_MODEL. So a codex deployment that only sets
   SWE_DEFAULT_RUNTIME=codex (without SWE_DEFAULT_MODEL) hands an
   OpenRouter Kimi model id to the Codex CLI and the Product Manager
   reasoner fails in ~13s. Document that SWE_DEFAULT_MODEL=gpt-5.3-codex
   (or per-build models map) is required to pin the Codex model.

2. Codex CLI's workspace-write sandbox uses bubblewrap (`bwrap`) and
   needs Linux user namespaces enabled on the host. Docker-on-WSL2 and
   hardened environments refuse with "bwrap: No permissions to create a
   new namespace", and the coder agents return success while writing no
   files. Document the symptom so operators can recognize and fix it.
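The first gotcha is easy to reproduce with a toy resolution cascade. In the sketch below the order (per-build models.default, then SWE_DEFAULT_MODEL, then HARNESS_MODEL) is an assumption consistent with the commit message rather than SWE-AF's actual code, and check_codex_model is a hypothetical fail-fast guard, not something this PR adds.

```python
from typing import Mapping, Optional

# Hypothetical order: per-build models map first, then env vars. The real
# SWE-AF cascade may consult additional sources.
MODEL_ENV_CASCADE = ("SWE_DEFAULT_MODEL", "HARNESS_MODEL")


def resolve_model(build_models: Optional[Mapping[str, str]], env: Mapping[str, str]) -> Optional[str]:
    if build_models and build_models.get("default"):
        return build_models["default"]
    for var in MODEL_ENV_CASCADE:
        value = env.get(var)
        if value:
            return value
    return None


def check_codex_model(runtime: str, model: Optional[str]) -> None:
    """Fail fast instead of handing an OpenRouter model id to the Codex CLI."""
    if runtime == "codex" and (model is None or model.startswith("openrouter/")):
        raise ValueError(
            f"runtime=codex needs a Codex model; got {model!r}. "
            "Set SWE_DEFAULT_MODEL (e.g. gpt-5.3-codex) or models.default."
        )
```

With only SWE_DEFAULT_RUNTIME=codex set, the image's baked HARNESS_MODEL wins the cascade, which matches the failure described in item 1; setting SWE_DEFAULT_MODEL or a per-build models map pins the Codex model.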

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbirAbbas merged commit 0e68788 into Agent-Field:main on May 11, 2026
1 check passed

3 participants