Automated agent that solves a 30-step browser navigation challenge using LLM-driven browser automation.
- Bun runtime
- An OpenRouter API key
cd computer-use-challenge
bun install

Create a .env file:
OPENROUTER_API_KEY=sk-or-v1-...
Install Playwright browsers (first time only):
bunx playwright install chromium

bun run start

This launches a headless Chromium browser, navigates to the challenge site, and uses gpt-oss-120b (via OpenRouter/Groq) to solve each step.
| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | (required) | OpenRouter API key |
| MAX_TURNS | 1000 | Max LLM round-trips before stopping |
| MAX_STEPS | 30 | Target number of steps to complete |
| HEADLESS | true | Set to false to see the browser window |
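
As a rough illustration of how the variables in the table could be read with their defaults, here is a minimal sketch. The `AgentConfig` shape, `loadConfig`, and `parsePositiveInt` are hypothetical names chosen for this example and are not taken from the repository; the rule that only the literal string `false` disables headless mode is also an assumption.

```typescript
// Hypothetical config shape; field names are illustrative, not from the repo.
interface AgentConfig {
  apiKey: string;
  maxTurns: number;
  maxSteps: number;
  headless: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default when unset or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Build the config from an environment-like object, applying the documented defaults.
function loadConfig(env: Env): AgentConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("OPENROUTER_API_KEY is required");
  }
  return {
    apiKey,
    maxTurns: parsePositiveInt(env.MAX_TURNS, 1000),
    maxSteps: parsePositiveInt(env.MAX_STEPS, 30),
    // Assumption: only the literal string "false" disables headless mode.
    headless: env.HEADLESS !== "false",
  };
}
```

Under Bun, this could be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to exercise with plain objects.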
Example with visible browser:
HEADLESS=false bun run start

Results are saved to runs/ after every completed step:
- trajectory_<timestamp>.json: full transcript (updated incrementally after each step)
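
One way the incremental save could work is to fix the file path once per run and overwrite the same file after each completed step. The sketch below assumes this approach. The `Trajectory` shape, the helper names, and the timestamp format are all hypothetical, since the real schema and naming are not documented here.

```typescript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical transcript shape; the real trajectory schema may differ.
interface Trajectory {
  startedAt: string;
  completedSteps: number;
  messages: { role: string; content: string }[];
}

// Build the path once per run so every save overwrites the same file.
// Assumption: the timestamp is an ISO string with ":" and "." replaced by "-".
function trajectoryPath(dir: string, startedAt: Date): string {
  const stamp = startedAt.toISOString().replace(/[:.]/g, "-");
  return join(dir, `trajectory_${stamp}.json`);
}

// Write the full transcript, creating the output directory if needed.
function saveTrajectory(path: string, trajectory: Trajectory): void {
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, JSON.stringify(trajectory, null, 2));
}

// Read a saved transcript back, e.g. to inspect a partial run.
function loadTrajectory(path: string): Trajectory {
  return JSON.parse(readFileSync(path, "utf8")) as Trajectory;
}
```

Rewriting the whole file on each step is simple and leaves a usable transcript on disk even if the run stops partway through.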
src/
index.ts — entry point, saves final metrics
agent.ts — main agent loop, stuck detection, sessionStorage bypass
tools.ts — browser tools (navigate, evaluate, snapshot, action)
metrics.ts — token/cost tracking
prompts/
SYSTEM.md — system prompt with challenge-solving patterns
- The agent navigates to the challenge URL and clicks START
- For each of the 30 steps, the LLM reads the page, identifies the challenge type, solves it to reveal a 6-character code, then submits the code
- On step transitions, context is cleared to keep token usage low
- SessionStorage bypass (steps 18-20, where the environment has a bug): if the agent is stuck for 5 turns, it decodes the XOR+base64-encoded wo_session value from sessionStorage to extract the code directly and submits it
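
The stuck detection and decoding described in the last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in agent.ts: `StuckTracker`, `xorBase64Decode`, and `xorBase64Encode` are hypothetical names, and the sketch assumes a repeating-key XOR. The actual key and the layout of the decoded wo_session payload are not given here.

```typescript
// Counts LLM turns spent on the current step and reports when a threshold is hit.
class StuckTracker {
  private threshold: number;
  private currentStep = -1;
  private turnsOnStep = 0;

  constructor(threshold = 5) {
    this.threshold = threshold;
  }

  // Record one turn on `step`; returns true once the step has used `threshold` turns.
  recordTurn(step: number): boolean {
    if (step !== this.currentStep) {
      // A step transition resets the counter.
      this.currentStep = step;
      this.turnsOnStep = 0;
    }
    this.turnsOnStep += 1;
    return this.turnsOnStep >= this.threshold;
  }
}

// Decode base64, then XOR each byte with a repeating key (assumed scheme).
function xorBase64Decode(encoded: string, key: string): string {
  if (key.length === 0) throw new Error("key must not be empty");
  const binary = atob(encoded);
  const out: string[] = [];
  for (let i = 0; i < binary.length; i++) {
    const byte = binary.charCodeAt(i) ^ key.charCodeAt(i % key.length);
    out.push(String.fromCharCode(byte));
  }
  return out.join("");
}

// Inverse helper, included only so the decoder can be exercised without real data.
function xorBase64Encode(plain: string, key: string): string {
  if (key.length === 0) throw new Error("key must not be empty");
  const out: string[] = [];
  for (let i = 0; i < plain.length; i++) {
    out.push(String.fromCharCode(plain.charCodeAt(i) ^ key.charCodeAt(i % key.length)));
  }
  return btoa(out.join(""));
}
```

In an agent loop, `recordTurn` would be called once per LLM round-trip. When it returns true, the agent would read wo_session from the page's sessionStorage and pass it to the decoder, then pull the 6-character code out of the decoded text.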