fix(eval): optimize screenshot request path and require normalized type coordinates by DavidBShan · Pull Request #658 · browseros-ai/BrowserOS

David Shan (DavidBShan) · 2026-04-08T06:51:04Z

Summary

This PR strengthens the eval orchestrator-executor flow for Clado runs by improving task bootstrap context, progress detection, screenshot handling, and delegation guardrails.

What Changed

Added an orchestrator bootstrap path that captures initial page state and seeds the orchestration loop with stronger context.
Added page progress snapshot/signals logic to detect no-op or repeated executor behavior and prevent unproductive action loops.
Improved delegation observability by attaching screenshots/evidence to delegation steps and tightening status reporting.
Updated the Clado action executor to:
- optimize screenshot request payloads for inference reliability,
- require normalized x/y coordinates for type actions (0-1000).
Updated orchestrator-executor types/config/constants and weekly eval config defaults to align with the new runtime behavior.

Why

Reduce stalled delegations and repeated no-progress actions.
Improve executor predictability for text-entry actions.
Increase signal quality for debugging and grading by preserving better screenshot evidence and initial context.

github-actions · 2026-04-08T06:51:14Z

All contributors have signed the CLA. Thank you!
_{Posted by the CLA Assistant Lite bot.}

David Shan (DavidBShan) · 2026-04-08T06:52:01Z

I have read the CLA Document and I hereby sign the CLA

Felarof (felarof99) · 2026-04-08T23:10:49Z

Greptile (@greptileai)

greptile-apps · 2026-04-08T23:14:31Z

Greptile Summary

This PR strengthens the orchestrator-executor eval loop by adding a bootstrap phase (initial page snapshot + seed context), per-action page-progress signals for no-op detection, screenshot attachment to delegation results, and tightened type action validation requiring normalized x/y coordinates. It also introduces screenshot size optimization for Clado inference requests.

Confidence Score: 4/5

Mostly safe to merge; two functional issues should be addressed before production eval runs.

The PNG palette compression (P1) risks degrading visual model accuracy for screenshots that exceed 2 MB — directly counteracting the stated reliability goal. The missing bodyText progress signal (P1) can cause premature blocked results on content-updating actions like Load More clicks. Both are straightforward fixes. All other changes are well-implemented.

clado-action-executor.ts (PNG palette compression) and page-progress.ts (missing bodyText signal)

Vulnerabilities

No security concerns identified. The Clado API key is passed as a Bearer token in-process from resolved config; no secrets are logged. The evaluate_script injection point uses the browser's own CDP sandbox. The parseActionsFromRawResponse JSON parser is wrapped in a try/catch, preventing malformed model output from crashing the executor.

Important Files Changed

Filename	Overview
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/clado-action-executor.ts	Core executor file; adds screenshot size optimization, per-action progress checks, and required x/y for type actions. PNG palette compression and missing bodyText progress signal warrant attention.
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/page-progress.ts	New file capturing page state signals; bodyText is captured but not compared in pageProgressSignals, risking false no-progress on content-only page updates.
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/bootstrap.ts	New file providing initial page state snapshot and formatted observation string; straightforward and well-implemented.
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/index.ts	Wires bootstrap capture and screenshot attachment into the delegation factory; clean integration with existing patterns.
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/orchestrator-agent.ts	Extensive system prompt improvements and bootstrap prompt seeding; logic changes look correct.
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/types.ts	Adds OrchestratorBootstrap type and screenshotDataUrl to ExecutorResult; clean additions.
packages/browseros-agent/apps/eval/src/constants.ts	Adds NO_PROGRESS_MAX_STREAK and DOM_STATE_HASH_MAX_CHARS constants; straightforward additions.
packages/browseros-agent/apps/eval/src/capture/screenshot.ts	Adds captureBase64 method for in-memory screenshot retrieval; clean and well-structured.
packages/browseros-agent/apps/eval/configs/browseros-oe-clado-weekly.json	Adds maxTurns: 5 to the orchestrator config for the weekly eval run.

Sequence Diagram

sequenceDiagram
    participant OE as OrchestratorExecutorEvaluator
    participant B as Bootstrap
    participant OA as OrchestratorAgent
    participant EF as ExecutorFactory
    participant CE as CladoActionExecutor
    participant PP as PageProgress

    OE->>B: captureInitialPageSnapshot(browser, pageId)
    B-->>OE: BootstrapPageSnapshot
    OE->>OA: agent.run(OrchestratorBootstrap)
    loop Orchestration turns (maxTurns=5)
        OA->>EF: delegate(instruction)
        EF->>CE: executor.execute(instruction)
        loop Per action (max 5)
            CE->>CE: captureScreenshotBase64()
            CE->>CE: optimizeScreenshotForRequest() if >2MB
            CE->>CE: requestActionPrediction(Clado API)
            CE->>PP: capturePageProgressSnapshot() before
            CE->>CE: executeAction()
            CE->>PP: capturePageProgressSnapshot() after
            PP-->>CE: pageProgressSignals(before, after)
            alt no signals and same action streak >= 3
                CE-->>EF: status=blocked
            end
        end
        CE-->>EF: ExecutorResult
        EF->>EF: captureBase64() post-delegation screenshot
        EF-->>OA: status, observation, screenshotDataUrl
        alt task complete
            OA-->>OE: final answer text
        end
    end
    OE-->>OE: saveMetadata()

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/clado-action-executor.ts
Line: 113-118

Comment:
**PNG palette mode may degrade inference quality**

`palette: true` quantizes the image to an indexed 8-bit color table (~204 colors at `quality: 80`). Web screenshots contain gradients, shadows, and anti-aliased text that easily require thousands of colors — palette quantization introduces dithering artifacts that can make UI text unreadable for a visual model, working against the stated goal of "optimizing for inference reliability."

Consider using JPEG instead, which provides continuous-tone compression without the color-count ceiling:

```suggestion
    const resized = await sharp(original)
      .resize({ width, withoutEnlargement: true })
      .jpeg({ quality: 80, mozjpeg: true })
      .toBuffer()
```

And update the fallback similarly:
```typescript
const smallest = await sharp(original)
  .resize({ width: 512, withoutEnlargement: true })
  .jpeg({ quality: 70, mozjpeg: true })
  .toBuffer()
```

If the downstream Clado endpoint requires PNG, use `.png({ compressionLevel: 9 })` without `palette`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/page-progress.ts
Line: 158-186

Comment:
**`bodyText` change not included as a progress signal**

`pageProgressSignals` checks URL, scroll, focus, popup state, and form control values — but it never compares `before.bodyText` vs `after.bodyText`. A "Load More" click, an AJAX content refresh, or a page that updates its main text without changing the URL/scroll/form state will produce zero signals. The executor's action will be mis-classified as "no progress" and the streak counter will advance, potentially triggering an early `blocked` result on a perfectly productive action.

```suggestion
export function pageProgressSignals(
  before: PageProgressSnapshot | null,
  after: PageProgressSnapshot | null,
): string[] {
  if (!before || !after) return []

  const signals: string[] = []
  if (before.url !== after.url || before.title !== after.title) {
    signals.push('navigation')
  }
  if (
    Math.abs(after.scrollX - before.scrollX) >= 8 ||
    Math.abs(after.scrollY - before.scrollY) >= 8
  ) {
    signals.push('scroll')
  }
  if (JSON.stringify(before.active) !== JSON.stringify(after.active)) {
    signals.push('focus')
  }
  if (
    before.expandedCount !== after.expandedCount ||
    before.popupCount !== after.popupCount
  ) {
    signals.push('open_state')
  }
  if (JSON.stringify(before.controls) !== JSON.stringify(after.controls)) {
    signals.push('value')
  }
  if (before.bodyText !== after.bodyText) {
    signals.push('content')
  }
  return signals
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/clado-action-executor.ts
Line: 94-96

Comment:
**Coordinate range: implementation caps at 999, error message says 0-1000**

`clampNormalized` clamps to `[0, 999]`, but the `type` action's thrown error (line 599) tells callers the valid range is "0-1000". A caller passing exactly `1000` will have it silently clamped to `999`, which is a ~0.1% positional shift. The discrepancy should be consistent — either change the clamp to `Math.min(1000, …)` or update the error message to "0-999".

```suggestion
function clampNormalized(value: number): number {
  return Math.min(1000, Math.max(0, Math.round(value)))
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(eval): avoid invalid multimodal orch..." | Re-trigger Greptile}

greptile-apps · 2026-04-08T23:14:34Z

+      .png({ compressionLevel: 9, palette: true, quality: 80 })
+      .toBuffer()
+    if (resized.byteLength <= CLADO_IMAGE_MAX_BYTES) {
+      return resized.toString('base64')
+    }
+  }


PNG palette mode may degrade inference quality

palette: true quantizes the image to an indexed 8-bit color table (~204 colors at quality: 80). Web screenshots contain gradients, shadows, and anti-aliased text that easily require thousands of colors — palette quantization introduces dithering artifacts that can make UI text unreadable for a visual model, working against the stated goal of "optimizing for inference reliability."

Consider using JPEG instead, which provides continuous-tone compression without the color-count ceiling:

Suggested change

.png({ compressionLevel: 9, palette: true, quality: 80 })

.toBuffer()

if (resized.byteLength <= CLADO_IMAGE_MAX_BYTES) {

return resized.toString('base64')

}

}

const resized = await sharp(original)

.resize({ width, withoutEnlargement: true })

.jpeg({ quality: 80, mozjpeg: true })

.toBuffer()

And update the fallback similarly:

const smallest = await sharp(original) .resize({ width: 512, withoutEnlargement: true }) .jpeg({ quality: 70, mozjpeg: true }) .toBuffer()

If the downstream Clado endpoint requires PNG, use .png({ compressionLevel: 9 }) without palette.

Prompt To Fix With AI

This is a comment left during a code review. Path: packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/clado-action-executor.ts Line: 113-118 Comment: **PNG palette mode may degrade inference quality** `palette: true` quantizes the image to an indexed 8-bit color table (~204 colors at `quality: 80`). Web screenshots contain gradients, shadows, and anti-aliased text that easily require thousands of colors — palette quantization introduces dithering artifacts that can make UI text unreadable for a visual model, working against the stated goal of "optimizing for inference reliability." Consider using JPEG instead, which provides continuous-tone compression without the color-count ceiling: ```suggestion const resized = await sharp(original) .resize({ width, withoutEnlargement: true }) .jpeg({ quality: 80, mozjpeg: true }) .toBuffer() ``` And update the fallback similarly: ```typescript const smallest = await sharp(original) .resize({ width: 512, withoutEnlargement: true }) .jpeg({ quality: 70, mozjpeg: true }) .toBuffer() ``` If the downstream Clado endpoint requires PNG, use `.png({ compressionLevel: 9 })` without `palette`. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-08T23:14:35Z

+export function pageProgressSignals(
+  before: PageProgressSnapshot | null,
+  after: PageProgressSnapshot | null,
+): string[] {
+  if (!before || !after) return []
+
+  const signals: string[] = []
+  if (before.url !== after.url || before.title !== after.title) {
+    signals.push('navigation')
+  }
+  if (
+    Math.abs(after.scrollX - before.scrollX) >= 8 ||
+    Math.abs(after.scrollY - before.scrollY) >= 8
+  ) {
+    signals.push('scroll')
+  }
+  if (JSON.stringify(before.active) !== JSON.stringify(after.active)) {
+    signals.push('focus')
+  }
+  if (
+    before.expandedCount !== after.expandedCount ||
+    before.popupCount !== after.popupCount
+  ) {
+    signals.push('open_state')
+  }
+  if (JSON.stringify(before.controls) !== JSON.stringify(after.controls)) {
+    signals.push('value')
+  }
+  return signals


bodyText change not included as a progress signal

pageProgressSignals checks URL, scroll, focus, popup state, and form control values — but it never compares before.bodyText vs after.bodyText. A "Load More" click, an AJAX content refresh, or a page that updates its main text without changing the URL/scroll/form state will produce zero signals. The executor's action will be mis-classified as "no progress" and the streak counter will advance, potentially triggering an early blocked result on a perfectly productive action.

Suggested change

export function pageProgressSignals(

before: PageProgressSnapshot | null,

after: PageProgressSnapshot | null,

): string[] {

if (!before || !after) return []

const signals: string[] = []

if (before.url !== after.url || before.title !== after.title) {

signals.push('navigation')

}

if (

Math.abs(after.scrollX - before.scrollX) >= 8 ||

Math.abs(after.scrollY - before.scrollY) >= 8

) {

signals.push('scroll')

}

if (JSON.stringify(before.active) !== JSON.stringify(after.active)) {

signals.push('focus')

}

if (

before.expandedCount !== after.expandedCount ||

before.popupCount !== after.popupCount

) {

signals.push('open_state')

}

if (JSON.stringify(before.controls) !== JSON.stringify(after.controls)) {

signals.push('value')

}

return signals

export function pageProgressSignals(

before: PageProgressSnapshot | null,

after: PageProgressSnapshot | null,

): string[] {

if (!before || !after) return []

const signals: string[] = []

if (before.url !== after.url || before.title !== after.title) {

signals.push('navigation')

}

if (

Math.abs(after.scrollX - before.scrollX) >= 8 ||

Math.abs(after.scrollY - before.scrollY) >= 8

) {

signals.push('scroll')

}

if (JSON.stringify(before.active) !== JSON.stringify(after.active)) {

signals.push('focus')

}

if (

before.expandedCount !== after.expandedCount ||

before.popupCount !== after.popupCount

) {

signals.push('open_state')

}

if (JSON.stringify(before.controls) !== JSON.stringify(after.controls)) {

signals.push('value')

}

if (before.bodyText !== after.bodyText) {

signals.push('content')

}

return signals

}

Prompt To Fix With AI

This is a comment left during a code review. Path: packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/page-progress.ts Line: 158-186 Comment: **`bodyText` change not included as a progress signal** `pageProgressSignals` checks URL, scroll, focus, popup state, and form control values — but it never compares `before.bodyText` vs `after.bodyText`. A "Load More" click, an AJAX content refresh, or a page that updates its main text without changing the URL/scroll/form state will produce zero signals. The executor's action will be mis-classified as "no progress" and the streak counter will advance, potentially triggering an early `blocked` result on a perfectly productive action. ```suggestion export function pageProgressSignals( before: PageProgressSnapshot | null, after: PageProgressSnapshot | null, ): string[] { if (!before || !after) return [] const signals: string[] = [] if (before.url !== after.url || before.title !== after.title) { signals.push('navigation') } if ( Math.abs(after.scrollX - before.scrollX) >= 8 || Math.abs(after.scrollY - before.scrollY) >= 8 ) { signals.push('scroll') } if (JSON.stringify(before.active) !== JSON.stringify(after.active)) { signals.push('focus') } if ( before.expandedCount !== after.expandedCount || before.popupCount !== after.popupCount ) { signals.push('open_state') } if (JSON.stringify(before.controls) !== JSON.stringify(after.controls)) { signals.push('value') } if (before.bodyText !== after.bodyText) { signals.push('content') } return signals } ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps Bot reviewed Apr 8, 2026

View reviewed changes

David Shan (DavidBShan) force-pushed the main branch from 25eb9c3 to 823b515 Compare April 16, 2026 10:08

David Shan (DavidBShan) added 5 commits April 16, 2026 19:34

feat(eval): add direct Clado action evaluator

44908f7

fix(eval): stabilize BrowserOS launch

db1439f

feat(eval): add Clado visible-state grading

ac5a466

feat(eval): load NopeCHA in BrowserOS app

34ddd82

feat(eval): pass Clado action temperature

a0a3782

David Shan (DavidBShan) force-pushed the main branch from 3461afc to a0a3782 Compare April 16, 2026 19:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(eval): optimize screenshot request path and require normalized type coordinates#658

fix(eval): optimize screenshot request path and require normalized type coordinates#658
David Shan (DavidBShan) wants to merge 5 commits intobrowseros-ai:mainfrom
DavidBShan:main

David Shan (DavidBShan) commented Apr 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 8, 2026 •

edited

Loading

Uh oh!

David Shan (DavidBShan) commented Apr 8, 2026

Uh oh!

Felarof (felarof99) commented Apr 8, 2026

Uh oh!

greptile-apps Bot commented Apr 8, 2026

Vulnerabilities

Uh oh!

greptile-apps Bot Apr 8, 2026

Uh oh!

greptile-apps Bot Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

David Shan (DavidBShan) commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What Changed

Why

Uh oh!

github-actions Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

David Shan (DavidBShan) commented Apr 8, 2026

Uh oh!

Felarof (felarof99) commented Apr 8, 2026

Uh oh!

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

Confidence Score: 4/5

Vulnerabilities

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

David Shan (DavidBShan) commented Apr 8, 2026 •

edited

Loading

github-actions Bot commented Apr 8, 2026 •

edited

Loading