force answer tool usage at max step limit by AyushKarupakula · Pull Request #331 · hud-evals/hud-python

AyushKarupakula · 2026-02-16T13:29:02Z

Summary

Enforce a final-step answer path in MCPAgent._run_context by setting _force_answer_only and injecting a terminal instruction when max_steps is reached without an answer tool call.
Update the Claude, OpenAI, and Gemini providers to honor _force_answer_only using provider-native tool-choice restrictions, so that only answer can be called on the forced step.
Clear the forced-answer state once an answer tool call is observed, preserving normal behavior on non-forced steps.

Ensure MCPAgent enforces an answer submission when max steps are reached by setting a force-answer flag and appending a final instruction. Update Claude, OpenAI, and Gemini agents to honor that flag with provider-native tool-choice restrictions so runs terminate with an answer instead of stalling.

Co-authored-by: Cursor <cursoragent@cursor.com>
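For readers unfamiliar with "provider-native tool-choice restrictions", the sketch below shows the request fragments each provider is generally understood to accept for forcing a single named tool. It is an illustration, not the code in this PR: the helper name forced_answer_tool_choice is hypothetical, the OpenAI shape shown is the Chat Completions one (the Responses API uses a flatter dict), and the Gemini field names follow the Python SDK's snake_case convention.

```python
from typing import Any

ANSWER_TOOL = "answer"  # tool name taken from the PR description


def forced_answer_tool_choice(provider: str) -> dict[str, Any]:
    """Return request kwargs that restrict the model to the answer tool.

    Hypothetical helper for illustration; the PR's actual provider code
    is not reproduced here.
    """
    if provider == "claude":
        # Anthropic Messages API: force one specific tool by name.
        return {"tool_choice": {"type": "tool", "name": ANSWER_TOOL}}
    if provider == "openai":
        # OpenAI Chat Completions: force one specific function by name.
        return {
            "tool_choice": {
                "type": "function",
                "function": {"name": ANSWER_TOOL},
            }
        }
    if provider == "gemini":
        # Gemini: mode ANY plus an allow-list limits calls to listed functions.
        return {
            "tool_config": {
                "function_calling_config": {
                    "mode": "ANY",
                    "allowed_function_names": [ANSWER_TOOL],
                }
            }
        }
    raise ValueError(f"unknown provider: {provider}")
```

A provider agent would merge this fragment into its request only while _force_answer_only is set, and leave the tool choice at its default on every other step.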

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-02-16T13:32:43Z

hud/agents/base.py

+                                break
+
+                        # Check if we should stop
+                        if response.done or not response.tool_calls:


Stop condition skipped without tool calls

High Severity

_run_context now gates stop handling behind if response.tool_calls, so responses with no tool calls never hit the response.done or not response.tool_calls exit path. The loop can continue until max_steps exhaustion (or forever with max_steps=-1), and Trace.content may end up None instead of the model’s final response.
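One way to avoid this is to evaluate the stop condition on every step, independent of whether the response carries tool calls. The following is a minimal sketch under stated assumptions: FakeResponse and run_loop are hypothetical stand-ins, not the real response type or the real _run_context.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the agent response object."""
    content: str = ""
    done: bool = False
    tool_calls: list = field(default_factory=list)


def run_loop(responses, max_steps):
    """Return the final content, or None if the step limit ends the run."""
    final_content = None
    for step, response in enumerate(responses, start=1):
        # The stop check is NOT nested under `if response.tool_calls`,
        # so a response without tool calls always ends the loop here.
        if response.done or not response.tool_calls:
            final_content = response.content
            break
        # ... tool calls would be executed here ...
        if max_steps != -1 and step >= max_steps:
            break
    return final_content
```

With max_steps=-1, a response that has no tool calls still terminates the loop and its content is kept, which is the behavior the finding says was lost.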

cursor · 2026-02-16T13:32:43Z

hud/agents/base.py

+                                answer_submitted = True
+                                # Clear forced-answer flag once we've seen an answer
+                                if hasattr(self, "_force_answer_only"):
+                                    setattr(self, "_force_answer_only", False)


Forced-answer flag leaks across runs

Medium Severity

_run_context sets _force_answer_only but only clears it when an answer tool call is observed. If a run exits without that call, the flag stays set on the agent instance. Later run() calls on the same instance are unintentionally forced into answer-only mode from the first step.

Additional Locations (2)

hud/agents/openai.py#L349-L353

hud/agents/claude.py#L204-L223
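A common remedy for this kind of leak is to reset the flag at the start of each run and again in a finally block, so every exit path clears it. The sketch below assumes a simplified agent; SketchAgent, _step, and seen_flags are hypothetical names used only to make the behavior observable.

```python
class SketchAgent:
    """Hypothetical agent showing per-run reset of the forced-answer flag."""

    def __init__(self):
        self._force_answer_only = False
        self.seen_flags = []  # records the flag value seen on each step

    def _step(self):
        # Stand-in for a provider call that reads the flag.
        self.seen_flags.append(self._force_answer_only)

    def run(self, max_steps):
        self._force_answer_only = False  # start every run in normal mode
        try:
            for step in range(1, max_steps + 1):
                if step == max_steps:
                    # Final step: restrict the model to the answer tool.
                    self._force_answer_only = True
                self._step()
        finally:
            # Runs on normal exit, early return, and exceptions alike.
            self._force_answer_only = False
```

Calling run() twice on the same instance then shows the first step of the second run starting with the flag unset.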

cursor · 2026-02-16T13:32:43Z

hud/agents/base.py

+                    if response.tool_calls:
+                        for tool_call in response.tool_calls:
+                            if getattr(tool_call, "name", "") == "answer":
+                                answer_submitted = True


Failed answer calls disable final-step forcing

Medium Severity

answer_submitted is set as soon as an answer tool call is emitted, before call_tools confirms success. If that answer call fails, later max-step logic sees answer_submitted=True and skips forced-answer behavior, allowing the run to finish without a valid final submission.

Additional Locations (1)

hud/agents/base.py#L502-L503
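One possible fix is to defer setting answer_submitted until the tool results are available and mark it only when an answer call succeeded. This sketch assumes results come back in the same order as the calls and expose an error indicator; ToolCall, ToolResult, is_error, and answer_was_submitted are hypothetical names, not the project's actual types.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical tool call with only the field this sketch needs."""
    name: str


@dataclass
class ToolResult:
    """Hypothetical tool result; is_error marks a failed call."""
    is_error: bool = False


def answer_was_submitted(tool_calls, results):
    """True only if some answer call has a matching non-error result."""
    return any(
        call.name == "answer" and not result.is_error
        for call, result in zip(tool_calls, results)
    )
```

A failed answer call then leaves the submission state unset, so the max-step logic can still force a final answer.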

cursor bot reviewed Feb 16, 2026

AyushKarupakula wants to merge 1 commit into hud-evals:main from AyushKarupakula:ayush/force-answer-tool-enforcement
