force answer tool usage at max step limit #331

Open
AyushKarupakula wants to merge 1 commit into hud-evals:main from AyushKarupakula:ayush/force-answer-tool-enforcement

AyushKarupakula commented Feb 16, 2026

Summary

  • enforce a final-step answer path in MCPAgent._run_context by setting _force_answer_only and injecting a terminal instruction when max_steps is reached without an answer tool call
  • update Claude/OpenAI/Gemini providers to honor _force_answer_only using provider-native tool-choice restrictions so only answer can be called on the forced step
  • clear forced-answer state once an answer tool call is observed, preserving normal behavior in non-forced steps

Ensure MCPAgent enforces an answer submission when max steps are reached by setting a force-answer flag and appending a final instruction. Update Claude, OpenAI, and Gemini agents to honor that flag with provider-native tool-choice restrictions so runs terminate with an answer instead of stalling.
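The provider changes themselves are not shown in this conversation. As a rough sketch of what "provider-native tool-choice restrictions" could look like, the helper below builds the request fragment that pins each provider to a single tool. `forced_tool_choice` and the provider keys are hypothetical names, not code from this PR. The dictionary shapes follow my understanding of the public Anthropic Messages, OpenAI Chat Completions, and Gemini function-calling APIs, and should be checked against the SDK versions in use.

```python
from typing import Any, Dict

ANSWER_TOOL = "answer"  # tool name taken from the PR description


def forced_tool_choice(provider: str, tool_name: str = ANSWER_TOOL) -> Dict[str, Any]:
    """Return request kwargs that restrict the model to one tool (hypothetical helper)."""
    if provider == "claude":
        # Anthropic Messages API: force a specific named tool.
        return {"tool_choice": {"type": "tool", "name": tool_name}}
    if provider == "openai":
        # OpenAI Chat Completions shape; the Responses API uses a different layout.
        return {"tool_choice": {"type": "function", "function": {"name": tool_name}}}
    if provider == "gemini":
        # Gemini: mode ANY requires a function call, limited to the allowed names.
        return {
            "tool_config": {
                "function_calling_config": {
                    "mode": "ANY",
                    "allowed_function_names": [tool_name],
                }
            }
        }
    raise ValueError(f"unknown provider: {provider}")
```

A provider would merge this fragment into its request only while `_force_answer_only` is set, leaving normal steps unrestricted.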

Co-authored-by: Cursor <cursoragent@cursor.com>
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

break

# Check if we should stop
if response.done or not response.tool_calls:

Stop condition skipped without tool calls

High Severity

_run_context now gates stop handling behind if response.tool_calls, so responses with no tool calls never hit the response.done or not response.tool_calls exit path. The loop can continue until max_steps exhaustion (or forever with max_steps=-1), and Trace.content may end up None instead of the model’s final response.
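One possible shape for the fix, as a minimal sketch rather than the actual `_run_context` code: run the stop check on every response, outside any `if response.tool_calls` branch, and capture the final content before breaking. `SketchResponse` and `run_steps` are hypothetical stand-ins for the real response type and loop.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class SketchResponse:
    """Hypothetical stand-in for the agent response object."""
    content: Optional[str] = None
    done: bool = False
    tool_calls: List[str] = field(default_factory=list)


def run_steps(
    responses: Iterable[SketchResponse], max_steps: int = -1
) -> Tuple[Optional[str], int]:
    """Consume responses until a stop condition; return (final content, steps used)."""
    final_content = None
    steps = 0
    for response in responses:
        # max_steps == -1 means "no limit", as in the review comment.
        if max_steps != -1 and steps >= max_steps:
            break
        steps += 1
        # Stop check runs on every response, including ones with no tool calls.
        if response.done or not response.tool_calls:
            final_content = response.content
            break
        # Tool execution for response.tool_calls would happen here.
    return final_content, steps
```

With this ordering, a response that carries no tool calls ends the loop immediately and its content is kept, even when `max_steps=-1`.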

answer_submitted = True
# Clear forced-answer flag once we've seen an answer
if hasattr(self, "_force_answer_only"):
    setattr(self, "_force_answer_only", False)

Forced-answer flag leaks across runs

Medium Severity

_run_context sets _force_answer_only but only clears it when an answer tool call is observed. If a run exits without that call, the flag stays set on the agent instance. Later run() calls on the same instance are unintentionally forced into answer-only mode from the first step.

Additional Locations (2)
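A common way to avoid this kind of leak is to scope the flag to a single run: reset it on entry and clear it in a `finally` block so every exit path restores it. The sketch below assumes a simplified agent. `SketchAgent` and its `run` signature are hypothetical, and only the `_force_answer_only` attribute name comes from the PR.

```python
class SketchAgent:
    """Hypothetical agent showing per-run scoping of the forced-answer flag."""

    def __init__(self) -> None:
        self._force_answer_only = False

    def run(self, model_submits_answer: bool) -> bool:
        # Reset on entry so stale state from an earlier run cannot apply.
        self._force_answer_only = False
        try:
            # Pretend max_steps was reached without an answer tool call.
            self._force_answer_only = True
            # The forced step may still end without an answer call.
            return model_submits_answer
        finally:
            # Runs on return and on exception, so the flag never outlives the run.
            self._force_answer_only = False
```

After `run` returns or raises, the instance is back in normal mode, so a later `run()` on the same agent starts unrestricted.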

if response.tool_calls:
    for tool_call in response.tool_calls:
        if getattr(tool_call, "name", "") == "answer":
            answer_submitted = True

Failed answer calls disable final-step forcing

Medium Severity

answer_submitted is set as soon as an answer tool call is emitted, before call_tools confirms success. If that answer call fails, later max-step logic sees answer_submitted=True and skips forced-answer behavior, allowing the run to finish without a valid final submission.

Additional Locations (1)
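One way to address this is to derive `answer_submitted` from the tool results rather than from the emitted calls. The sketch below is illustrative only: `SketchToolCall`, `SketchToolResult`, the `is_error` field, and the assumption that results come back in the same order as the calls are all hypothetical, and the real `call_tools` return type may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SketchToolCall:
    name: str


@dataclass
class SketchToolResult:
    is_error: bool  # assumed field; the real result type may name this differently


def answer_was_accepted(
    tool_calls: Sequence[SketchToolCall],
    results: Sequence[SketchToolResult],
) -> bool:
    """True only if some 'answer' call has a matching non-error result.

    Assumes results[i] corresponds to tool_calls[i].
    """
    return any(
        call.name == "answer" and not result.is_error
        for call, result in zip(tool_calls, results)
    )
```

The loop would then set `answer_submitted = answer_was_accepted(...)` after the tools have run, so a failed answer call leaves final-step forcing enabled.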
