force answer tool usage at max step limit #331
AyushKarupakula wants to merge 1 commit into hud-evals:main
Ensure MCPAgent enforces an answer submission when max steps are reached by setting a force-answer flag and appending a final instruction. Update Claude, OpenAI, and Gemini agents to honor that flag with provider-native tool-choice restrictions so runs terminate with an answer instead of stalling.

Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes and found 3 potential issues.
    break

    # Check if we should stop
    if response.done or not response.tool_calls:
Stop condition skipped without tool calls
High Severity
_run_context now gates stop handling behind if response.tool_calls, so responses with no tool calls never hit the response.done or not response.tool_calls exit path. The loop can continue until max_steps exhaustion (or forever with max_steps=-1), and Trace.content may end up None instead of the model’s final response.
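One way to avoid this is to evaluate the stop condition on every response, outside any `if response.tool_calls` branch. The sketch below is a simplified stand-in, not the PR's code: `Response` and `run_loop` are hypothetical names, and tool execution is omitted.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Response:
    """Hypothetical stand-in for the agent's per-step model response."""
    content: Optional[str] = None
    done: bool = False
    tool_calls: List[str] = field(default_factory=list)


def run_loop(responses: Iterable[Response], max_steps: int) -> Optional[str]:
    """Return the final content, stopping on the first response without tool calls."""
    final_content = None
    step = 0
    for response in responses:
        # max_steps == -1 means "no limit", as in the review comment above.
        if max_steps != -1 and step >= max_steps:
            break
        step += 1
        # The stop check runs for every response; it is not nested
        # under `if response.tool_calls`.
        if response.done or not response.tool_calls:
            final_content = response.content
            break
        # Tool execution would happen here in a real agent loop.
    return final_content
```

With this ordering, a response that carries only text ends the loop and its content is preserved, even when `max_steps` is -1.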
    answer_submitted = True
    # Clear forced-answer flag once we've seen an answer
    if hasattr(self, "_force_answer_only"):
        setattr(self, "_force_answer_only", False)
Forced-answer flag leaks across runs
Medium Severity
_run_context sets _force_answer_only but only clears it when an answer tool call is observed. If a run exits without that call, the flag stays set on the agent instance. Later run() calls on the same instance are unintentionally forced into answer-only mode from the first step.
Additional Locations (2)
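A possible fix for the leaking flag is to reset it at the start of each run and again in a `finally` block, so every exit path clears it. This is a minimal sketch under assumptions: `SketchAgent` and the `answers_at_step` parameter are hypothetical, and only the `_force_answer_only` name comes from the PR.

```python
from typing import Optional


class SketchAgent:
    """Hypothetical agent showing per-run reset of the forced-answer flag."""

    def __init__(self) -> None:
        self._force_answer_only = False

    def _run_context(self, max_steps: int, answers_at_step: Optional[int]) -> str:
        for step in range(1, max_steps + 1):
            if step == max_steps:
                # Final step: restrict the model to the answer tool.
                self._force_answer_only = True
            if answers_at_step == step:
                return "answered"
        # The run ended without an answer tool call.
        return "no answer"

    def run(self, max_steps: int, answers_at_step: Optional[int] = None) -> str:
        # Start every run from a clean state.
        self._force_answer_only = False
        try:
            return self._run_context(max_steps, answers_at_step)
        finally:
            # Clear the flag on every exit path, including exceptions.
            self._force_answer_only = False
```

Because the reset lives in `run()` rather than in the answer-handling branch, a run that ends without an `answer` call cannot affect the next run on the same instance.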
    if response.tool_calls:
        for tool_call in response.tool_calls:
            if getattr(tool_call, "name", "") == "answer":
                answer_submitted = True
Failed answer calls disable final-step forcing
Medium Severity
answer_submitted is set as soon as an answer tool call is emitted, before call_tools confirms success. If that answer call fails, later max-step logic sees answer_submitted=True and skips forced-answer behavior, allowing the run to finish without a valid final submission.
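One option is to derive `answer_submitted` from the tool results rather than from the emitted calls. The sketch below assumes results come back in the same order as the calls and expose an error flag. `ToolCall`, `ToolResult`, `is_error`, and `answer_confirmed` are illustrative names, not the repository's actual types.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToolCall:
    """Hypothetical tool call emitted by the model."""
    name: str


@dataclass
class ToolResult:
    """Hypothetical tool result; `is_error` marks a failed call."""
    is_error: bool = False


def answer_confirmed(calls: Sequence[ToolCall], results: Sequence[ToolResult]) -> bool:
    """Return True only if an `answer` call has a matching non-error result."""
    # Assumes results[i] corresponds to calls[i].
    return any(
        call.name == "answer" and not result.is_error
        for call, result in zip(calls, results)
    )
```

The loop would then set `answer_submitted = answer_confirmed(...)` after the tools have run, so a failed `answer` call still leaves forced-answer behavior available at the step limit.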


Summary
- Enforce answer submission in MCPAgent._run_context by setting _force_answer_only and injecting a terminal instruction when max_steps is reached without an answer tool call
- Update the Claude, OpenAI, and Gemini agents to honor _force_answer_only using provider-native tool-choice restrictions so only answer can be called on the forced step
- Clear the flag once an answer tool call is observed, preserving normal behavior in non-forced steps
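For reference, the provider-native restrictions could look roughly like the request fragments below. This is a sketch, not the PR's code: `forced_answer_tool_choice` is a hypothetical helper, and the payload shapes follow the publicly documented Anthropic Messages, OpenAI Chat Completions, and Gemini function-calling formats as understood here. The agents in this repository may use different endpoints or SDK types.

```python
from typing import Any, Dict


def forced_answer_tool_choice(provider: str, tool_name: str = "answer") -> Dict[str, Any]:
    """Build request kwargs that restrict the model to a single tool (hypothetical helper)."""
    if provider == "claude":
        # Anthropic Messages API: force one specific tool by name.
        return {"tool_choice": {"type": "tool", "name": tool_name}}
    if provider == "openai":
        # OpenAI Chat Completions: force one specific function by name.
        return {"tool_choice": {"type": "function", "function": {"name": tool_name}}}
    if provider == "gemini":
        # Gemini: mode ANY requires a function call; the allow-list narrows it to one.
        return {
            "tool_config": {
                "function_calling_config": {
                    "mode": "ANY",
                    "allowed_function_names": [tool_name],
                }
            }
        }
    raise ValueError(f"unknown provider: {provider}")
```

An agent would merge these kwargs into its request only while `_force_answer_only` is set and fall back to its default tool choice otherwise.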