test chat template swap between and during inference #212
Conversation
Pull request overview
Adds coverage for swapping an agent’s chat template override both between inference calls and while a streaming inference request is in flight, plus additional unit tests validating chat template rendering behaviors in ChatTemplateRenderer.
Changes:
- Introduce helpers and two new integration tests to validate chat template override swapping behavior (including a concurrent swap during streaming inference).
- Add renderer unit tests covering iteration over `messages` (role/content) and `add_generation_prompt` branching.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `paddler_integration_tests/tests/chat_template.rs` | Adds helpers and new integration tests for chat template swapping between calls and during an in-flight stream. |
| `paddler/src/chat_template_renderer.rs` | Extends unit tests to validate message iteration rendering and conditional template branches. |
```rust
let new_state = chat_template_balancer_state(AGENT_DESIRED_MODEL.clone(), template_b.clone());

cluster
    .balancer
    .client()
    .management()
    .put_balancer_desired_state(&new_state)
    .await
    .context("failed to put new desired state")?;
```
In `test_agent_drains_in_flight_then_applies_chat_template_swap`, the test doesn't currently assert the key behavior implied by its name (that the agent keeps using the old chat template until the in-flight stream drains). It only asserts that the stream doesn't error and that the new template is eventually applied after completion. Consider polling `get_chat_template_override` after `put_balancer_desired_state` (while the stream is still active) and asserting it remains `template_a` until `Done`, then asserting it becomes `template_b` after draining. To make this reliable, also consider increasing `max_tokens` or using a longer prompt so the stream stays active long enough to observe the pre-drain state.