test: LoRA testing with k8s #4991
base: main
Conversation
Walkthrough

This pull request adds LoRA (Low-Rank Adaptation) support to the fault tolerance testing suite. Changes include new checker classes for LoRA validation, factory logic to detect and integrate LoRA scenarios, two Kubernetes deployment manifests for LoRA aggregated and disaggregated topologies, and corresponding test configurations with LoRA-specific load profiles.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/fault_tolerance/deploy/templates/vllm/lora_disagg.yaml (1)
93-167: Consider reducing YAML duplication. The environment variables and extraPodSpec sections are nearly identical between VllmDecodeWorker and VllmPrefillWorker. While explicit configuration is acceptable for test manifests, you could use YAML anchors to reduce duplication if maintaining these files becomes cumbersome.
Example with anchors:
```yaml
x-common-envs: &common-envs
  - name: DYN_SYSTEM_ENABLED
    value: "true"
  # ... rest of common envs

VllmDecodeWorker:
  envs: *common-envs
  # ...
```

tests/fault_tolerance/deploy/lora_checker.py (2)
42-49: Unused parameter `lora_name`. The `lora_name` parameter is stored but never used in the `check()` method. Either remove it or add a TODO comment indicating planned usage.

```diff
     def __init__(self, lora_name: Optional[str] = None):
         """Initialize LoRA discovery checker.

         Args:
             lora_name: Optional specific LoRA name to check for
         """
         super().__init__(name="LoRADiscoveryChecker")
-        self.lora_name = lora_name
+        self.lora_name = lora_name  # TODO: Use for specific LoRA validation
```
51-111: Placeholder checker always passes. The `LoRADiscoveryChecker.check()` method currently only logs information and always passes. While this is documented as a placeholder, consider whether tests relying on this checker will provide meaningful validation. At minimum, you might want to raise an assertion or log a more prominent warning if critical conditions are missing (e.g., `context.deployment` is None).

Would you like me to help create an issue to track the full etcd-based implementation, or suggest a minimal assertion to add now?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tests/fault_tolerance/deploy/checker_factory.py (3 hunks)
- tests/fault_tolerance/deploy/lora_checker.py (1 hunks)
- tests/fault_tolerance/deploy/scenarios.py (4 hunks)
- tests/fault_tolerance/deploy/templates/vllm/lora_agg.yaml (1 hunks)
- tests/fault_tolerance/deploy/templates/vllm/lora_disagg.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
tests/fault_tolerance/deploy/templates/vllm/lora_agg.yaml
tests/fault_tolerance/deploy/templates/vllm/lora_disagg.yaml
📚 Learning: 2025-09-24T17:50:00.970Z
Learnt from: tzulingk
Repo: ai-dynamo/dynamo PR: 3194
File: tests/utils/managed_deployment.py:555-559
Timestamp: 2025-09-24T17:50:00.970Z
Learning: In tests/utils/managed_deployment.py, when handling None service_name in get_service method, prefer setting service_name to empty string "" rather than defaulting to frontend_service_name, to avoid confusion per user tzulingk's preference.
Applied to files:
tests/fault_tolerance/deploy/checker_factory.py
🧬 Code graph analysis (2)
tests/fault_tolerance/deploy/lora_checker.py (2)
tests/fault_tolerance/deploy/base_checker.py (2)
BaseChecker (57-105), ValidationContext (33-54)

tests/fault_tolerance/deploy/test_deployment.py (1)
scenario(32-57)
tests/fault_tolerance/deploy/checker_factory.py (1)
tests/fault_tolerance/deploy/lora_checker.py (2)
LoRADiscoveryChecker (32-111), LoRAInferenceChecker (114-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (12)
tests/fault_tolerance/deploy/checker_factory.py (3)
37-40: LGTM!New imports for LoRA-specific checkers are correctly added and align with the new
lora_checker.pymodule.
71-78: LGTM!

The LoRA scenario detection using the `-lora-` pattern match is consistent with the naming convention in `scenarios.py`. Prepending LoRA-specific checkers before the scenario and results checkers is appropriate for validating LoRA discovery early in the validation flow.
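The substring check described above can be sketched as a one-line helper; the function name is hypothetical, but the `-lora-` naming convention comes from this review:

```python
def is_lora_scenario(deployment_name: str) -> bool:
    """Detect LoRA scenarios by the '-lora-' segment in the deployment name."""
    return "-lora-" in deployment_name
```

Anchoring on the surrounded `-lora-` segment (rather than a bare `lora` substring) avoids false positives on model names that merely contain the letters.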
151-156: LGTM!

The conditional logic correctly distinguishes between LoRA aggregated deployments (using `VllmWorker`) and other deployments including LoRA disaggregated (using `VllmDecodeWorker`). This aligns with the worker naming in `lora_agg.yaml` and `lora_disagg.yaml`.

tests/fault_tolerance/deploy/templates/vllm/lora_agg.yaml (2)
17-18: Verify the image tag placeholder. The image tag `my-tag` appears to be a placeholder. Confirm this is intentionally parameterized or replaced during test execution; otherwise, update to a valid tag or use a template variable.
19-91: LGTM!The VllmWorker configuration is well-structured for LoRA testing:
- GPU resource format `gpu: "1"` follows dynamo CR conventions (based on learnings)
- LoRA-specific environment variables are properly configured
- AWS credentials are marked optional, handling missing secrets gracefully
- Debug logging settings are appropriate for test scenarios
tests/fault_tolerance/deploy/templates/vllm/lora_disagg.yaml (1)
32-92: LGTM with minor suggestion. VllmDecodeWorker configuration is well-structured with proper LoRA environment variables and resource settings.
tests/fault_tolerance/deploy/scenarios.py (5)
508-542: LGTM!

The `_create_lora_deployments_for_backend` function follows the established pattern from `_create_moe_deployments_for_backend`. The deployment naming convention `{backend}-lora-{deploy_type}-tp-{tp_size}-replicas-{replicas}` is consistent and integrates well with the LoRA detection logic in `checker_factory.py`.
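A builder mirroring the quoted naming convention might look like this (the function name is illustrative, not from the PR):

```python
def lora_deployment_name(backend: str, deploy_type: str, tp_size: int, replicas: int) -> str:
    """Build a deployment name following the {backend}-lora-{deploy_type}-tp-{tp}-replicas-{n} convention."""
    return f"{backend}-lora-{deploy_type}-tp-{tp_size}-replicas-{replicas}"
```

Keeping the name in one place means the `-lora-` detection in the checker factory and the scenario generator cannot drift apart.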
752-762: LGTM!

The `lora_load` configuration with moderate settings (5 clients, 50 requests per client) is appropriate for LoRA testing. The standard request rate of 1.0 requests/sec is reasonable given LoRA's lower computational overhead compared to MoE models.
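For concreteness, the profile described above amounts to the following shape; the field names here are assumptions for illustration, not the actual keys in `scenarios.py`:

```python
# Assumed shape of the lora_load profile; field names are illustrative only.
lora_load = {
    "clients": 5,
    "requests_per_client": 50,
    "request_rate": 1.0,  # requests per second, per the review
}

# Total requests issued across all clients for one test run.
total_requests = lora_load["clients"] * lora_load["requests_per_client"]
```

At 1.0 req/s this works out to roughly 250 requests over a few minutes of load, a modest profile for exercising LoRA discovery and routing.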
554-555: LGTM!

LoRA deployments are correctly integrated into `DEPLOYMENT_SPECS` following the established pattern. Limiting to the vLLM backend initially is a reasonable scope for the initial LoRA testing implementation.
1024-1127: LGTM!

The `add_lora_scenarios` function is well-structured and comprehensive:
- Covers both aggregated and disaggregated deployment types
- Includes process termination and pod deletion failure scenarios
- Provides "none" scenarios for baseline LoRA discovery testing without failures
- Properly integrates with the checker factory for validation
1129-1130: LGTM!

The module-level call to `add_lora_scenarios()` follows the established pattern used by `add_token_overflow_scenarios()` and `add_rolling_upgrade_scenarios()`.

tests/fault_tolerance/deploy/lora_checker.py (1)
16-22: LGTM!

Module docstring clearly documents the intended validation scope for LoRA discovery.
```python
# For LoRA tests, we expect high success rate
# (failures in LoRA loading should be minimal)
if success_rate < 80.0:
    self.logger.warning(
        f"⚠ Low success rate for LoRA inference: {success_rate:.2f}%"
    )
    self.logger.warning(
        " This may indicate LoRA loading or routing issues"
    )
else:
    self.logger.info(
        f"✓ LoRA inference success rate is acceptable: {success_rate:.2f}%"
    )
```
Low success rate should fail the check, not just warn.
Per the BaseChecker contract (from base_checker.py), the check() method should "Raise AssertionError on validation failure". Currently, a success rate below 80% only logs a warning but doesn't fail the test. This could allow tests to pass silently despite LoRA loading or routing issues.
```diff
 # For LoRA tests, we expect high success rate
 # (failures in LoRA loading should be minimal)
 if success_rate < 80.0:
     self.logger.warning(
         f"⚠ Low success rate for LoRA inference: {success_rate:.2f}%"
     )
     self.logger.warning(
         " This may indicate LoRA loading or routing issues"
     )
+    raise AssertionError(
+        f"LoRA inference success rate {success_rate:.2f}% is below threshold (80%)"
+    )
 else:
     self.logger.info(
         f"✓ LoRA inference success rate is acceptable: {success_rate:.2f}%"
     )
```

🤖 Prompt for AI Agents
In tests/fault_tolerance/deploy/lora_checker.py around lines 151 to 163, the
current logic only logs a warning when LoRA success_rate < 80.0 but per
BaseChecker.check() contract this must raise an AssertionError; change the code
so that when success_rate < 80.0 you log an error (or warning) with the same
message and then raise an AssertionError with a clear message including the
success_rate (e.g. "LoRA inference success rate below threshold: X%"), leaving
the else branch to keep the info log for acceptable rates.
Overview:
Adds comprehensive LoRA (Low-Rank Adaptation) testing infrastructure for Kubernetes-based deployments with fault tolerance validation.
This PR enables testing of LoRA discovery, registration, and inference in both aggregated and disaggregated vLLM deployment modes.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
New Features
Tests