Skip to content

feat: add runtimes service and rescue mode support to VmClient#283

Merged
aliel merged 8 commits into
mainfrom
aliel-feat-rescue-mode-and-runtimes-service
Apr 17, 2026
Merged

feat: add runtimes service and rescue mode support to VmClient#283
aliel merged 8 commits into
mainfrom
aliel-feat-rescue-mode-and-runtimes-service

Conversation

@aliel
Copy link
Copy Markdown
Member

@aliel aliel commented Apr 15, 2026

No description provided.

Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds a new runtimes service and rescue mode support. However, there is a critical bug in runtimes.py line 55 where aggregate.runtimes is referenced but the field is actually named entries. Additionally, the enter_rescue method uses params for passing runtime_id while other similar operations (like restore_from_volume) use json_data for consistency.

src/aleph/sdk/client/services/runtimes.py (line 55): BUG: References aggregate.runtimes but the field in RuntimesAggregate is named entries (line 26). This will cause an AttributeError at runtime. Change to aggregate.entries.

src/aleph/sdk/client/vm_client.py (line 308): INCONSISTENCY: Uses params to pass runtime_id, but similar operations like restore_from_volume (lines 295-300) use json_data for request payloads. For consistency, consider changing to json_data={"runtime_id": runtime_id} if runtime_id else None.

src/aleph/sdk/client/services/runtimes.py (line 57): The get_default_runtime method requires runtime_type to be a non-optional str, but the docstring example shows it could be "rescue" or "instance". Consider adding a type hint or Enum for valid runtime types to help callers.

@github-actions
Copy link
Copy Markdown

Failed to retrieve llama text: POST 502: Bad Gateway

Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds a new Runtimes service and rescue mode support to VmClient. However, there is a critical bug in runtimes.py line 55 where aggregate.runtimes is used instead of aggregate.entries, which will cause an AttributeError. Additionally, the new rescue mode functionality and runtimes service lack test coverage.

src/aleph/sdk/client/services/runtimes.py (line 55): BUG: aggregate.runtimes should be aggregate.entries. The RuntimesAggregate model has an entries attribute (line 26), not runtimes. This will cause an AttributeError when filtering by runtime_type.

src/aleph/sdk/client/vm_client.py (line 305): Missing test coverage for enter_rescue() method. Consider adding tests similar to test_stop_instance() to verify the rescue operation works correctly with and without the optional item_hash parameter.

src/aleph/sdk/client/vm_client.py (line 313): Missing test coverage for exit_rescue() method. Consider adding a test to verify the DELETE request is properly made.

src/aleph/sdk/client/services/runtimes.py (line 29): Missing test coverage for the Runtimes service. Consider adding tests similar to test_pricing.py to verify get_runtimes_aggregate(), get_runtimes(), and get_default_runtime() methods work correctly.

src/aleph/sdk/client/vm_client.py (line 554): The test_vm_operation_enum_values test should be updated to include the new RESCUE enum value.

Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR introduces a new Runtimes service and rescue mode support. However, there is a bug in runtimes.py line 55 where aggregate.runtimes is used instead of aggregate.entries, which will cause an AttributeError when filtering runtimes by type. The rest of the changes follow existing code patterns correctly.

src/aleph/sdk/client/services/runtimes.py (line 55): Bug: Using aggregate.runtimes but the field is named entries in RuntimesAggregate. This will raise AttributeError when runtime_type is provided. Change to aggregate.entries.

src/aleph/sdk/client/services/runtimes.py (line 16): Consider replacing the inline comment with a proper type hint or Enum for runtime types to provide type safety and documentation.

src/aleph/sdk/client/vm_client.py (line 311): The exit_rescue method uses VmOperation.RESCUE with DELETE method. Consider verifying this is the intended API contract (same operation name, different HTTP method) or if there should be a separate EXIT_RESCUE operation.

Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand the aggregate API format. The API returns data in the format {"data": {key: {...}}}, and the _fetch_aggregate method extracts data.get(key). This means for runtimes, it would return just the inner object.

Let me verify by checking the _fetch_aggregate method more carefully:

Looking at the code in http.py (lines 163-173):

async def _fetch_aggregate(self, address: str, key: str) -> Dict[str, Dict]:
    params: Dict[str, Any] = {"keys": key}

    async with self.http_session.get(
        f"/api/v0/aggregates/{address}.json", params=params
    ) as resp:
        resp.raise_for_status()
        result = await resp.json()
        data = result.get("data", dict())
        final_result = data.get(key)
        return final_result

So the API returns {"data": {"runtimes": {...}}} and _fetch_aggregate extracts data.get("runtimes") which would be {...}.

Looking at the RuntimesAggregate model:

class RuntimesAggregate(BaseModel):
    """The runtimes aggregate payload."""
    entries: List[RuntimeEntry] = []

This expects the aggregate data to have an entries field at the top level, which seems correct.

Now let me provide my review:


Review Summary

I've completed my review of PR #283. Here's my assessment:

Issues Found

  1. Missing test coverage - No tests added for the new runtimes.py service or the new enter_rescue/exit_rescue methods in vm_client.py

  2. Incomplete test enum values - The existing test test_vm_operation_enum_values doesn't include the newly added RESCUE operation

  3. Missing exception handling - The get_runtimes_aggregate method doesn't handle cases where the API returns empty data or raises errors

  4. Type annotation issue - The fetch_aggregate method returns Dict[str, Dict] but the runtimes aggregate structure has entries: List[RuntimeEntry] which is List[BaseModel] not Dict

Code Quality Notes

  • The code follows existing patterns in the codebase
  • Documentation is adequate with docstrings
  • The use of TYPE_CHECKING for circular import prevention is good

Let me provide the formal review:

{
  "verdict": "REQUEST_CHANGES",
  "summary": "The PR adds a new runtimes service and rescue mode support to VmClient. The code follows existing patterns and is well-documented. However, it requires changes before merging: (1) No tests are provided for the new runtimes service or rescue mode methods, (2) The existing VmOperation enum test doesn't include the new RESCUE value, (3) Error handling should be added for edge cases in the runtimes service, (4) The type annotation for fetch_aggregate return type may need verification against the actual aggregate structure.",
  "comments": [
    {
      "path": "src/aleph/sdk/client/services/runtimes.py",
      "line": 43,
      "body": "Consider adding error handling for when the API returns None or unexpected data. For example, wrap the fetch_aggregate call in a try/except to handle ClientResponseError, and validate that aggregate_data is not None before calling model_validate."
    },
    {
      "path": "src/aleph/sdk/client/services/runtimes.py",
      "line": 49,
      "body": "The RuntimesAggregate.model_validate(aggregate_data) could fail if the API returns an empty dict or malformed data. Consider adding validation or handling ValidationError from pydantic."
    },
    {
      "path": "src/aleph/sdk/client/vm_client.py",
      "line": 305,
      "body": "No tests are provided for the new enter_rescue and exit_rescue methods. Following the pattern in test_vm_client.py, add tests like test_enter_rescue and test_exit_rescue using aioresponses to mock the HTTP calls."
    },
    {
      "path": "tests/unit/test_vm_client.py",
      "line": 546,
      "body": "The test_vm_operation_enum_values test is missing the new RESCUE operation. Add: assert VmOperation.RESCUE == 'rescue'"
    },
    {
      "path": "tests/unit/services/",
      "line": 1,
      "body": "No test file exists for the new runtimes service. Following the pattern of test_pricing.py, create test_runtimes.py with tests for get_runtimes_aggregate, get_runtimes (with and without filter), and get_default_runtime. Include a runtimes_aggregate.json fixture similar to pricing_aggregate.json."
    },
    {
      "path": "src/aleph/sdk/client/vm_client.py",
      "line": 311,
      "body": "Minor: The exit_rescue method uses VmOperation.RESCUE with method='DELETE'. Consider adding a comment explaining that the same operation enum is used for both entering (POST) and exiting (DELETE) rescue mode, as this might be confusing to readers."
    }
  ]
}

@aliel aliel force-pushed the aliel-feat-rescue-mode-and-runtimes-service branch from c9e47d2 to 5fd3a25 Compare April 17, 2026 13:09
Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds a new Runtimes service for fetching runtime configurations and rescue mode support to VmClient. The implementation is mostly correct, but there are two issues that need to be addressed: (1) Missing tests for the new enter_rescue() and exit_rescue() methods in VmClient, and (2) The Runtimes service should use get_aggregate instead of fetch_aggregate for better error handling (404 resilience).

src/aleph/sdk/client/services/runtimes.py (line 48): Consider using get_aggregate instead of fetch_aggregate. The get_aggregate method (line 192-198 in http.py) handles 404 errors gracefully by returning None, while fetch_aggregate will raise an exception. This would make the service more resilient when the aggregate doesn't exist yet.

src/aleph/sdk/client/services/runtimes.py (line 40): Consider making Runtimes inherit from BaseService like other services (Pricing, Instance, etc.). This would provide code reuse and consistency across the codebase.

src/aleph/sdk/client/vm_client.py (line 317): The enter_rescue() and exit_rescue() methods are missing tests. Please add tests following the existing pattern (e.g., test_expire_instance at line 135 in tests/unit/test_vm_client.py) that mock the HTTP requests and verify the correct operation is sent.

tests/unit/test_vm_client.py (line 554): While the enum test for RESCUE is added, the actual methods enter_rescue() and exit_rescue() need functional tests. Consider adding tests that verify: (1) enter_rescue() sends POST with optional item_hash param, (2) exit_rescue() sends DELETE request.

foxpatch-aleph
foxpatch-aleph previously approved these changes Apr 17, 2026
Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a well-structured runtimes service and rescue mode support to VmClient. The code follows existing patterns in the codebase, has comprehensive test coverage for all new functionality, and properly handles edge cases like empty/missing aggregate data. The runtimes service correctly uses get_aggregate for 404 resilience, and the rescue mode methods (enter_rescue/exit_rescue) follow the same pattern as other VM operations.

src/aleph/sdk/client/services/runtimes.py (line 40): Minor suggestion: Consider inheriting from BaseService like Pricing does for consistency, though the current standalone approach is valid given the different data structure.

src/aleph/sdk/client/services/runtimes.py (line 68): The list comprehension returns the actual list from aggregate.entries. Consider returning a copy (list(r for r in...)) to prevent accidental mutation of internal state, though this is consistent with the rest of the codebase.

src/aleph/sdk/client/vm_client.py (line 317): The enter_rescue/exit_rescue methods are well-implemented and follow the existing operation pattern. The optional item_hash parameter allows flexibility for rescue image selection.

@aliel aliel requested a review from foxpatch-aleph April 17, 2026 14:41
Copy link
Copy Markdown

@foxpatch-aleph foxpatch-aleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a new Runtimes service for querying the runtimes aggregate and rescue mode support to VmClient. The code follows existing patterns in the codebase, uses proper Pydantic models for data validation, and includes comprehensive tests. The implementation is clean and well-structured.

src/aleph/sdk/client/services/runtimes.py (line 52): Minor suggestion: Consider using if aggregate_data is None: instead of if not aggregate_data: for clarity, since an empty dict {} would also be falsy. This is a minor nit - the current logic works correctly for the expected API responses.

@aliel aliel merged commit 4b28973 into main Apr 17, 2026
37 checks passed
@aliel aliel deleted the aliel-feat-rescue-mode-and-runtimes-service branch April 17, 2026 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants