HYPERFLEET-698 - test: add test cases for adapter, API validation, async processing, concurrent processing, and nodepool (#29)
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (none)
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing …
Walkthrough

The PR restructures and expands test-design documentation across adapter, cluster, nodepool, and new concurrent-processing test files. It adds multiple adapter-focused scenarios (precondition timeouts, crash recovery with broker redelivery, pending events after restart, duplicate-event idempotency, incomplete status reports), introduces concurrent cluster/nodepool concurrency tests, and refactors cluster/nodepool tests to use environment-driven adapter configuration, formalized preconditions/steps/expectations, and explicit cleanup notes.

Sequence Diagram(s): mermaid diagrams omitted.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: 3 passed.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@test-design/testcases/async-processing.md`:
- Around line 41-71: The create-cluster step doesn't show how to set CLUSTER_ID
for later steps; after the POST to ${API_URL}/api/hyperfleet/v1/clusters
(payload name "generation-test") capture the created resource id from the
response or Location header and export it to CLUSTER_ID so subsequent commands
using ${CLUSTER_ID} work (e.g., capture the JSON response with jq to extract .id
or read the Location header and parse the ID, then export CLUSTER_ID for later
steps).
In `@test-design/testcases/concurrent-processing.md`:
- Around line 39-67: Step 1 currently fires 5 parallel POSTs but never records
returned IDs; modify the loop that posts to /api/hyperfleet/v1/clusters to
capture each response ID (e.g., parse with jq -r '.id' into a variable like ID)
and append it to an array CLUSTER_IDS so later steps can iterate
${CLUSTER_IDS[@]}; ensure the curl runs in background still but store IDs safely
(collect stdout from each backgrounded curl, parse with jq to extract the id,
and append to CLUSTER_IDS) so subsequent polling uses the captured IDs.
In `@test-design/testcases/nodepool_lifecycle_negative.md`:
- Around line 59-94: Step 3's expected result is non-deterministic ("HTTP 404 OR
empty list"); pick the intended API contract (e.g., "API returns HTTP 404 Not
Found when cluster doesn't exist") and update the test to assert only that
behavior: change the Expected Result to a single deterministic assertion (for
example "API returns HTTP 404 with Problem Details response") and remove the
alternative, and ensure the curl call using ${FAKE_CLUSTER_ID} is validated
against that single status and response shape in the test case for Step 3.
In `@test-design/testcases/nodepool.md`:
- Around line 212-245: The create step doesn't record the new NodePool's ID, so
update the Step 1 POST that creates the NodePool to capture the response id into
NODEPOOL_ID (e.g., assign NODEPOOL_ID from the create curl output using jq -r
'.id'); use a silent curl (-s) to avoid noise and then reference NODEPOOL_ID in
Step 3's GET call; also consider checking the captured value is non-empty before
proceeding.
- Around line 306-343: Step 1 currently creates nodepools but doesn't populate
NODEPOOL_IDS used in Step 3; modify the POST loop that creates np-coexist-1..3
to capture each response's nodepool ID (use curl -s and jq -r '.id' on the POST
response) and append it to an array variable NODEPOOL_IDS (e.g.,
NODEPOOL_IDS+=("$ID")) so the subsequent for NODEPOOL_ID in ${NODEPOOL_IDS[@]}
loop can iterate the created IDs reliably.
🧹 Nitpick comments (3)
test-design/testcases/nodepool_lifecycle_negative.md (1)
59-82: Add RFC 9457 response headers/fields to validation. To fully validate RFC 9457, include `Content-Type: application/problem+json` and optionally `instance` if supported.

💡 Suggested edit
```diff
 **Expected Result:**
 - API returns HTTP 404 Not Found
 - Response follows RFC 9457 Problem Details format:
+- Response header `Content-Type` is `application/problem+json`
 {
   "type": "https://api.hyperfleet.io/errors/not-found",
   "title": "Not Found",
   "status": 404,
   "detail": "Cluster not found"
+  // "instance": "" (if supported)
 }
```

test-design/testcases/api-validation.md (2)
108-111: Specify the expected error schema for name-length validation. Other tests include structured error payloads; aligning this one improves consistency and assertiveness.

💡 Suggested edit

```diff
 **Expected Result:**
 - API returns HTTP 400 Bad Request
-- Response contains validation error about name length
+- Response contains validation error about name length (code/title/status/detail)
```
264-307: Clarify error response shape for invalid Kind / JSON. Consider specifying the expected error code/title/status (and Content-Type for JSON parsing errors) to make the tests deterministic.

💡 Suggested edit

```diff
 **Expected Result:**
 - API returns HTTP 400 Bad Request
-- Response contains validation error about invalid Kind
+- Response contains validation error about invalid Kind (code/title/status/detail)
 ...
 **Expected Result:**
 - API returns HTTP 400 Bad Request
-- Response contains JSON parsing error
+- Response contains JSON parsing error with problem-details or parser error fields
```
````markdown
#### Step 1: Create a cluster and wait for Ready state
**Action:**
```bash
# Create cluster
curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "Cluster",
    "name": "generation-test",
    "spec": {"platform": {"type": "gcp", "gcp": {"projectID": "test", "region": "us-central1"}}}
  }'

# Wait for Ready state
```

**Expected Result:**
- Cluster created with generation=1
- Cluster reaches Ready state

---

#### Step 2: Record initial generation and observed_generation
**Action:**
```bash
# Get cluster generation
curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID} | jq '.generation'

# Get adapter observed_generation
curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/statuses | jq '.items[0].observed_generation'
```
````
Capture CLUSTER_ID after creation.
Several steps rely on ${CLUSTER_ID} but the create request doesn’t show how it’s obtained.
💡 Suggested edit
```diff
-# Create cluster
-curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
+# Create cluster
+CLUSTER_ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters \
   -H "Content-Type: application/json" \
   -d '{
     "kind": "Cluster",
     "name": "generation-test",
     "spec": {"platform": {"type": "gcp", "gcp": {"projectID": "test", "region": "us-central1"}}}
-  }'
+  }' | jq -r '.id')
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
# Create cluster
CLUSTER_ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "Cluster",
    "name": "generation-test",
    "spec": {"platform": {"type": "gcp", "gcp": {"projectID": "test", "region": "us-central1"}}}
  }' | jq -r '.id')

# Wait for Ready state
```
🤖 Prompt for AI Agents
In `@test-design/testcases/async-processing.md` around lines 41 - 71, The
create-cluster step doesn't show how to set CLUSTER_ID for later steps; after
the POST to ${API_URL}/api/hyperfleet/v1/clusters (payload name
"generation-test") capture the created resource id from the response or Location
header and export it to CLUSTER_ID so subsequent commands using ${CLUSTER_ID}
work (e.g., capture the JSON response with jq to extract .id or read the
Location header and parse the ID, then export CLUSTER_ID for later steps).
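The Location-header alternative mentioned in this prompt can be sketched without touching the network; the helper below parses a saved header dump (the exact header shape returned by the API is an assumption):

```shell
# Extract the trailing path segment of a Location header from a raw
# HTTP response-header dump (hypothetical header values shown below).
extract_id_from_location() {
  printf '%s\n' "$1" \
    | awk 'tolower($1) == "location:" {print $2}' \
    | tr -d '\r' \
    | awk -F/ '{print $NF}'
}

# Canned headers for illustration; in the real test they would come from:
#   curl -s -D - -o /dev/null -X POST ${API_URL}/api/hyperfleet/v1/clusters ...
HEADERS='HTTP/1.1 201 Created
Location: /api/hyperfleet/v1/clusters/abc-123
Content-Type: application/json'

CLUSTER_ID=$(extract_id_from_location "$HEADERS")
export CLUSTER_ID
echo "CLUSTER_ID=${CLUSTER_ID}"
```

Whether the API sets a `Location` header on create is not confirmed by the test design, so the JSON-body `jq -r '.id'` capture remains the safer default.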
````markdown
#### Step 1: Submit 5 cluster creation requests simultaneously
**Action:**
- Submit 5 POST requests in parallel using background processes:
```bash
for i in $(seq 1 5); do
  curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
    -H "Content-Type: application/json" \
    -d "{
      \"kind\": \"Cluster\",
      \"name\": \"concurrent-test-${i}\",
      \"spec\": {\"platform\": {\"type\": \"gcp\", \"gcp\": {\"projectID\": \"test\", \"region\": \"us-central1\"}}}
    }" &
done
wait
```

**Expected Result:**
- All 5 requests return successful responses (HTTP 200/201)
- Each response contains a unique cluster ID
- No request is rejected or fails due to concurrency

#### Step 2: Wait for all clusters to be processed
**Action:**
- Poll each cluster's status until all reach Ready state or a timeout is reached:
```bash
for CLUSTER_ID in ${CLUSTER_IDS[@]}; do
  curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID} | jq '.conditions'
done
```
````
Capture CLUSTER_IDS from create responses.
Later steps depend on ${CLUSTER_IDS[@]} but Step 1 doesn’t record them.
💡 Suggested edit
```diff
-```bash
-for i in $(seq 1 5); do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
+```bash
+CLUSTER_IDS=()
+for i in $(seq 1 5); do
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters \
     -H "Content-Type: application/json" \
     -d "{
       \"kind\": \"Cluster\",
       \"name\": \"concurrent-test-${i}\",
       \"spec\": {\"platform\": {\"type\": \"gcp\", \"gcp\": {\"projectID\": \"test\", \"region\": \"us-central1\"}}}
-    }" &
+    }" | jq -r '.id')
+  CLUSTER_IDS+=("$ID")
 done
-wait
```

🤖 Prompt for AI Agents
In @test-design/testcases/concurrent-processing.md around lines 39 - 67, Step 1
currently fires 5 parallel POSTs but never records returned IDs; modify the loop
that posts to /api/hyperfleet/v1/clusters to capture each response ID (e.g.,
parse with jq -r '.id' into a variable like ID) and append it to an array
CLUSTER_IDS so later steps can iterate ${CLUSTER_IDS[@]}; ensure the curl runs
in background still but store IDs safely (collect stdout from each backgrounded
curl, parse with jq to extract the id, and append to CLUSTER_IDS) so subsequent
polling uses the captured IDs.
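One caveat worth noting on this fix: capturing each ID via command substitution serializes the requests, while an array append executed inside a backgrounded subshell never reaches the parent shell. To keep the submissions truly parallel, each job can write its ID to a temp file that the parent reads back after `wait`. A network-free sketch of the pattern (the `echo` stands in for the real `curl -s ... | jq -r '.id'`):

```shell
# Parallel-capture pattern: each background job writes its result to its
# own file; the parent collects the IDs only after `wait` reaps all jobs.
IDS_DIR=$(mktemp -d)
for i in 1 2 3 4 5; do
  (
    # stand-in for: curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters ... | jq -r '.id'
    echo "cluster-id-${i}" > "${IDS_DIR}/${i}"
  ) &
done
wait

CLUSTER_IDS=()
for f in "${IDS_DIR}"/*; do
  CLUSTER_IDS+=("$(cat "$f")")
done
rm -rf "${IDS_DIR}"

echo "captured ${#CLUSTER_IDS[@]} ids"
```

This keeps Step 1's "simultaneously" property intact while still populating `CLUSTER_IDS` for the later polling steps.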
````markdown
**Expected Result:**
- API returns HTTP 404 Not Found
- Response follows RFC 9457 Problem Details format:
```json
{
  "type": "https://api.hyperfleet.io/errors/not-found",
  "title": "Not Found",
  "status": 404,
  "detail": "Cluster not found"
}
```

---

#### Step 2: Verify error response format (RFC 9457)
**Action:**
- Parse the error response and verify required fields

**Expected Result:**
- Response contains `type` field
- Response contains `title` field
- Response contains `status` field with value 404
- Optional: Response contains `detail` field with descriptive message

---

#### Step 3: Verify no nodepool was created
**Action:**
```bash
curl -X GET ${API_URL}/api/hyperfleet/v1/clusters/${FAKE_CLUSTER_ID}/nodepools
```

**Expected Result:**
- API returns HTTP 404 (cluster doesn't exist, so nodepools list is not accessible)
- OR returns empty list if API allows listing nodepools for non-existent clusters
````
Make the Step 3 expected result deterministic.
“HTTP 404 OR empty list” makes the test non-assertive. Pick the intended API behavior and validate only that to avoid false positives.
💡 Suggested edit

```diff
-**Expected Result:**
-- API returns HTTP 404 (cluster doesn't exist, so nodepools list is not accessible)
-- OR returns empty list if API allows listing nodepools for non-existent clusters
+**Expected Result:**
+- API returns HTTP 404 (cluster doesn't exist, so nodepools list is not accessible)
```

🤖 Prompt for AI Agents
In `@test-design/testcases/nodepool_lifecycle_negative.md` around lines 59 - 94,
Step 3's expected result is non-deterministic ("HTTP 404 OR empty list"); pick
the intended API contract (e.g., "API returns HTTP 404 Not Found when cluster
doesn't exist") and update the test to assert only that behavior: change the
Expected Result to a single deterministic assertion (for example "API returns
HTTP 404 with Problem Details response") and remove the alternative, and ensure
the curl call using ${FAKE_CLUSTER_ID} is validated against that single status
and response shape in the test case for Step 3.
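Once the contract is pinned to a single 404, the assertion can key off the status code alone via curl's `-w '%{http_code}'`. A minimal network-free sketch of the check (the status value is canned here; the helper name is illustrative):

```shell
# Hypothetical helper: fail unless the observed HTTP status matches.
assert_status() {
  expected="$1"
  actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "expected HTTP $expected, got $actual" >&2
    return 1
  fi
}

# In the real test the status would come from:
#   STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
#     "${API_URL}/api/hyperfleet/v1/clusters/${FAKE_CLUSTER_ID}/nodepools")
STATUS=404   # canned value for illustration
assert_status 404 "$STATUS" && echo "status check passed"
```

Asserting on one exact status keeps the test deterministic, which is the point of the review comment.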
test-design/testcases/nodepool.md
Outdated
````markdown
#### Step 1: Create a nodepool under the cluster
**Action:**
- Submit a POST request to create a NodePool resource:
```bash
curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "NodePool",
    "name": "list-get-test",
    "cluster_id": "'${CLUSTER_ID}'",
    "spec": { ... }
  }'
```

**Expected Result:**
- API returns successful response with nodepool ID

#### Step 2: List all nodepools belonging to the cluster
**Action:**
```bash
curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/nodepools | jq '.'
```

**Expected Result:**
- Response returns HTTP 200 (OK) status code
- Response contains an array of nodepools
- The created nodepool appears in the list
- Each nodepool entry includes: id, name, cluster_id, conditions, spec

#### Step 3: Get individual nodepool details by ID
**Action:**
```bash
curl -s ${API_URL}/api/hyperfleet/v1/nodepools/${NODEPOOL_ID} | jq '.'
```
````
Define how NODEPOOL_ID is captured.
The test uses NODEPOOL_ID in Step 3 without recording it in Step 1. Add a step or command snippet to capture the created nodepool ID from the create response.
💡 Suggested edit
```diff
 **Action:**
 - Submit a POST request to create a NodePool resource:
 ```bash
-curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
+NODEPOOL_ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
   -H "Content-Type: application/json" \
   -d '{
     "kind": "NodePool",
     "name": "list-get-test",
     "cluster_id": "'${CLUSTER_ID}'",
     "spec": { ... }
-  }'
+  }' | jq -r '.id')
```

🤖 Prompt for AI Agents
In @test-design/testcases/nodepool.md around lines 212 - 245, The create step
doesn't record the new NodePool's ID, so update the Step 1 POST that creates the
NodePool to capture the response id into NODEPOOL_ID (e.g., assign NODEPOOL_ID
from the create curl output using jq -r '.id'); use a silent curl (-s) to avoid
noise and then reference NODEPOOL_ID in Step 3's GET call; also consider
checking the captured value is non-empty before proceeding.
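The non-empty guard suggested in the prompt could look like this; note that `jq -r '.id'` prints the literal string `null` when the field is missing, so that case is worth guarding too (the ID value is canned here):

```shell
# Guard: stop early if the create response did not yield a usable ID.
NODEPOOL_ID="np-42"   # in the real test: $(curl -s -X POST ... | jq -r '.id')
if [ -z "$NODEPOOL_ID" ] || [ "$NODEPOOL_ID" = "null" ]; then
  echo "failed to capture NODEPOOL_ID from create response" >&2
  exit 1
fi
echo "captured NODEPOOL_ID=${NODEPOOL_ID}"
```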
test-design/testcases/nodepool.md
Outdated
````markdown
#### Step 1: Create multiple nodepools under the same cluster
**Action:**
- Create 3 nodepools with different names:
```bash
for i in 1 2 3; do
  curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
    -H "Content-Type: application/json" \
    -d "{
      \"kind\": \"NodePool\",
      \"name\": \"np-coexist-${i}\",
      \"cluster_id\": \"${CLUSTER_ID}\",
      \"spec\": { ... }
    }"
done
```

**Expected Result:**
- All 3 nodepools are created successfully
- Each returns a unique nodepool ID

#### Step 2: Verify all nodepools appear in the list
**Action:**
```bash
curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/nodepools | jq '.items | length'
```

**Expected Result:**
- List contains all 3 nodepools
- Each nodepool has a distinct ID and name

#### Step 3: Verify each nodepool reaches Ready state independently
**Action:**
- Check each nodepool's conditions:
```bash
for NODEPOOL_ID in ${NODEPOOL_IDS[@]}; do
  curl -s ${API_URL}/api/hyperfleet/v1/nodepools/${NODEPOOL_ID} | jq '.conditions'
done
```
````
Clarify how NODEPOOL_IDS is populated.
Step 3 uses ${NODEPOOL_IDS[@]} but Step 1 doesn’t capture the IDs. Add collection logic so the loop is reproducible.
💡 Suggested edit

```diff
-```bash
-for i in 1 2 3; do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
+```bash
+NODEPOOL_IDS=()
+for i in 1 2 3; do
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
     -H "Content-Type: application/json" \
     -d "{
       \"kind\": \"NodePool\",
       \"name\": \"np-coexist-${i}\",
       \"cluster_id\": \"${CLUSTER_ID}\",
       \"spec\": { ... }
-    }"
+    }" | jq -r '.id')
+  NODEPOOL_IDS+=("$ID")
 done
```

🤖 Prompt for AI Agents
In @test-design/testcases/nodepool.md around lines 306 - 343, Step 1 currently
creates nodepools but doesn't populate NODEPOOL_IDS used in Step 3; modify the
POST loop that creates np-coexist-1..3 to capture each response's nodepool ID
(use curl -s and jq -r '.id' on the POST response) and append it to an array
variable NODEPOOL_IDS (e.g., NODEPOOL_IDS+=("$ID")) so the subsequent for
NODEPOOL_ID in ${NODEPOOL_IDS[@]} loop can iterate the created IDs reliably.
test-design/testcases/nodepool.md
Outdated
1. [Nodepools Resource Type - Workflow Validation](#test-title-nodepools-resource-type---workflow-validation)
2. [Nodepools Resource Type - K8s Resource Check Aligned with Preinstalled NodePool Related Adapters Specified](#test-title-nodepools-resource-type---k8s-resource-check-aligned-with-preinstalled-nodepool-related-adapters-specified)
3. [Nodepools Resource Type - List and Get API Operations](#test-title-nodepools-resource-type---list-and-get-api-operations)
Is this one duplicated with Nodepools Resource Type - Workflow Validation?
You are right. The "Workflow Validation" test already covers a GET operation in Step 3. The only additional coverage here is the List endpoint. I'll merge the List/Get verification into the "Workflow Validation" test as additional steps and remove this separate test case.
```bash
kubectl set env deployment -n hyperfleet -l app.kubernetes.io/instance=example-adapter SIMULATE_RESULT=failure
kubectl rollout status deployment/example-adapter-hyperfleet-adapter -n hyperfleet --timeout=60s
```
It may be better to have a dedicated failure-mode adapter rather than updating the original adapter's configuration. Every update/rollback risks leaving the environment dirty.
Agreed. Modifying the existing adapter's environment variable has the risk of leaving the environment dirty if cleanup fails. I'll update the test to deploy a separate failure-adapter Helm release with SIMULATE_RESULT=failure pre-configured, and delete it after the test completes.
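A sketch of that dedicated-release approach: install the failure adapter as its own Helm release with a values override like the one below (the chart structure and key names are assumptions, not taken from this repo), e.g. `helm install failure-adapter <chart> -n hyperfleet -f failure-values.yaml`, and clean up with `helm uninstall failure-adapter -n hyperfleet`:

```yaml
# failure-values.yaml (hypothetical keys): a separate release with the failure
# behavior baked in, so the primary adapter's Deployment is never mutated.
env:
  SIMULATE_RESULT: failure
```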
test-design/testcases/adapter.md
Outdated
```bash
  -H "Content-Type: application/json" \
  -d '{
    "kind": "Cluster",
    "name": "failure-test",
```
It is better to use a random string. And I'd also like the cluster to have a label with the test run id.
Agreed. I'll update it.
For setting a cluster label with the run id, I think this would be better handled at the automation framework level rather than specified in individual test case documents.
Cluster creation now uses testdata/payloads/clusters/cluster-request.json which already contains {{.Random}} template for unique names.
test-design/testcases/adapter.md
Outdated
---

## Test Title: Adapter Job failure correctly reports Health=False
I think it should be a basic case that validates the workflow when the adapter status is false; it is not about the adapter itself.

Moved this test case to cluster.md as "Cluster can reflect adapter failure in top-level status" (Tier1, Negative).
| **Field** | **Value** |
|-----------|-----------|
| **Pos/Neg** | Negative |
| **Priority** | Tier1 |
Agreed. I'll update it.
---

## Test Title: API validates Kind field
I am thinking this should be covered by component unit/integration tests.

Agreed. Removed this test case.
---

## Test Title: API handles database connection failure gracefully
I think this testing is out of scope for E2E testing.

Agreed. Removed this test case.
@@ -0,0 +1,108 @@
# Feature: NodePool Lifecycle - Negative Tests
Do we need an individual file for this?
Merged the negative test into nodepool.md.
File nodepool_lifecycle_negative.md deleted.
test-design/testcases/nodepool.md
Outdated
1. [Nodepools Resource Type - Workflow Validation](#test-title-nodepools-resource-type---workflow-validation)
2. [Nodepools Resource Type - K8s Resource Check Aligned with Preinstalled NodePool Related Adapters Specified](#test-title-nodepools-resource-type---k8s-resource-check-aligned-with-preinstalled-nodepool-related-adapters-specified)
3. [Nodepools Resource Type - List and Get API Operations](#test-title-nodepools-resource-type---list-and-get-api-operations)
4. [Nodepools Resource Type - Multiple NodePools Coexistence](#test-title-nodepools-resource-type---multiple-nodepools-coexistence)
Is it better to move this to concurrent-processing.md?
Agreed. Moved "Multiple NodePools Coexistence" to concurrent-processing.md.
@@ -0,0 +1,386 @@
# Feature: Asynchronous Processing and Generation Tracking
For Asynchronous Processing, most of it is a black box for us. Except for the disaster cases, I do not think we need to break the structure open and dig into every component's behavior; that belongs to component-level unit/integration testing, not E2E.
Agreed. Removed 3 component-level tests (generation tracking). Kept 2 tests relevant to E2E scope (resilience and idempotency) and moved them to adapter.md. File async-processing.md deleted.
| **Field** | **Value** |
|-----------|-----------|
| **Pos/Neg** | Negative |
| **Priority** | Tier2 |
@yasun1 I still have some confusion about Tier0/1/2. We mark happy paths as Tier0. How about Tier1? I think important negative cases are Tier1. Is that right?

Agreed with setting Tier1 for this case, since it is an important negative case. cc @yasun1 for double confirmation.
test-design/testcases/adapter.md
Outdated
- Set adapter to failure mode:
```bash
kubectl set env deployment -n hyperfleet -l app.kubernetes.io/instance=example-adapter SIMULATE_RESULT=failure
kubectl rollout status deployment/example-adapter-hyperfleet-adapter -n hyperfleet
```
Were these two commands generated by AI? I think they are not suitable for our current deployment process. If they are not right, we can use a sentence to describe this at the beginning.

Adapter deployment is now described in Preconditions as a sentence (e.g., "A dedicated failure-adapter is deployed via Helm with pre-configured failure behavior"), and this case is moved into cluster.md.
test-design/testcases/adapter.md
Outdated
**Expected Result:**
- Job exists in the cluster namespace
- Job status shows `Failed=True`
- Job exit code is non-zero
For Step 3, I think the action and expected result are not ideal, since we just need to check the status via the API endpoint. If there is an E2E checkpoint, it should check whether resources were created or not; we don't need to check the adapter Jobs. You can refer to the Tier0 cases that dandan added.
Updated. Test steps now verify only via API endpoints (/clusters/{cluster_id}/statuses), not internal logs.
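Verifying through the API alone can then be reduced to filtering the statuses payload with jq. A network-free sketch (the canned JSON is an assumed shape for the `/clusters/{cluster_id}/statuses` response):

```shell
# Stand-in for:
#   curl -s ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/statuses
STATUSES='{"items":[{"conditions":[{"type":"Available","status":"False"}]}]}'

# Extract the Available condition's status purely from the API payload.
AVAILABLE=$(printf '%s' "$STATUSES" \
  | jq -r '.items[0].conditions[] | select(.type == "Available") | .status')
echo "Available=${AVAILABLE}"
```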
test-design/testcases/adapter.md
Outdated
- `Available` condition remains `status: "False"`
- Cluster does not transition to Ready while adapter reports failure

#### Step 6: Restore adapter to normal mode
Same as yanming's comment above.

Updated. Deploy a separate failure-adapter and clean it up, instead of restoring the original adapter.
test-design/testcases/adapter.md
Outdated
This test validates that when an adapter crashes during event processing, the message broker correctly redelivers the message after the adapter recovers. This ensures that no events are lost due to adapter failures and the system maintains eventual consistency.

**Note:** This test requires SIMULATE_RESULT=crash mode in the example-adapter, which may not be implemented.
I think this can be added under ## Preconditions as part of the testdata.

Done. SIMULATE_RESULT adapter configuration is now in Preconditions.
test-design/testcases/adapter.md
Outdated
- Set adapter environment variable to crash mode:
```bash
kubectl set env deployment -n hyperfleet -l app.kubernetes.io/instance=example-adapter SIMULATE_RESULT=crash
kubectl rollout status deployment/example-adapter-hyperfleet-adapter -n hyperfleet
```
test-design/testcases/adapter.md
Outdated
```bash
curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
  -H "Content-Type: application/json" \
  -d '{
```
We have a request example under testdata; you can use it.

Done. All cluster creation steps now use testdata/payloads/clusters/cluster-request.json.
test-design/testcases/adapter.md
Outdated
**Expected Result:**
- API returns successful response with cluster ID
- Sentinel publishes event to broker
For the result, I think it would be better to verify it with an actionable step. For this sentence, how can we verify it via an action: "Sentinel publishes event to broker"?
Done. Moved "Sentinel publishes event to broker" from Expected Result to a Note section. Expected Result now only contains verifiable checkpoints (API responses, pod status via kubectl). The internal flow description is kept as a Note to provide context for understanding how the event reaches the adapter, which helps readers follow the test logic.
…ync processing, concurrent processing, and nodepool
Actionable comments posted: 6
♻️ Duplicate comments (2)
test-design/testcases/concurrent-processing.md (2)
140-153: ⚠️ Potential issue | 🟡 Minor — Capture nodepool IDs in Step 1 for use in Step 3.
Similar to the cluster scenario, nodepool IDs created in Step 1 should be captured for use in subsequent verification steps.
💡 Suggested fix
```diff
-```bash
-for i in 1 2 3; do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
+```bash
+NODEPOOL_IDS=()
+for i in 1 2 3; do
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
     -H "Content-Type: application/json" \
-    -d @testdata/payloads/nodepools/gcp.json
+    -d @testdata/payloads/nodepools/gcp.json | jq -r '.id')
+  NODEPOOL_IDS+=("$ID")
 done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/concurrent-processing.md` around lines 140 - 153, Update Step 1 to capture and store the created nodepool IDs so they can be used in Step 3: run the POST loop silently, parse each response JSON for the "id" field (e.g., using jq -r '.id'), assign that value to a variable (e.g., ID) and append it to an array (e.g., NODEPOOL_IDS) instead of discarding the response; ensure the loop uses the same POST endpoint (/api/hyperfleet/v1/nodepools) and that NODEPOOL_IDS is referenced later in Step 3 for verification.
40-55: ⚠️ Potential issue | 🟡 Minor — Capture `CLUSTER_IDS` from create responses.

Step 1 fires 5 parallel POST requests but doesn't record the returned cluster IDs. Later steps (Step 2-4) depend on `{cluster_id}` but have no way to iterate the created clusters.

💡 Suggested fix
```diff
-```bash
-for i in $(seq 1 5); do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
+```bash
+CLUSTER_IDS=()
+for i in $(seq 1 5); do
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters \
     -H "Content-Type: application/json" \
-    -d @testdata/payloads/clusters/cluster-request.json &
+    -d @testdata/payloads/clusters/cluster-request.json | jq -r '.id')
+  CLUSTER_IDS+=("$ID")
 done
-wait
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/concurrent-processing.md` around lines 40 - 55, The test currently issues 5 parallel POSTs but never saves the returned cluster IDs; modify the loop that calls curl to capture each response's ID (e.g., parse JSON with jq -r '.id') and append it to a CLUSTER_IDS array, ensuring you use curl -s to suppress progress and if you background requests keep track and wait for completion before proceeding; then update subsequent steps to iterate over CLUSTER_IDS for {cluster_id} usage.
🧹 Nitpick comments (13)
testdata/adapter-configs/clusters-cl-deployment/adapter-task-resource-deployment.yaml (2)
23-28: Consider adding security context for defense-in-depth (optional for test data). Static analysis flagged missing security context settings. While this is test data, adding basic hardening would align with Kubernetes security best practices:
🛡️ Optional security hardening
spec: + securityContext: + runAsNonRoot: true + seccompProfile: + type: RuntimeDefault containers: - name: test image: nginx:latest + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL ports: - containerPort: 80Note: Full
readOnlyRootFilesystem: truewould require additional volume mounts for nginx's cache/pid directories, which may be overkill for test purposes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testdata/adapter-configs/clusters-cl-deployment/adapter-task-resource-deployment.yaml` around lines 23 - 28, Add a basic security context to the test container declared under containers -> name: test (and optionally a pod-level securityContext) to harden the YAML: set runAsNonRoot to true, set a non-root runAsUser (e.g., 1000), set allowPrivilegeEscalation to false, and consider setting readOnlyRootFilesystem (or leave it false if nginx needs writable dirs); update the container entry (and/or pod spec) so the security fields are present alongside image: nginx:latest and ports to satisfy static analysis while avoiding breaking the test deployment.
26-26: Consider pinning the nginx image version.

Using `nginx:latest` can lead to non-reproducible test results if the image is updated. Consider pinning to a specific version for consistent test behavior.

📌 Pin image version

```diff
-          image: nginx:latest
+          image: nginx:1.25-alpine
```
Verify each finding against the current code and only fix it if needed. In `@testdata/adapter-configs/clusters-cl-deployment/adapter-task-resource-deployment.yaml` at line 26, The YAML uses an unpinned image reference "image: nginx:latest" which can lead to non-reproducible tests; update the adapter-task-resource-deployment manifest to replace the "nginx:latest" reference with a specific, immutable tag or digest (for example a concrete semver like nginx:1.25.2 or a sha256 image digest) so tests pull a fixed image version.

testdata/adapter-configs/nodepools-np-configmap/values.yaml (1)
28-31: Minor: Inconsistent indentation in rbac section.

Line 29 has 3 spaces of indentation while the rest of the file uses 2-space indentation. This is a minor YAML style inconsistency.

♻️ Fix indentation

```diff
 rbac:
-   resources:
+  resources:
     - configmaps
     - configmaps/status
```
Verify each finding against the current code and only fix it if needed. In `@testdata/adapter-configs/nodepools-np-configmap/values.yaml` around lines 28 - 31, The rbac section has inconsistent indentation: ensure the "rbac" block uses 2-space indentation consistently by aligning the "resources:" key and its list items ("configmaps", "configmaps/status") with two spaces per indent level; update the indentation around the rbac -> resources -> list entries so the "resources" key and its dash-prefixed items are indented consistently (2 spaces) under "rbac".

pkg/helper/k8s.go (2)
84-95: Empty label selector validation may cause unexpected behavior.

When `expectedLabels` is empty, `buildLabelSelector` returns an empty string, and `ValidateLabelSelector("")` returns an error ("label selector cannot be empty"). This means `VerifyJobComplete` and `VerifyDeploymentAvailable` cannot be called with empty expected labels, even though that might be a valid use case (find any job in namespace).

Consider whether this is intentional behavior or if you want to allow empty label selectors (which would list all resources).

♻️ Optional: Allow empty label selector

```diff
 	// Validate inputs to prevent command injection
 	if err := ValidateK8sName(namespace); err != nil {
 		return fmt.Errorf("invalid namespace: %w", err)
 	}
-	if err := ValidateLabelSelector(labelSelector); err != nil {
-		return fmt.Errorf("invalid label selector: %w", err)
+	if labelSelector != "" {
+		if err := ValidateLabelSelector(labelSelector); err != nil {
+			return fmt.Errorf("invalid label selector: %w", err)
+		}
 	}
```
Verify each finding against the current code and only fix it if needed. In `@pkg/helper/k8s.go` around lines 84 - 95, The current flow in VerifyJobComplete (and similarly VerifyDeploymentAvailable) treats an empty label selector as invalid because buildLabelSelector returns "" and ValidateLabelSelector("") errors; decide to allow empty selectors and implement that by skipping the ValidateLabelSelector call when labelSelector == "" (i.e., only call ValidateLabelSelector when non-empty), or alternatively adjust ValidateLabelSelector to accept empty string and document the behavior; update VerifyJobComplete to skip validation for empty labelSelector and ensure any downstream code that relies on a non-empty selector handles the empty case correctly (references: VerifyJobComplete, buildLabelSelector, ValidateLabelSelector, VerifyDeploymentAvailable).
240-261: Deployment availability check requires both conditions.

The current logic requires `availableReplicas > 0` AND the `Available` condition to be `True`. This is stricter than typical availability checks. A deployment with `availableReplicas > 0` but no explicit `Available=True` condition would fail verification.

Consider if this is the intended behavior, as some deployments may not have explicit conditions set.

♻️ Alternative: Consider availableReplicas alone as sufficient

```diff
-	deployAvailable := false
-	if availableReplicas > 0 {
-		// Also check for Available condition
-		for _, cond := range conditions {
-			condMap, ok := cond.(map[string]interface{})
-			if !ok {
-				continue
-			}
-			if condMap["type"] == "Available" && condMap["status"] == "True" {
-				deployAvailable = true
-				break
-			}
-		}
-	}
+	deployAvailable := availableReplicas > 0
+	// Optionally also verify Available condition if present
+	for _, cond := range conditions {
+		condMap, ok := cond.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		if condMap["type"] == "Available" && condMap["status"] == "False" {
+			deployAvailable = false
+			break
+		}
+	}
```
Verify each finding against the current code and only fix it if needed. In `@pkg/helper/k8s.go` around lines 240 - 261, The current check requires both availableReplicas>0 and an explicit "Available=True" condition (variables: availableReplicas, conditions, deployAvailable), which rejects deployments that have replicas but no condition entry; change the logic so that availableReplicas>0 is treated as sufficient for availability (set deployAvailable = true when availableReplicas > 0) and only fall back to scanning conditions for "Available" when availableReplicas == 0 or when you explicitly want to support a mode that requires conditions; update the branching around the loop over conditions (and the error return message) accordingly so deployments with replicas but no conditions pass.

test-design/testcases/nodepool.md (1)
45-52: Consider capturing nodepool_id from create response.

Step 1 creates a nodepool but doesn't show how to capture the returned `nodepool_id` for use in subsequent steps (Step 2, 3, 4). Consider adding the ID capture similar to concurrent-processing tests:

```bash
NODEPOOL_ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
  -H "Content-Type: application/json" \
  -d @testdata/payloads/nodepools/gcp.json | jq -r '.id')
```
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 45 - 52, Capture and export the created nodepool's id from the POST response in Step 1 and use that variable in subsequent steps; specifically, parse the JSON response to extract the `.id` field into a shell variable (e.g., NODEPOOL_ID) immediately after the POST in the Step 1 block and update Step 2, Step 3, and Step 4 to reference that variable (NODEPOOL_ID) instead of hardcoding or omitting the id so all later requests use the created nodepool's id.

pkg/helper/validation.go (1)
98-114: Consider handling edge cases in `AdapterNameToClusterConditionType`.

The function handles the happy path well, but consider these edge cases:

- Empty adapter name returns just "Successful"
- Names with consecutive dashes like `cl--job` produce empty parts that are silently skipped

This behavior may be intentional, but documenting or validating the input would improve robustness.

♻️ Optional: Add input validation

```diff
 func (h *Helper) AdapterNameToClusterConditionType(adapterName string) string {
+	if adapterName == "" {
+		return ""
+	}
 	// Split adapter name by "-" (e.g., "cl-namespace" -> ["cl", "namespace"])
 	parts := strings.Split(adapterName, "-")
```
Verify each finding against the current code and only fix it if needed. In `@pkg/helper/validation.go` around lines 98 - 114, AdapterNameToClusterConditionType currently builds "Successful" even for empty or all-dash inputs and silently skips empty segments from consecutive dashes; update the function (AdapterNameToClusterConditionType) to first normalize and validate the input by trimming whitespace and dashes and returning an empty string immediately if the normalized adapterName is empty, and then continue building the camelized parts while explicitly filtering out empty segments (to handle consecutive dashes) before appending "Successful".

testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml (1)
176-177: Pin `status-reporter` to an immutable tag or digest.

Line 176 uses `:latest`, which makes e2e runs non-deterministic across time and environments. Replace with a pinned tag or image digest:

```diff
-          image: registry.ci.openshift.org/ci/status-reporter:latest
+          image: registry.ci.openshift.org/ci/status-reporter@sha256:<hash>
```

Note: Similar mutable tags exist elsewhere in adapter configs (values.yaml files) and should also be addressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml` around lines 176 - 177, Replace the mutable image reference "registry.ci.openshift.org/ci/status-reporter:latest" with an immutable tag or digest (for example a specific version tag or sha256 digest) so e2e runs are deterministic; update the image field (the "image:" entry currently set to :latest) to the pinned value and search other adapter config/value.yaml files for other occurrences of "status-reporter:latest" (or other ":latest" images) and pin them similarly.

test-design/testcases/adapter.md (5)
245-256: Clarify verification of message redelivery vs. Sentinel republishing.

Step 4 verifies that the event is eventually processed, but doesn't distinguish whether this happened due to:
- Message broker redelivery (the intended test scenario)
- Sentinel republishing during its polling cycle (mentioned in line 282)
Consider adding a step to check adapter logs or metrics to confirm the event was received via broker redelivery rather than a fresh Sentinel publish. This would validate the actual redelivery mechanism being tested.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 245 - 256, Update Step 4 to explicitly verify broker redelivery by adding a sub-step that inspects crash-adapter logs/metrics for redelivery-specific indicators (e.g., consumer redelivery flag, delivery_attempts count, broker message-id, or "redelivered" log line) and compare against Sentinel republish traces (Sentinel poll/publish timestamps or unique publish IDs); reference the crash-adapter's consumer log output and Sentinel publish logs/metrics to confirm the processed event came from broker redelivery rather than a fresh Sentinel publish.
336-339: Use consistent placeholder format.

Line 338 uses `<adapter-release-name>` as a placeholder, while the rest of the document uses the `{variable_name}` format (e.g., `{cluster_id}`). Consider standardizing on the curly brace format for consistency: `{adapter_release_name}`.
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 336 - 339, Update the placeholder in the kubectl delete command to match the document's curly-brace style: replace the angle-bracket placeholder `<adapter-release-name>` in the command snippet `kubectl delete pod -n hyperfleet -l app.kubernetes.io/instance=<adapter-release-name>` with the consistent `{adapter_release_name}` format so it reads `kubectl delete pod -n hyperfleet -l app.kubernetes.io/instance={adapter_release_name}`.
315-357: Clarify cluster ID variable capture for cleanup.

The test creates two clusters (Steps 1 and 3) but doesn't explicitly show how to capture their IDs as `cluster_id_1` and `cluster_id_2` for use in Step 5 cleanup. Consider adding notes in the Expected Result sections to capture the cluster IDs:

Step 1 Expected Result:

```bash
# Capture the first cluster ID from the response
CLUSTER_ID_1=$(curl -s -X POST ... | jq -r '.id')
```

Step 3 Expected Result:

```bash
# Capture the second cluster ID
CLUSTER_ID_2=$(curl -s -X POST ... | jq -r '.id')
```

This makes the cleanup step at lines 376-377 more actionable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 315 - 357, Update the test case to explicitly capture the two created cluster IDs for cleanup: in the Step 1 Expected Result add a note to extract and store the first cluster's id as CLUSTER_ID_1 after the POST response, and in the Step 3 Expected Result add a note to extract and store the second cluster's id as CLUSTER_ID_2; reference these variables (CLUSTER_ID_1, CLUSTER_ID_2) in the later cleanup Step 5 so it's clear which IDs to delete.
57-65: Enhance verification to check specific error details.

The test currently only verifies that the `Available` condition is `False`, but doesn't validate the error message content or reason field. Consider adding verification of the specific error reason and message to ensure the adapter correctly identifies it as an invalid K8s resource failure (not just any generic failure).

For example:

```bash
# Verify the reason field contains indication of invalid resource
curl -s ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/statuses | \
  jq '.items[] | select(.adapter=="test-adapter") | .conditions[] | select(.type=="Available") | {status, reason, message}'
```
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 57 - 65, Update Step 2 to assert not only that the test-adapter's Available condition has status "False" but also that the condition.reason and condition.message indicate an invalid Kubernetes resource; when polling GET /api/hyperfleet/v1/clusters/{cluster_id}/statuses for the adapter named "test-adapter", extract the .conditions[] for type "Available" and verify .reason contains an indication like "Invalid" or "InvalidK8sResource" and that .message references the invalid resource details so the test distinguishes this specific failure from generic errors.
451-463: Clarify expected job count verification.

Line 458 states "should be 1 or managed correctly" which is vague. For idempotency verification, be more specific about the expected behavior:

- If the adapter uses Jobs with `generateName`, multiple completed jobs may exist (which is acceptable if they produced the same result)
- If the adapter reuses Jobs, there should be exactly one

Consider updating the verification to:

```bash
# Verify only one active/running job exists (completed jobs may accumulate)
kubectl get jobs -n {cluster_id} --field-selector status.successful=1 | wc -l
# Should be exactly 1 successful job
```
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 451 - 463, Update "Step 3: Verify no duplicate resources created" to replace the vague "should be 1 or managed correctly" with explicit idempotency checks: state that if the adapter uses Jobs with generateName then completed jobs may accumulate but there must be exactly one active/running Job, and if the adapter reuses Jobs there should be exactly one Job total; instruct to verify active job count via kubectl filtering on status.active and to verify completed runs via kubectl filtering on status.successful (expect exactly 1 successful run).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/cluster/creation.go`:
- Around line 24-35: Reset the per-spec identifiers before attempting cluster
creation to avoid leaking stale values: in the ginkgo.BeforeEach closure (the
setup that calls helper.New() and h.Client.CreateClusterFromPayload), explicitly
clear or reinitialize the variables clusterID and clusterName (the ones used
later in cleanup) prior to calling CreateClusterFromPayload; then proceed to
create the cluster and assign the returned values as currently done (refer to
helper.New(), h.Client.CreateClusterFromPayload, cluster.ID/cluster.Name and the
variables clusterID and clusterName).
- Around line 309-313: The select in the polling loop that waits on timeout and
ticker.C should also handle context cancellation: add a case <-ctx.Done()
alongside the existing cases and call ginkgo.Fail (or fail the spec) with a
clear message (e.g., "Context canceled while waiting for cl-deployment Available
condition") and return; update the block around the call to
h.Client.GetClusterStatuses(ctx, clusterID) so the loop exits immediately when
ctx is canceled to avoid further polling.
In `@pkg/helper/k8s.go`:
- Around line 28-41: VerifyNamespaceActive is missing validation of the
namespace parameter; call ValidateK8sName(name) at the start of
VerifyNamespaceActive and return its error if validation fails, so the namespace
is validated consistently with VerifyJobComplete and VerifyDeploymentAvailable
before constructing/exec.CommandContext and running kubectl.
In `@test-design/testcases/adapter.md`:
- Around line 494-598: Add a cleanup step to the test in
test-design/testcases/adapter.md: append a new "Step 5: Cleanup resources" that
instructs removing the test cluster and its namespace (delete the namespace
identified by cluster_id) and marks the expected result that the namespace and
associated resources are removed; also note the alternative to use the
HyperFleet DELETE cluster API (curl -X DELETE
${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}) once CLM supports it.
Reference the existing steps (Step 1–4) and ensure the new step mirrors cleanup
style used in other test cases (namespace deletion as workaround until cluster
DELETE is supported).
In
`@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job-rolebinding.yaml`:
- Around line 15-18: In the subjects block where the ServiceAccount is defined
(name: "job-{{ .clusterId }}-sa"), remove the extra whitespace inside the
namespace template placeholder so that namespace uses the exact template
"{{ .clusterId }}", matching the other occurrences, and avoid a templating
mismatch.
In `@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml`:
- Around line 27-214: Add explicit pod- and container-level securityContext to
harden the Job: set spec.securityContext with runAsNonRoot: true and runAsUser
(e.g., 1000), and for each container (names "test-job" and "status-reporter")
add securityContext fields: allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, capabilities: drop: ["ALL"], runAsNonRoot: true,
runAsUser: 1000, and seccompProfile: { type: RuntimeDefault } (or
runtime/default). Ensure these symbols are updated: top-level
spec.securityContext and the securityContext blocks inside the "test-job" and
"status-reporter" container definitions.
---
Duplicate comments:
In `@test-design/testcases/concurrent-processing.md`:
- Around line 140-153: Update Step 1 to capture and store the created nodepool
IDs so they can be used in Step 3: run the POST loop silently, parse each
response JSON for the "id" field (e.g., using jq -r '.id'), assign that value to
a variable (e.g., ID) and append it to an array (e.g., NODEPOOL_IDS) instead of
discarding the response; ensure the loop uses the same POST endpoint
(/api/hyperfleet/v1/nodepools) and that NODEPOOL_IDS is referenced later in Step
3 for verification.
- Around line 40-55: The test currently issues 5 parallel POSTs but never saves
the returned cluster IDs; modify the loop that calls curl to capture each
response's ID (e.g., parse JSON with jq -r '.id') and append it to a CLUSTER_IDS
array, ensuring you use curl -s to suppress progress and if you background
requests keep track and wait for completion before proceeding; then update
subsequent steps to iterate over CLUSTER_IDS for {cluster_id} usage.
---
Nitpick comments:
In `@pkg/helper/k8s.go`:
- Around line 84-95: The current flow in VerifyJobComplete (and similarly
VerifyDeploymentAvailable) treats an empty label selector as invalid because
buildLabelSelector returns "" and ValidateLabelSelector("") errors; decide to
allow empty selectors and implement that by skipping the ValidateLabelSelector
call when labelSelector == "" (i.e., only call ValidateLabelSelector when
non-empty), or alternatively adjust ValidateLabelSelector to accept empty string
and document the behavior; update VerifyJobComplete to skip validation for empty
labelSelector and ensure any downstream code that relies on a non-empty selector
handles the empty case correctly (references: VerifyJobComplete,
buildLabelSelector, ValidateLabelSelector, VerifyDeploymentAvailable).
- Around line 240-261: The current check requires both availableReplicas>0 and
an explicit "Available=True" condition (variables: availableReplicas,
conditions, deployAvailable), which rejects deployments that have replicas but
no condition entry; change the logic so that availableReplicas>0 is treated as
sufficient for availability (set deployAvailable = true when availableReplicas >
0) and only fall back to scanning conditions for "Available" when
availableReplicas == 0 or when you explicitly want to support a mode that
requires conditions; update the branching around the loop over conditions (and
the error return message) accordingly so deployments with replicas but no
conditions pass.
In `@pkg/helper/validation.go`:
- Around line 98-114: AdapterNameToClusterConditionType currently builds
"Successful" even for empty or all-dash inputs and silently skips empty segments
from consecutive dashes; update the function (AdapterNameToClusterConditionType)
to first normalize and validate the input by trimming whitespace and dashes and
returning an empty string immediately if the normalized adapterName is empty,
and then continue building the camelized parts while explicitly filtering out
empty segments (to handle consecutive dashes) before appending "Successful".
In `@test-design/testcases/adapter.md`:
- Around line 245-256: Update Step 4 to explicitly verify broker redelivery by
adding a sub-step that inspects crash-adapter logs/metrics for
redelivery-specific indicators (e.g., consumer redelivery flag,
delivery_attempts count, broker message-id, or "redelivered" log line) and
compare against Sentinel republish traces (Sentinel poll/publish timestamps or
unique publish IDs); reference the crash-adapter's consumer log output and
Sentinel publish logs/metrics to confirm the processed event came from broker
redelivery rather than a fresh Sentinel publish.
- Around line 336-339: Update the placeholder in the kubectl delete command to
match the document's curly-brace style: replace the angle-bracket placeholder
`<adapter-release-name>` in the command snippet `kubectl delete pod -n
hyperfleet -l app.kubernetes.io/instance=<adapter-release-name>` with the
consistent `{adapter_release_name}` format so it reads `kubectl delete pod -n
hyperfleet -l app.kubernetes.io/instance={adapter_release_name}`.
- Around line 315-357: Update the test case to explicitly capture the two
created cluster IDs for cleanup: in the Step 1 Expected Result add a note to
extract and store the first cluster's id as CLUSTER_ID_1 after the POST
response, and in the Step 3 Expected Result add a note to extract and store the
second cluster's id as CLUSTER_ID_2; reference these variables (CLUSTER_ID_1,
CLUSTER_ID_2) in the later cleanup Step 5 so it's clear which IDs to delete.
- Around line 57-65: Update Step 2 to assert not only that the test-adapter's
Available condition has status "False" but also that the condition.reason and
condition.message indicate an invalid Kubernetes resource; when polling GET
/api/hyperfleet/v1/clusters/{cluster_id}/statuses for the adapter named
"test-adapter", extract the .conditions[] for type "Available" and verify
.reason contains an indication like "Invalid" or "InvalidK8sResource" and that
.message references the invalid resource details so the test distinguishes this
specific failure from generic errors.
- Around line 451-463: Update "Step 3: Verify no duplicate resources created" to
replace the vague "should be 1 or managed correctly" with explicit idempotency
checks: state that if the adapter uses Jobs with generateName then completed
jobs may accumulate but there must be exactly one active/running Job, and if the
adapter reuses Jobs there should be exactly one Job total; instruct to verify
active job count via kubectl filtering on status.active and to verify completed
runs via kubectl filtering on status.successful (expect exactly 1 successful
run).
In `@test-design/testcases/nodepool.md`:
- Around line 45-52: Capture and export the created nodepool's id from the POST
response in Step 1 and use that variable in subsequent steps; specifically,
parse the JSON response to extract the `.id` field into a shell variable (e.g.,
NODEPOOL_ID) immediately after the POST in the Step 1 block and update Step 2,
Step 3, and Step 4 to reference that variable (NODEPOOL_ID) instead of
hardcoding or omitting the id so all later requests use the created nodepool's
id.
In
`@testdata/adapter-configs/clusters-cl-deployment/adapter-task-resource-deployment.yaml`:
- Around line 23-28: Add a basic security context to the test container declared
under containers -> name: test (and optionally a pod-level securityContext) to
harden the YAML: set runAsNonRoot to true, set a non-root runAsUser (e.g.,
1000), set allowPrivilegeEscalation to false, and consider setting
readOnlyRootFilesystem (or leave it false if nginx needs writable dirs); update
the container entry (and/or pod spec) so the security fields are present
alongside image: nginx:latest and ports to satisfy static analysis while
avoiding breaking the test deployment.
- Line 26: The YAML uses an unpinned image reference "image: nginx:latest" which
can lead to non-reproducible tests; update the adapter-task-resource-deployment
manifest to replace the "nginx:latest" reference with a specific, immutable tag
or digest (for example a concrete semver like nginx:1.25.2 or a sha256 image
digest) so tests pull a fixed image version.
In `@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml`:
- Around line 176-177: Replace the mutable image reference
"registry.ci.openshift.org/ci/status-reporter:latest" with an immutable tag or
digest (for example a specific version tag or sha256 digest) so e2e runs are
deterministic; update the image field (the "image:" entry currently set to
:latest) to the pinned value and search other adapter config/value.yaml files
for other occurrences of "status-reporter:latest" (or other ":latest" images)
and pin them similarly.
In `@testdata/adapter-configs/nodepools-np-configmap/values.yaml`:
- Around line 28-31: The rbac section has inconsistent indentation: ensure the
"rbac" block uses 2-space indentation consistently by aligning the "resources:"
key and its list items ("configmaps", "configmaps/status") with two spaces per
indent level; update the indentation around the rbac -> resources -> list
entries so the "resources" key and its dash-prefixed items are indented
consistently (2 spaces) under "rbac".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0a53e18-801e-4b2f-8dc8-17881db2bae2
📒 Files selected for processing (26)
- configs/config.yaml
- deploy-scripts/lib/sentinel.sh
- e2e/cluster/creation.go
- pkg/helper/k8s.go
- pkg/helper/validation.go
- test-design/testcases/adapter.md
- test-design/testcases/cluster.md
- test-design/testcases/concurrent-processing.md
- test-design/testcases/nodepool.md
- testdata/adapter-configs/clusters-cl-deployment/adapter-config.yaml
- testdata/adapter-configs/clusters-cl-deployment/adapter-task-config.yaml
- testdata/adapter-configs/clusters-cl-deployment/adapter-task-resource-deployment.yaml
- testdata/adapter-configs/clusters-cl-deployment/values.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-config.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-task-config.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job-role.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job-rolebinding.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job-serviceaccount.yaml
- testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml
- testdata/adapter-configs/clusters-cl-job/values.yaml
- testdata/adapter-configs/clusters-cl-namespace/adapter-config.yaml
- testdata/adapter-configs/clusters-cl-namespace/values.yaml
- testdata/adapter-configs/nodepools-np-configmap/adapter-config.yaml
- testdata/adapter-configs/nodepools-np-configmap/adapter-task-config.yaml
- testdata/adapter-configs/nodepools-np-configmap/adapter-task-resource-configmap.yaml
- testdata/adapter-configs/nodepools-np-configmap/values.yaml
```go
ginkgo.BeforeEach(func(ctx context.Context) {
	h = helper.New()

	// Create cluster for all tests in this suite
	cluster, err := h.Client.CreateClusterFromPayload(ctx, "testdata/payloads/clusters/cluster-request.json")
	Expect(err).NotTo(HaveOccurred(), "failed to create cluster")
	Expect(cluster.Id).NotTo(BeNil(), "cluster ID should be generated")
	Expect(cluster.Name).NotTo(BeEmpty(), "cluster name should be present")
	clusterID = *cluster.Id
	clusterName = cluster.Name
	ginkgo.GinkgoWriter.Printf("Created cluster ID: %s, Name: %s\n", clusterID, clusterName)
})
```
Reset per-spec identifiers before creating a cluster.
If creation fails before reassignment, stale clusterID/clusterName from a previous spec can leak into cleanup logic.
🛠️ Proposed fix
```diff
 ginkgo.BeforeEach(func(ctx context.Context) {
 	h = helper.New()
+	clusterID = ""
+	clusterName = ""
 	// Create cluster for all tests in this suite
 	cluster, err := h.Client.CreateClusterFromPayload(ctx, "testdata/payloads/clusters/cluster-request.json")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e/cluster/creation.go` around lines 24 - 35, Reset the per-spec identifiers
before attempting cluster creation to avoid leaking stale values: in the
ginkgo.BeforeEach closure (the setup that calls helper.New() and
h.Client.CreateClusterFromPayload), explicitly clear or reinitialize the
variables clusterID and clusterName (the ones used later in cleanup) prior to
calling CreateClusterFromPayload; then proceed to create the cluster and assign
the returned values as currently done (refer to helper.New(),
h.Client.CreateClusterFromPayload, cluster.ID/cluster.Name and the variables
clusterID and clusterName).
```go
select {
case <-timeout:
	ginkgo.Fail("Timed out waiting for cl-deployment Available condition to become True")
case <-ticker.C:
	statuses, err := h.Client.GetClusterStatuses(ctx, clusterID)
```
Handle context cancellation in the dependency polling loop.
The loop currently exits only on timeout or success. Add a ctx.Done() branch to fail fast on canceled specs.
🛠️ Proposed fix
```diff
 for {
 	select {
+	case <-ctx.Done():
+		ginkgo.Fail(fmt.Sprintf("Context canceled while waiting for cl-deployment transition: %v", ctx.Err()))
 	case <-timeout:
 		ginkgo.Fail("Timed out waiting for cl-deployment Available condition to become True")
 	case <-ticker.C:
```

📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e/cluster/creation.go` around lines 309 - 313, The select in the polling
loop that waits on timeout and ticker.C should also handle context cancellation:
add a case <-ctx.Done() alongside the existing cases and call ginkgo.Fail (or
fail the spec) with a clear message (e.g., "Context canceled while waiting for
cl-deployment Available condition") and return; update the block around the call
to h.Client.GetClusterStatuses(ctx, clusterID) so the loop exits immediately
when ctx is canceled to avoid further polling.
```go
// VerifyNamespaceActive verifies a namespace exists and is in Active phase
func (h *Helper) VerifyNamespaceActive(ctx context.Context, name string, expectedLabels, expectedAnnotations map[string]string) error {
	logger.Info("verifying namespace status", "namespace", name)

	// Create context with timeout
	cmdCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	// Get namespace as JSON
	cmd := exec.CommandContext(cmdCtx, "kubectl", "get", "namespace", name, "-o", "json")
	output, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to get namespace %s: %w (output: %s)", name, err, string(output))
	}
```
Missing input validation for namespace name in VerifyNamespaceActive.
VerifyJobComplete and VerifyDeploymentAvailable both validate the namespace parameter using ValidateK8sName(), but VerifyNamespaceActive doesn't. For consistency and to prevent potential command injection, add the same validation.
🛡️ Suggested fix
```diff
 func (h *Helper) VerifyNamespaceActive(ctx context.Context, name string, expectedLabels, expectedAnnotations map[string]string) error {
 	logger.Info("verifying namespace status", "namespace", name)
+	// Validate input to prevent command injection
+	if err := ValidateK8sName(name); err != nil {
+		return fmt.Errorf("invalid namespace name: %w", err)
+	}
+
 	// Create context with timeout
 	cmdCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
 	defer cancel()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/helper/k8s.go` around lines 28 - 41, VerifyNamespaceActive is missing
validation of the namespace parameter; call ValidateK8sName(name) at the start
of VerifyNamespaceActive and return its error if validation fails, so the
namespace is validated consistently with VerifyJobComplete and
VerifyDeploymentAvailable before constructing/exec.CommandContext and running
kubectl.
```yaml
subjects:
  - kind: ServiceAccount
    name: "job-{{ .clusterId }}-sa"
    namespace: "{{ .clusterId  }}"
```
Extra whitespace in namespace template placeholder.
Line 18 has `{{ .clusterId  }}` with an extra space before the closing braces. This should be `{{ .clusterId }}` to match the pattern used elsewhere in this file (lines 5, 7, 17).
🐛 Fix

```diff
 subjects:
   - kind: ServiceAccount
     name: "job-{{ .clusterId }}-sa"
-    namespace: "{{ .clusterId  }}"
+    namespace: "{{ .clusterId }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job-rolebinding.yaml`
around lines 15 - 18, In the subjects block where the ServiceAccount is defined
(name: "job-{{ .clusterId }}-sa"), remove the extra whitespace inside the
namespace template placeholder so that the namespace field uses the exact
template "{{ .clusterId }}" (replace "{{ .clusterId  }}" with
"{{ .clusterId }}") to match other occurrences and keep the manifests consistent.
```yaml
spec:
  serviceAccountName: "job-{{ .clusterId }}-sa"
  restartPolicy: Never

  # Volumes
  volumes:
    - name: results
      emptyDir: {}

  containers:
    - name: test-job
      image: alpine:3.19
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh", "-c"]
      args:
        - |
          set -e

          # Default configuration
          RESULTS_PATH="${RESULTS_PATH:-/results/adapter-result.json}"
          SIMULATE_RESULT="${SIMULATE_RESULT:-success}"
          SLEEP_DURATION="${SLEEP_DURATION:-0}"

          echo "Dummy job starting..."
          echo "Simulating result: ${SIMULATE_RESULT}"
          echo "Results path: ${RESULTS_PATH}"
          echo "Sleep duration: ${SLEEP_DURATION} seconds"

          # Sleep for the configured duration
          if [ "${SLEEP_DURATION}" -gt 0 ]; then
            echo "Sleeping for ${SLEEP_DURATION} seconds..."
            sleep "${SLEEP_DURATION}"
            echo "Sleep completed"
          fi

          # Ensure results directory exists
          RESULTS_DIR=$(dirname "${RESULTS_PATH}")
          mkdir -p "${RESULTS_DIR}"

          case "${SIMULATE_RESULT}" in
            success)
              echo "Writing success result..."
              cat > "${RESULTS_PATH}" <<EOF
          {
            "status": "success",
            "reason": "ValidationPassed",
            "message": "Environment validated successfully (simulated)",
            "details": {
              "simulation": true,
              "checks_run": 5,
              "checks_passed": 5,
              "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
            }
          }
          EOF
              echo "Success result written to ${RESULTS_PATH}"
              exit 0
              ;;

            failure)
              echo "Writing failure result..."
              cat > "${RESULTS_PATH}" <<EOF
          {
            "status": "failure",
            "reason": "MissingPermissions",
            "message": "Service account lacks required IAM permissions (simulated)",
            "details": {
              "simulation": true,
              "missing_permissions": [
                "compute.instances.list",
                "iam.serviceAccounts.get"
              ],
              "service_account": "dummy-sa@project.iam.gserviceaccount.com",
              "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
            }
          }
          EOF
              echo "Failure result written to ${RESULTS_PATH}"
              exit 1
              ;;

            hang)
              echo "Simulating hang (sleeping indefinitely)..."
              sleep 9999999
              ;;

            crash)
              echo "Simulating crash (exiting without writing results)..."
              exit 1
              ;;

            invalid-json)
              echo "Writing invalid JSON result..."
              echo "{ this is not valid json }" > "${RESULTS_PATH}"
              exit 0
              ;;

            missing-status)
              echo "Writing result with missing status field..."
              cat > "${RESULTS_PATH}" <<EOF
          {
            "reason": "TestReason",
            "message": "This result is missing the status field"
          }
          EOF
              exit 0
              ;;

            *)
              echo "Unknown SIMULATE_RESULT value: ${SIMULATE_RESULT}"
              echo "Valid values: success, failure, hang, crash, invalid-json, missing-status"
              echo "Defaulting to success..."
              cat > "${RESULTS_PATH}" <<EOF
          {
            "status": "success",
            "reason": "ValidationPassed",
            "message": "Environment validated successfully (default simulation)",
            "details": {
              "simulation": true,
              "note": "Unknown SIMULATE_RESULT value, defaulted to success"
            }
          }
          EOF
              exit 0
              ;;
          esac
      env:
        # Configure simulation scenario
        # Valid values: success, failure, hang, crash, invalid-json, missing-status
        - name: SIMULATE_RESULT
          value: "{{ .simulateResult }}" # <- this gets resolved from adapterconfig params
        # Configure how long the job runs before completing (in seconds)
        - name: SLEEP_DURATION
          value: "{{ .jobSleepDuration }}" # <- this gets resolved from adapterconfig params
        - name: RESULTS_PATH
          value: "/results/adapter-result.json"
      volumeMounts:
        - name: results
          mountPath: /results
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
        limits:
          memory: "128Mi"
          cpu: "200m"

    # Status reporter sidecar
    - name: status-reporter
      image: registry.ci.openshift.org/ci/status-reporter:latest
      imagePullPolicy: Always
      env:
        # Required environment variables
        - name: JOB_NAME
          value: "test-job-{{ .clusterId }}-{{ .generationSpec }}"
        - name: JOB_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

        # Optional configuration
        - name: RESULTS_PATH
          value: "/results/adapter-result.json"
        - name: POLL_INTERVAL_SECONDS
          value: "2"
        - name: MAX_WAIT_TIME_SECONDS
          value: "300"
        - name: CONDITION_TYPE
          value: "Available"
        - name: LOG_LEVEL
          value: "info"
        - name: ADAPTER_CONTAINER_NAME
          value: "test-job"

      volumeMounts:
        - name: results
          mountPath: /results
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
        limits:
          memory: "128Mi"
          cpu: "200m"
```
Harden pod/container security context for this Job.
Line 27 onward runs both containers with default security context. Add explicit hardening to avoid writable root FS and privilege-escalation defaults.
🔒 Proposed hardening patch
```diff
 spec:
   backoffLimit: 0
   activeDeadlineSeconds: 310 # 300 + 10 second buffer
   template:
     metadata:
       labels:
         app: test-job
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       serviceAccountName: "job-{{ .clusterId }}-sa"
       restartPolicy: Never
@@
         - name: test-job
           image: alpine:3.19
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
@@
         - name: status-reporter
           image: registry.ci.openshift.org/ci/status-reporter:latest
           imagePullPolicy: Always
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
```

🧰 Tools
🪛 Trivy (0.69.1)
[error] 175-214: Root file system is not read-only
Container 'status-reporter' of Job 'test-job-{{ .clusterId }}-{{ .generationSpec }}' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 37-172: Root file system is not read-only
Container 'test-job' of Job 'test-job-{{ .clusterId }}-{{ .generationSpec }}' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 37-172: Default security context configured
container test-job-{{ .clusterId }}-{{ .generationSpec }} in {{ .clusterId }} namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 175-214: Default security context configured
container test-job-{{ .clusterId }}-{{ .generationSpec }} in {{ .clusterId }} namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 27-214: Default security context configured
job test-job-{{ .clusterId }}-{{ .generationSpec }} in {{ .clusterId }} namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@testdata/adapter-configs/clusters-cl-job/adapter-task-resource-job.yaml`
around lines 27 - 214, Add explicit pod- and container-level securityContext to
harden the Job: set spec.securityContext with runAsNonRoot: true and runAsUser
(e.g., 1000), and for each container (names "test-job" and "status-reporter")
add securityContext fields: allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, capabilities: drop: ["ALL"], runAsNonRoot: true,
runAsUser: 1000, and seccompProfile: { type: RuntimeDefault } (or
runtime/default). Ensure these symbols are updated: top-level
spec.securityContext and the securityContext blocks inside the "test-job" and
"status-reporter" container definitions.
Actionable comments posted: 1
♻️ Duplicate comments (2)
test-design/testcases/concurrent-processing.md (1)
40-62: ⚠️ Potential issue | 🟡 Minor

Capture cluster IDs from creation responses.
Step 2 and subsequent steps reference `{cluster_id}`, but Step 1 doesn't record the returned cluster IDs from the parallel POST requests. Without capturing these IDs, the test cannot proceed with polling or verification steps.

🔧 Suggested fix
````diff
 #### Step 1: Submit 5 cluster creation requests simultaneously
 **Action:**
 - Submit 5 POST requests in parallel (each call generates a unique name via `{{.Random}}` template):
 ```bash
+CLUSTER_IDS=()
 for i in $(seq 1 5); do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/clusters \
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/clusters \
     -H "Content-Type: application/json" \
-    -d @testdata/payloads/clusters/cluster-request.json &
+    -d @testdata/payloads/clusters/cluster-request.json | jq -r '.id')
+  CLUSTER_IDS+=("$ID") &
 done
 wait
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/concurrent-processing.md` around lines 40 - 62, Step 1 currently fires 5 parallel POSTs but never captures the returned cluster IDs needed by Step 2; update the loop that posts to the clusters endpoint (the POST to /api/hyperfleet/v1/clusters) to run curl with -s, parse the JSON response with jq to extract the .id, append each id into a CLUSTER_IDS array (or similar variable) while still backgrounding the requests and keep the final wait so all requests finish; then reference CLUSTER_IDS in Step 2 when polling /api/hyperfleet/v1/clusters/{cluster_id} to check conditions.

test-design/testcases/adapter.md (1)
494-598: ⚠️ Potential issue | 🟡 Minor

Add cleanup step for test cluster resource.
This test case creates a cluster resource (precondition line 518) and submits status reports to it, but lacks a cleanup step to remove the cluster and associated resources. All other test cases in this document include explicit cleanup steps.
🧹 Suggested cleanup step
````diff
+#### Step 5: Cleanup resources
+
+**Action:**
+- Delete the namespace created for this cluster:
+```bash
+kubectl delete namespace {cluster_id}
+```
+
+**Expected Result:**
+- Namespace and all associated resources are deleted successfully
+
+**Note:** This is a workaround cleanup method. Once CLM supports DELETE operations for "clusters" resource type, the namespace deletion should be replaced with:
+```bash
+curl -X DELETE ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}
+```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 494 - 598, The test "API can handle incomplete adapter status reports gracefully" is missing a cleanup step to remove the created cluster and resources; add a Cleanup section after Step 4 that deletes the test cluster namespace (e.g., `kubectl delete namespace {cluster_id}`) and specifies expected result ("Namespace and associated resources deleted successfully") and note to replace with the API DELETE call when CLM supports DELETE for clusters; reference the test by its title and the resources created in the preconditions/steps so reviewers can locate and append the cleanup block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/concurrent-processing.md`:
- Around line 140-163: The creation loop doesn't record nodepool IDs, so modify
the POST loop that creates nodepools to capture each creation response's id (use
jq -r '.id') into an array (e.g., NODEPOOL_IDS) for later verification; update
the loop around the curl call (the for ... do ... done block that posts to
/api/hyperfleet/v1/nodepools) to assign the response id to a variable and append
it to NODEPOOL_IDS so subsequent steps can iterate the stored IDs to check
individual nodepool conditions.
---
Duplicate comments:
In `@test-design/testcases/adapter.md`:
- Around line 494-598: The test "API can handle incomplete adapter status
reports gracefully" is missing a cleanup step to remove the created cluster and
resources; add a Cleanup section after Step 4 that deletes the test cluster
namespace (e.g., `kubectl delete namespace {cluster_id}`) and specifies expected
result ("Namespace and associated resources deleted successfully") and note to
replace with the API DELETE call when CLM supports DELETE for clusters;
reference the test by its title and the resources created in the
preconditions/steps so reviewers can locate and append the cleanup block.
In `@test-design/testcases/concurrent-processing.md`:
- Around line 40-62: Step 1 currently fires 5 parallel POSTs but never captures
the returned cluster IDs needed by Step 2; update the loop that posts to the
clusters endpoint (the POST to /api/hyperfleet/v1/clusters) to run curl with -s,
parse the JSON response with jq to extract the .id, append each id into a
CLUSTER_IDS array (or similar variable) while still backgrounding the requests
and keep the final wait so all requests finish; then reference CLUSTER_IDS in
Step 2 when polling /api/hyperfleet/v1/clusters/{cluster_id} to check
conditions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aa012679-5f26-45e2-ad68-f5730f6c5e22
📒 Files selected for processing (4)
test-design/testcases/adapter.md
test-design/testcases/cluster.md
test-design/testcases/concurrent-processing.md
test-design/testcases/nodepool.md
````markdown
#### Step 1: Create multiple nodepools under the same cluster
**Action:**
- Submit 3 POST requests to create NodePool resources (each call generates a unique name via `{{.Random}}` template):
```bash
for i in 1 2 3; do
  curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
    -H "Content-Type: application/json" \
    -d @testdata/payloads/nodepools/gcp.json
done
```

**Expected Result:**
- All 3 nodepools are created successfully
- Each returns a unique nodepool ID

#### Step 2: Verify all nodepools appear in the list
**Action:**
```bash
curl -s ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools | jq '.items | length'
```

**Expected Result:**
- List contains all 3 nodepools
- Each nodepool has a distinct ID and name
````
Capture nodepool IDs from creation responses.
Step 3 needs to verify each nodepool's conditions individually, but Step 1 doesn't record the nodepool IDs returned from the creation requests. Add logic to capture each response's ID into an array.
🔧 Suggested fix
````diff
 #### Step 1: Create multiple nodepools under the same cluster
 **Action:**
 - Submit 3 POST requests to create NodePool resources (each call generates a unique name via `{{.Random}}` template):
 ```bash
+NODEPOOL_IDS=()
 for i in 1 2 3; do
-  curl -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
+  ID=$(curl -s -X POST ${API_URL}/api/hyperfleet/v1/nodepools \
     -H "Content-Type: application/json" \
-    -d @testdata/payloads/nodepools/gcp.json
+    -d @testdata/payloads/nodepools/gcp.json | jq -r '.id')
+  NODEPOOL_IDS+=("$ID")
 done
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/concurrent-processing.md` around lines 140 - 163, The
creation loop doesn't record nodepool IDs, so modify the POST loop that creates
nodepools to capture each creation response's id (use jq -r '.id') into an array
(e.g., NODEPOOL_IDS) for later verification; update the loop around the curl
call (the for ... do ... done block that posts to /api/hyperfleet/v1/nodepools)
to assign the response id to a variable and append it to NODEPOOL_IDS so
subsequent steps can iterate the stored IDs to check individual nodepool
conditions.
Actionable comments posted: 3
🧹 Nitpick comments (1)
test-design/testcases/adapter.md (1)
57-62: Use bounded polling criteria instead of one-shot GETs

For async flows, a single GET can race with processing completion. Add explicit polling interval + timeout + terminal condition (e.g., `Available=False` with target reason, or `Applied=True`/`Available=True`) to reduce flakiness.

Also applies to: 150-155, 365-367
Available=Falsewith target reason, orApplied=True/Available=True) to reduce flakiness.Also applies to: 150-155, 365-367
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/adapter.md` around lines 57 - 62, Replace the one-shot GET against the /api/hyperfleet/v1/clusters/{cluster_id}/statuses endpoint with a bounded polling loop that retries at a fixed interval until a timeout, checking a clear terminal condition (e.g., status.Available==false with the expected reason for failure, or status.Applied==true && status.Available==true for success); implement interval, max timeout, and fall-back failure assertion to avoid race conditions and apply the same change to the other similar GET checks in the file (the blocks around the other occurrences mentioned).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/adapter.md`:
- Around line 457-464: Replace the ambiguous assertion "Check job count (should
be 1 or managed correctly)" with a strict, machine-validateable expectation:
specify the exact allowed job states and counts (for example, "Expect exactly 1
Job resource in namespace {cluster_id} with .status.succeeded==1 and no other
active/failed jobs" or "Allow 0 or 1 active jobs: if 0 then ensure previous Job
has .status.succeeded==1; if 1 then ensure it is in .status.active==1 and not
failed"). Update the kubectl example and test validation wording to include the
exact criteria to check (.status.active/.status.succeeded/.status.failed) so
automated checks can assert pass/fail deterministically.
- Around line 582-584: The Expected Result currently lists two contradictory
outcomes ("accepts the report with empty conditions" vs "returns a clear
validation error (HTTP 400)"), making the test non-deterministic; update the
Expected Result block so it asserts a single behavior according to the API
contract (pick either acceptance or explicit 400 validation), replace the dual
bullets with one definitive expectation, and adjust the test assertions and
title/description that reference this case (the "accepts the report with empty
conditions" and "returns a clear validation error (HTTP 400)" bullets) so the
test only validates that one chosen behavior and fails on the other.
- Around line 111-132: The precondition described uses clusterStatus capture and
a condition on clusterPhase that includes "Ready", which can be met and breaks
the intended timeout scenario; update the AdapterConfig preconditions (the
preconditions block, specifically the clusterStatus apiCall and its conditions
on clusterPhase) so the condition cannot be satisfied (for example replace
values ["NotReady","Ready"] with an impossible value like ["NonExistentPhase"]
or use an equality operator with a clearly invalid phase) to ensure the
timeout-adapter path is exercised deterministically.
---
Nitpick comments:
In `@test-design/testcases/adapter.md`:
- Around line 57-62: Replace the one-shot GET against the
/api/hyperfleet/v1/clusters/{cluster_id}/statuses endpoint with a bounded
polling loop that retries at a fixed interval until a timeout, checking a clear
terminal condition (e.g., status.Available==false with the expected reason for
failure, or status.Applied==true && status.Available==true for success);
implement interval, max timeout, and fall-back failure assertion to avoid race
conditions and apply the same change to the other similar GET checks in the file
(the blocks around the other occurrences mentioned).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8908fed1-50aa-42e1-af6a-51adce36a80a
📒 Files selected for processing (1)
test-design/testcases/adapter.md
3. A dedicated timeout-adapter is deployed via Helm with AdapterConfig containing preconditions that cannot be met, for example:
```yaml
preconditions:
  - name: "clusterStatus"
    apiCall:
      method: "GET"
      url: "{{ .hyperfleetApiBaseUrl }}/api/hyperfleet/{{ .hyperfleetApiVersion }}/clusters/{{ .clusterId }}"
      timeout: 10s
      retryAttempts: 3
      retryBackoff: "exponential"
      capture:
        - name: "clusterName"
          field: "name"
        - name: "clusterPhase"
          field: "status.phase"
        - name: "generationId"
          field: "generation"
    conditions:
      - field: "clusterPhase"
        operator: "in"
        values: ["NotReady", "Ready"]
```
Precondition contradicts the timeout objective
Line 111 says the preconditions “cannot be met,” but Line 131 allows "Ready", which is typically reachable. That can invalidate the timeout scenario and make the test pass/fail nondeterministically.
Proposed doc fix
```diff
-    conditions:
-      - field: "clusterPhase"
-        operator: "in"
-        values: ["NotReady", "Ready"]
+    conditions:
+      - field: "clusterPhase"
+        operator: "in"
+        values: ["NeverReachablePhase"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/adapter.md` around lines 111 - 132, The precondition
described uses clusterStatus capture and a condition on clusterPhase that
includes "Ready", which can be met and breaks the intended timeout scenario;
update the AdapterConfig preconditions (the preconditions block, specifically
the clusterStatus apiCall and its conditions on clusterPhase) so the condition
cannot be satisfied (for example replace values ["NotReady","Ready"] with an
impossible value like ["NonExistentPhase"] or use an equality operator with a
clearly invalid phase) to ensure the timeout-adapter path is exercised
deterministically.
```bash
# Check job count (should be 1 or managed correctly)
kubectl get jobs -n {cluster_id}
```

**Expected Result:**
- Only one namespace exists for the cluster
- No duplicate jobs or resources
Assertion wording is too ambiguous for validation
Line 457/463 says job count should be “1 or managed correctly,” which is hard to validate objectively. Define a strict expected state (exact count or explicit allowed states with criteria).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/adapter.md` around lines 457 - 464, Replace the
ambiguous assertion "Check job count (should be 1 or managed correctly)" with a
strict, machine-validateable expectation: specify the exact allowed job states
and counts (for example, "Expect exactly 1 Job resource in namespace
{cluster_id} with .status.succeeded==1 and no other active/failed jobs" or
"Allow 0 or 1 active jobs: if 0 then ensure previous Job has
.status.succeeded==1; if 1 then ensure it is in .status.active==1 and not
failed"). Update the kubectl example and test validation wording to include the
exact criteria to check (.status.active/.status.succeeded/.status.failed) so
automated checks can assert pass/fail deterministically.
**Expected Result:**
- API either accepts the report with empty conditions, or returns a clear validation error (HTTP 400)
- API does not crash or return HTTP 500
Expected result is non-deterministic
Line 583 allows two opposite outcomes (“accepts” or “HTTP 400”). This makes the test non-actionable as a regression gate. Please lock this to one expected behavior based on the API contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/adapter.md` around lines 582 - 584, The Expected Result
currently lists two contradictory outcomes ("accepts the report with empty
conditions" vs "returns a clear validation error (HTTP 400)"), making the test
non-deterministic; update the Expected Result block so it asserts a single
behavior according to the API contract (pick either acceptance or explicit 400
validation), replace the dual bullets with one definitive expectation, and
adjust the test assertions and title/description that reference this case (the
"accepts the report with empty conditions" and "returns a clear validation error
(HTTP 400)" bullets) so the test only validates that one chosen behavior and
fails on the other.
Summary by CodeRabbit