HYPERFLEET-612, HYPERFLEET-613 - test: automate test cases for nodepool resources #36
86254860 wants to merge 2 commits into openshift-hyperfleet:main
Conversation
…ol resources, including basic workflow and Kubernetes resource validation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (no approvers yet). The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.

Use the following commands to manage reviews, or the checkboxes below for quick actions:
Walkthrough

Restructures E2E nodepool tests (new Describe blocks, BeforeEach/AfterEach using context, updated payloads and nodepool naming), adds Kubernetes client methods to list ConfigMaps by labels and to return a unique ConfigMap, and introduces Helper.VerifyConfigMap to validate ConfigMap labels/annotations. Renames helper AdapterNameToClusterConditionType → AdapterNameToConditionType and updates callers. Also updates testdata (adapter CEL expression and payload label changes) and a Helm value key in a deploy script. Final-state and adapter-specific validations were added to tests.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant Helper as Helper.VerifyConfigMap
    participant Client as K8s Client
    participant APIServer as Kubernetes API Server
    TestRunner->>Helper: VerifyConfigMap(ctx, namespace, expectedLabels, expectedAnnotations)
    Helper->>Client: GetUniqueConfigMapByLabels(ctx, namespace, labels)
    Client->>APIServer: List ConfigMaps?namespace=...&labelSelector=...
    APIServer-->>Client: []ConfigMap
    Client-->>Helper: []ConfigMap
    Helper->>Helper: ensure single result, validate annotations
    Helper-->>TestRunner: success / error
```
```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant HyperfleetAPI as Hyperfleet API
    participant Adapters as Adapters
    participant ConditionHelper as AdapterNameToConditionType
    TestRunner->>HyperfleetAPI: POST clusters/{cid}/nodepools (create)
    HyperfleetAPI-->>TestRunner: 202 Accepted (nodepool id)
    loop poll status
        TestRunner->>HyperfleetAPI: GET clusters/{cid}/nodepools/{nid}
        HyperfleetAPI->>Adapters: trigger adapters (configmap, job, etc.)
        Adapters-->>HyperfleetAPI: per-adapter status updates
        HyperfleetAPI-->>TestRunner: nodepool status with per-adapter conditions
    end
    TestRunner->>ConditionHelper: AdapterNameToConditionType(adapterName)
    ConditionHelper-->>TestRunner: condition type
    TestRunner->>HyperfleetAPI: final GET clusters/{cid}/nodepools/{nid} (verify observedGeneration, Ready/Available)
    HyperfleetAPI-->>TestRunner: final state
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 2
🧹 Nitpick comments (2)
e2e/nodepool/creation.go (1)
186-227: Test will fail if new nodepool adapters are configured without adding verifiers.

The test explicitly fails when an adapter in `h.Cfg.Adapters.NodePool` lacks a corresponding verifier (lines 215-217). This is intentional fail-fast behavior, but consider documenting this requirement in the test comments or adding a note that new adapters require verifier additions here. Currently only `"np-configmap"` has a verifier defined; if additional nodepool adapters are configured in the future, this test will fail until verifiers are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/nodepool/creation.go` around lines 186 - 227: The test will fail if h.Cfg.Adapters.NodePool contains an adapter without a verifier in adapterResourceVerifiers; add a clear comment above the adapterResourceVerifiers map explaining that every nodepool adapter listed in h.Cfg.Adapters.NodePool must have a corresponding entry in adapterResourceVerifiers (or tests will ginkgo.Fail when the verifier is missing), and include guidance to update adapterResourceVerifiers when adding new adapters (referencing adapterResourceVerifiers, h.Cfg.Adapters.NodePool, and the ginkgo.Fail check that enforces this).

test-design/testcases/nodepool.md (1)
54-64: Initial-state assertion is too strict and may be flaky.

As written, this requires both `Ready=False` and `Available=False` on the first observed GET. In fast environments, conditions can transition before the first poll and cause false failures. Prefer asserting "not yet terminal success" with bounded polling/timing semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 54 - 64, Step 1's initial assertion in "Step 1: Verify initial status of nodepool" is too strict (requires Ready=False and Available=False on the first GET); change it to perform bounded polling of the GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} with a short retry loop and timeout and assert only that the nodepool is not yet in a terminal-success state (i.e., do not require both Ready=False and Available=False on first observation — instead fail only if Ready=True && Available=True immediately or if the terminal-success state is observed persistently after the timeout); update the expected-result text to describe the polling behavior and the relaxed condition.
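The relaxed check described above can be sketched as a small predicate: fail only when the nodepool already reports terminal success, rather than requiring both conditions to be False on the first GET. This is a minimal sketch; `is_terminal_success` and the sample payloads are hypothetical stand-ins for what the real GET would return, not part of the test suite.

```shell
# Hedged sketch: fail the initial-state check only when the nodepool is
# already terminal-success (Ready=True AND Available=True).
# The JSON strings below are stand-ins for the real API response.
is_terminal_success() {
  # True only when both Ready and Available report status True.
  echo "$1" | grep -q '"type":"Ready","status":"True"' &&
    echo "$1" | grep -q '"type":"Available","status":"True"'
}

fresh='[{"type":"Ready","status":"False"},{"type":"Available","status":"False"}]'
fast='[{"type":"Ready","status":"True"},{"type":"Available","status":"True"}]'

is_terminal_success "$fresh" && R1=terminal || R1=in-progress
is_terminal_success "$fast" && R2=terminal || R2=in-progress
echo "$R1 $R2"
```

With this shape, a fast environment where adapters converge before the first poll no longer produces a false failure on intermediate `False` states.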
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 88-90: The document uses two different config paths
(`configs/config.yaml` vs `config.yaml`) causing ambiguity; pick one canonical
path (e.g., `configs/config.yaml`) and update every occurrence in this file to
match it, including the note that mentions `adapters.nodepool` and the
environment variable `HYPERFLEET_ADAPTERS_NODEPOOL`, and also update the
references around lines noted (174-175) so all mentions are identical and
consistent.
- Around line 214-218: The "Wait for cluster Ready" step can hang indefinitely;
add an explicit timeout and fallback so namespace deletion proceeds even if
Ready isn't reached: change the wait logic that blocks before running "kubectl
delete namespace {cluster_id}" to use a bounded wait (e.g., kubectl wait
--for=condition=Ready --timeout=<reasonable-duration> or equivalent polling with
a deadline) and if the timeout is reached log a warning and continue with a
best-effort cleanup by executing the namespace deletion; update the instructions
for the "Wait for cluster Ready" step and the subsequent "kubectl delete
namespace {cluster_id}" step to reflect the timeout and fallback behavior.
---
Nitpick comments:
In `@e2e/nodepool/creation.go`:
- Around line 186-227: The test will fail if h.Cfg.Adapters.NodePool contains an
adapter without a verifier in adapterResourceVerifiers; add a clear comment
above the adapterResourceVerifiers map explaining that every nodepool adapter
listed in h.Cfg.Adapters.NodePool must have a corresponding entry in
adapterResourceVerifiers (or tests will ginkgo.Fail when the verifier is
missing), and include guidance to update adapterResourceVerifiers when adding
new adapters (referencing adapterResourceVerifiers, h.Cfg.Adapters.NodePool, and
the ginkgo.Fail check that enforces this).
In `@test-design/testcases/nodepool.md`:
- Around line 54-64: Step 1's initial assertion in "Step 1: Verify initial
status of nodepool" is too strict (requires Ready=False and Available=False on
the first GET); change it to perform bounded polling of the GET
${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} with
a short retry loop and timeout and assert only that the nodepool is not yet in a
terminal-success state (i.e., do not require both Ready=False and
Available=False on first observation — instead fail only if Ready=True &&
Available=True immediately or if the terminal-success state is observed
persistently after the timeout); update the expected-result text to describe the
polling behavior and the relaxed condition.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
* e2e/cluster/creation.go
* e2e/nodepool/creation.go
* pkg/client/kubernetes/client.go
* pkg/helper/k8s.go
* pkg/helper/validation.go
* test-design/testcases/nodepool.md
* testdata/adapter-configs/clusters-cl-job/adapter-task-config.yaml
* testdata/payloads/clusters/cluster-request.json
* testdata/payloads/nodepools/nodepool-request.json
Actionable comments posted: 2
♻️ Duplicate comments (2)
test-design/testcases/nodepool.md (2)
112-115: ⚠️ Potential issue | 🟠 Major

Add bounded/best-effort cleanup in workflow test teardown.
Line 114 can still block teardown. Align this scenario with the timeout/fallback cleanup strategy used later in the file.
Suggested doc update
```diff
-kubectl delete namespace {cluster_id}
+kubectl delete namespace {cluster_id} --wait=false
+# Optionally: poll namespace deletion with a bounded timeout and log warning on timeout
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 112 - 115, The current teardown issues with a blocking "kubectl delete namespace {cluster_id}" call; change it to a bounded/fallback cleanup like the later workflow: initiate deletion (kubectl delete namespace {cluster_id} --wait=false), poll for namespace termination with a timeout (e.g., 60s), and if the namespace still exists after timeout perform a forced fallback (kubectl delete namespace {cluster_id} --grace-period=0 --force or remove finalizers via kubectl patch) and treat failures as non-fatal so teardown proceeds; update the teardown section replacing the single blocking delete with this timeout+fallback pattern referencing the existing "kubectl delete namespace {cluster_id}" command and the test teardown logic.
219-223: ⚠️ Potential issue | 🟡 Minor

Cleanup command snippet is incomplete.
Line 219 through Line 223 describes timeout behavior but does not provide the actual wait command, making the procedure non-actionable.
Suggested doc update
```diff
 # Wait for cluster Ready with timeout (best-effort)
+kubectl wait --for=condition=Ready "cluster/{cluster_id}" --timeout=5m || \
+  echo "WARN: cluster not Ready within timeout; continuing with best-effort cleanup"
 # Delete namespace regardless of Ready state
 kubectl delete namespace {cluster_id}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 219 - 223, The doc has a placeholder "Wait for cluster Ready with timeout (best-effort)" but no concrete wait command; update the section so it shows the actual kubectl wait invocation (e.g., use kubectl wait --for=condition=Ready --timeout=<duration> on the appropriate resource) and then clarify the follow-up delete step so it explicitly runs kubectl delete namespace {cluster_id} regardless of wait result; reference and update the "Wait for cluster Ready with timeout (best-effort)" heading and the subsequent kubectl delete namespace {cluster_id} line to include the wait command, timeout parameter, and a note that delete runs irrespective of the wait outcome.
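The timeout-plus-fallback teardown asked for above can be sketched generically. In this sketch `KUBECTL` is a stub that fails the wait (simulating a cluster that never reaches Ready) so the fallback path is visible without a live cluster; in the real doc these would be plain `kubectl` invocations, and `best_effort_cleanup` is a hypothetical name.

```shell
# Hedged sketch of bounded wait + unconditional cleanup.
KUBECTL() {
  case "$1" in
    wait) return 1 ;;            # simulate: Ready never reached
    *) echo "kubectl $*" ;;      # echo other commands instead of running them
  esac
}

best_effort_cleanup() {
  cluster_id=$1
  # Bounded wait; never abort teardown if Ready is not reached.
  if ! KUBECTL wait --for=condition=Ready "cluster/${cluster_id}" --timeout=5m; then
    echo "WARN: cluster ${cluster_id} not Ready within timeout; continuing cleanup" >&2
  fi
  # Delete runs regardless of the wait outcome.
  KUBECTL delete namespace "${cluster_id}"
}

OUT=$(best_effort_cleanup my-cluster 2>/dev/null)
echo "$OUT"
```

The key property is that the delete step executes even when the wait times out, so teardown can never hang or skip cleanup.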
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 62-63: The tests assert brittle intermediate `False` for NodePool
conditions: change the checks against NodePool `Ready` and `Available`
(currently asserting `status: False` at lines referencing the initial state and
the transition at lines ~102-103) to resilient assertions that wait for the
desired state or assert idempotent transitions; specifically, replace exact
initial `status: False` assertions with a wait-for-condition helper (e.g.,
waitForNodePoolCondition or assertEventuallyTrue/assertNotTrue) that polls the
NodePool's `Ready` and `Available` conditions and accepts either already-True or
a False->True transition, and update the tests that check the explicit `False ->
True` transition to assert that the condition becomes True within a timeout
rather than requiring the intermediate False.
- Around line 67-71: The doc currently issues a single GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses but
then assumes adapter conditions are fully converged (lines 75–87); change the
test steps to use bounded polling: repeatedly GET that endpoint with a fixed
interval and max timeout (e.g., poll every 5s up to 2–5 minutes or X retries)
and stop when the expected adapter conditions are observed or the timeout is
reached, failing the test if not converged; update the nodepool.md instructions
to explicitly document the retry interval, max retries/timeout, convergence
criterion, and an example curl-based loop or reference to the test harness retry
utility so the check is deterministic.
---
Duplicate comments:
In `@test-design/testcases/nodepool.md`:
- Around line 112-115: The current teardown issues with a blocking "kubectl
delete namespace {cluster_id}" call; change it to a bounded/fallback cleanup
like the later workflow: initiate deletion (kubectl delete namespace
{cluster_id} --wait=false), poll for namespace termination with a timeout (e.g.,
60s), and if the namespace still exists after timeout perform a forced fallback
(kubectl delete namespace {cluster_id} --grace-period=0 --force or remove
finalizers via kubectl patch) and treat failures as non-fatal so teardown
proceeds; update the teardown section replacing the single blocking delete with
this timeout+fallback pattern referencing the existing "kubectl delete namespace
{cluster_id}" command and the test teardown logic.
- Around line 219-223: The doc has a placeholder "Wait for cluster Ready with
timeout (best-effort)" but no concrete wait command; update the section so it
shows the actual kubectl wait invocation (e.g., use kubectl wait
--for=condition=Ready --timeout=<duration> on the appropriate resource) and then
clarify the follow-up delete step so it explicitly runs kubectl delete namespace
{cluster_id} regardless of wait result; reference and update the "Wait for
cluster Ready with timeout (best-effort)" heading and the subsequent kubectl
delete namespace {cluster_id} line to include the wait command, timeout
parameter, and a note that delete runs irrespective of the wait outcome.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test-design/testcases/nodepool.md (1)
68-71: ⚠️ Potential issue | 🟠 Major

Polling step still lacks concrete interval/timeout mechanics.
Line 68 says “poll … or timeout,” but the example is a single GET. Please include an explicit bounded loop (interval + max duration + fail condition) so execution is deterministic.
Proposed doc patch
````diff
-Poll adapter statuses until all required adapters report `Applied/Available/Health=True` or timeout:
+Poll adapter statuses until all required adapters report `Applied/Available/Health=True` or timeout (e.g., every 10s for up to 5m):
 ```bash
-curl -X GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses
+timeout 300 bash -c '
+  while true; do
+    resp=$(curl -s -X GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses)
+    # assert all required adapters have Applied/Available/Health == True
+    # break when converged; otherwise sleep and retry
+    sleep 10
+  done
+' || { echo "Timed out waiting for adapter convergence"; exit 1; }
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 68 - 71: The polling doc step
uses a single GET and lacks deterministic retry mechanics; update the nodepool
status polling example (the curl GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses) to
show a bounded loop that retries at a fixed interval (e.g., INTERVAL seconds) up
to a maximum timeout (e.g., MAX_DURATION seconds), inspects the JSON response
for each required adapter having Applied/Available/Health == true, breaks when
all are true, and returns a non-zero failure (with an explanatory message) if
the max duration is exceeded; explicitly name the variables INTERVAL and
MAX_DURATION in the example and show the sleep/retry and fail condition so
execution is deterministic.

🧹 Nitpick comments (1)

test-design/testcases/nodepool.md (1)

174-175: Adapter source should be documented consistently across both test cases.

This section references only `configs/config.yaml`; consider also mentioning `HYPERFLEET_ADAPTERS_NODEPOOL` here (as in the workflow test) to avoid config-source ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 174 - 175: Update the
documentation in nodepool.md to consistently cite both possible adapter
configuration sources by adding a mention that adapters can be defined either in
configs/config.yaml under adapters.nodepool or via the
HYPERFLEET_ADAPTERS_NODEPOOL environment variable (mirror the wording used in
the workflow test), and adjust the sentence describing the np-configmap adapter
to reference both sources so readers know where to look for adapter
configuration.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 219-222: Add a runnable kubectl wait command before the kubectl
delete namespace line so the doc actually performs a best-effort wait with
timeout; specifically replace or augment the placeholder with an explicit
command such as "kubectl wait --for=condition=Ready namespace/{cluster_id}
--timeout=5m" (adjust timeout as needed) and then keep the existing "kubectl
delete namespace {cluster_id}" line so implementers can wait for Ready with a
timeout and still clean up.
Duplicate comments:
In `@test-design/testcases/nodepool.md`:
- Around line 68-71: The polling doc step uses a single GET and lacks
deterministic retry mechanics; update the nodepool status polling example (the
curl GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses) to
show a bounded loop that retries at a fixed interval (e.g., INTERVAL seconds) up
to a maximum timeout (e.g., MAX_DURATION seconds), inspects the JSON response
for each required adapter having Applied/Available/Health == true, breaks when
all are true, and returns a non-zero failure (with an explanatory message) if
the max duration is exceeded; explicitly name the variables INTERVAL and
MAX_DURATION in the example and show the sleep/retry and fail condition so
execution is deterministic.
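The `INTERVAL`/`MAX_DURATION` loop the prompt describes can be sketched as a generic bounded-poll helper. The `converged()` stub below succeeds on its third call, standing in for the real curl/jq convergence check against the `/statuses` endpoint; `wait_for` and the knob values are illustrative assumptions, not the project's actual harness.

```shell
# Hedged sketch of a deterministic bounded-poll loop.
INTERVAL=1      # seconds between attempts (e.g. 10 in real runs)
MAX_DURATION=5  # overall deadline in seconds (e.g. 300 in real runs)

wait_for() {
  # Re-run the given command until it succeeds or the deadline passes.
  waited=0
  until "$@"; do
    waited=$((waited + INTERVAL))
    if [ "$waited" -ge "$MAX_DURATION" ]; then
      echo "Timed out after ${MAX_DURATION}s waiting for: $*" >&2
      return 1
    fi
    sleep "$INTERVAL"
  done
}

# Stand-in predicate: succeeds on the third attempt, simulating adapter
# convergence. In the real test this would inspect the /statuses response.
attempts=0
converged() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

wait_for converged && RESULT=converged || RESULT=timeout
echo "$RESULT attempts=$attempts"
```

Naming the interval and deadline as explicit variables makes the loop's failure behavior deterministic and easy to document in the test case.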
Nitpick comments:
In@test-design/testcases/nodepool.md:
- Around line 174-175: Update the documentation in nodepool.md to consistently
cite both possible adapter configuration sources by adding a mention that
adapters can be defined either in configs/config.yaml under adapters.nodepool or
via the HYPERFLEET_ADAPTERS_NODEPOOL environment variable (mirror the wording
used in the workflow test), and adjust the sentence describing the np-configmap
adapter to reference both sources so readers know where to look for adapter
configuration.

ℹ️ Review info

Configuration used: Organization UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 709f5c499cece968eaa7fc82baa88f2c4a00f39f and 46adca05dc7a6ef20017e6d19f3905ec1b8fd209.

📒 Files selected for processing (2)

* deploy-scripts/lib/sentinel.sh
* test-design/testcases/nodepool.md

🚧 Files skipped from review as they are similar to previous changes (1)

* deploy-scripts/lib/sentinel.sh
Actionable comments posted: 1
🧹 Nitpick comments (1)
test-design/testcases/nodepool.md (1)
95-99: Add explicit timeout/poll criteria for the Ready wait in Step 3.

Line 95 says "wait for transition" but doesn't define bounds. Please specify polling interval + max timeout so automation can fail deterministically instead of hanging.
Suggested doc patch
```diff
-- Wait for nodepool Ready condition to transition to True
+- Poll nodepool `Ready` (and `Available`) every 10s for up to 5m; fail the test on timeout
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 95 - 99, Update Step 3's "wait for nodepool Ready condition to transition to True" to include explicit polling criteria: specify a polling interval (e.g., every 10s) and a maximum timeout (e.g., 10 minutes) and state the exact success condition to check (nodepool status Ready == True); also indicate the failure behavior when the timeout is exceeded so automation can fail deterministically. Reference the Step 3 wait for Ready condition text in nodepool.md for where to add these polling interval/timeout details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Line 26: The document's metadata "Updated" date in the top section currently
reads 2026-02-28 and should be aligned with the rest of the PR (2026-03-02);
update the first section's Updated table cell (the row containing "**Updated** |
2026-02-28") to 2026-03-02 so it matches the later usage (e.g., the other
section that shows 2026-03-02).
---
Nitpick comments:
In `@test-design/testcases/nodepool.md`:
- Around line 95-99: Update Step 3's "wait for nodepool Ready condition to
transition to True" to include explicit polling criteria: specify a polling
interval (e.g., every 10s) and a maximum timeout (e.g., 10 minutes) and state
the exact success condition to check (nodepool status Ready == True); also
indicate the failure behavior when the timeout is exceeded so automation can
fail deterministically. Reference the Step 3 wait for Ready condition text in
nodepool.md for where to add these polling interval/timeout details.
… and update with coderabbit review comment
🧹 Nitpick comments (1)
test-design/testcases/nodepool.md (1)
109-124: Inconsistent cleanup approach between the two test cases.

This cleanup (Step 4) deletes the namespace without waiting for cluster Ready, while the second test's cleanup (Step 3, lines 214-224) includes a `kubectl wait --for=condition=Ready --timeout=5m` to "prevent namespace deletion conflicts." If the wait is needed to prevent conflicts in the second test, consider applying the same safeguard here for consistency and reliability.
📝 Suggested alignment
````diff
 #### Step 4: Cleanup resources (AfterEach)
 **Action:**
-- Delete the namespace created for this cluster:
+- Wait for cluster Ready condition with timeout (best-effort), then delete the namespace:
 ```bash
+# Wait for cluster Ready with timeout (best-effort)
+kubectl wait --for=condition=Ready --timeout=5m cluster/{cluster_id} || \
+  echo "WARN: cluster did not become Ready within timeout; proceeding with cleanup"
+
+# Delete namespace regardless of Ready state
 kubectl delete namespace {cluster_id}
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 109 - 124: Step 4's cleanup
currently runs "kubectl delete namespace {cluster_id}" immediately; align it
with the other test by adding a best-effort wait for cluster readiness before
deleting the namespace: insert a "kubectl wait --for=condition=Ready
--timeout=5m cluster/{cluster_id} || echo 'WARN: cluster did not become Ready
within timeout; proceeding with cleanup'" immediately before the existing
"kubectl delete namespace {cluster_id}" so the deletion uses the same safeguard
as the second test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test-design/testcases/nodepool.md`:
- Around line 109-124: Step 4's cleanup currently runs "kubectl delete namespace
{cluster_id}" immediately; align it with the other test by adding a best-effort
wait for cluster readiness before deleting the namespace: insert a "kubectl wait
--for=condition=Ready --timeout=5m cluster/{cluster_id} || echo 'WARN: cluster
did not become Ready within timeout; proceeding with cleanup'" immediately
before the existing "kubectl delete namespace {cluster_id}" so the deletion uses
the same safeguard as the second test.</details> --- <details> <summary>ℹ️ Review info</summary> **Configuration used**: Organization UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 2e5be0f1a975304fc4a697d8f4717e2e3a80927d and d0045a6e3635cbe8438d990bcf323490b26f9926. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `deploy-scripts/lib/sentinel.sh` * `test-design/testcases/nodepool.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
```go
	openapi.ResourceConditionStatusTrue,
	h.Cfg.Timeouts.Cluster.Ready,
)
Expect(err).NotTo(HaveOccurred(), "cluster Ready condition should transition to True before cleanup")
```
The Expect on line 247 will skip CleanupTestCluster if the cluster never
reaches Ready. In Ginkgo v2, a failed Expect panics within the closure, so
line 250 never runs — leaving orphaned namespaces behind.
The test-design doc already describes the right approach (best-effort
cleanup), but the code contradicts it. Suggestion:
```diff
-Expect(err).NotTo(HaveOccurred(), "cluster Ready condition should transition to True before cleanup")
+if err != nil {
+	ginkgo.GinkgoWriter.Printf("WARNING: cluster %s did not reach Ready state before cleanup: %v\n", clusterID, err)
+}
```
This way cleanup always runs regardless of cluster state.
```go
ginkgo.By("Verify initial status of nodepool")
// Verify initial conditions are False, indicating workflow has not completed yet
// This ensures the nodepool starts in the correct initial state
np, err := h.Client.GetNodePool(ctx, clusterID, nodepoolID)
```
The initial state check (lines 53-65) does a single GetNodePool without an Eventually wrapper. Two failure modes:

- Conditions not yet populated: if the API returns before setting initial conditions, HasResourceCondition returns false — but the error message blames the condition value, not its absence.
- Race with fast adapters: in a fast environment, conditions could already be True.

Consider wrapping this in an Eventually with a short timeout, or at minimum, first check that the conditions slice is non-empty before asserting their values:
```go
Eventually(func(g Gomega) {
	np, err := h.Client.GetNodePool(ctx, clusterID, nodepoolID)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(np.Status).NotTo(BeNil())
	g.Expect(np.Status.Conditions).NotTo(BeEmpty(), "conditions should be populated")
	hasReadyFalse := h.HasResourceCondition(np.Status.Conditions,
		client.ConditionTypeReady, openapi.ResourceConditionStatusFalse)
	g.Expect(hasReadyFalse).To(BeTrue(), "initial Ready should be False")
}, "10s", h.Cfg.Polling.Interval).Should(Succeed())
```
testdata/README.md still references nodepools/gcp.json in 3 places (lines 28,
- `hyperfleet.io/resource-type`: "configmap"
- ConfigMap has correct annotations:
  - `hyperfleet.io/generation`: "1" (for new creation request)
- ConfigMap contains expected data fields from template (nodepool.id, nodepool.name, cluster.id, etc.)
The expected result for Step 1 of the K8s Resource Check test case says:
ConfigMap contains expected data fields from template (nodepool.id, nodepool.name, cluster.id, etc.)
But VerifyConfigMap only checks labels and annotations — it doesn't inspect .data at all. Two options:
- Add data field verification to VerifyConfigMap (or a new helper) and call it from the test
- Remove the claim from the test-design doc if data verification is out of scope for now
Either way, the doc and code should match.
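If the first option is taken, the data-field check can be sketched as a simple required-keys comparison. Here `required_keys` and `missing_keys` are hypothetical helpers, and the "present" list is a stub; in the suite the actual keys would come from the typed client's `cm.Data` (or `kubectl get cm ... -o jsonpath='{.data}'`).

```shell
# Hedged sketch: verify a ConfigMap carries the expected data keys.
required_keys="nodepool.id nodepool.name cluster.id"

missing_keys() {
  # $1: whitespace-separated list of keys actually present in the ConfigMap
  present=" $1 "
  missing=""
  for k in $required_keys; do
    case "$present" in
      *" $k "*) ;;                  # key found
      *) missing="$missing $k" ;;   # key absent
    esac
  done
  echo "$missing"
}

# Simulate a ConfigMap that is missing nodepool.name.
MISSING=$(missing_keys "nodepool.id cluster.id")
echo "missing:${MISSING}"
```

Reporting the full set of missing keys (rather than failing on the first) makes test diagnostics easier when a template changes.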
```go
	"adapter %s should have valid created_time", adapter.Adapter)
g.Expect(adapter.LastReportTime).NotTo(BeZero(),
	"adapter %s should have valid last_report_time", adapter.Adapter)
g.Expect(adapter.ObservedGeneration).To(Equal(int32(1)),
```
I think we can put the fixed generation value into constants.go too, like the condition types; that would help future maintenance.
```go
hasAvailable := h.HasResourceCondition(finalNodePool.Status.Conditions,
	client.ConditionTypeAvailable, openapi.ResourceConditionStatusTrue)
Expect(hasAvailable).To(BeTrue(), "nodepool should have Available=True condition")
```
At the nodepool API endpoint level, are only these two condition types expected? I think we should add a checkpoint to verify that only the expected conditions (Ready and Available) appear in the list. Same for the cluster case.
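The checkpoint proposed here amounts to an exact-set comparison of condition types. A minimal sketch, assuming the expected set is {Ready, Available}; `condition_types_match` is a hypothetical helper, and in the real test the input would be the types extracted from `finalNodePool.Status.Conditions`.

```shell
# Hedged sketch: assert the reported condition types are exactly the
# expected set, so any unexpected type surfaces as a failure.
expected_types="Available Ready"

condition_types_match() {
  # $1: whitespace-separated condition types reported by the API
  got=$(printf '%s\n' $1 | sort | tr '\n' ' ')
  want=$(printf '%s\n' $expected_types | sort | tr '\n' ' ')
  [ "$got" = "$want" ]
}

condition_types_match "Ready Available" && R1=exact || R1=unexpected
condition_types_match "Ready Available Degraded" && R2=exact || R2=unexpected
echo "$R1 $R2"
```

Sorting before comparing makes the check order-independent, which matters because the API does not guarantee condition ordering.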
Summary
Testing
All 5 cases (2 new + 3 existing) passed with Angel’s Sentinel PR deployed.
This PR depends on Angel's PR being merged.
Summary by CodeRabbit
Tests
Chores
Documentation