HYPERFLEET-612, HYPERFLEET-613 - test: automate test cases for nodepool resources #36
86254860 wants to merge 2 commits into openshift-hyperfleet:main
Conversation
…ol resources, including basic workflow and Kubernetes resource validation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (no approvers yet). The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.

Use the following commands to manage reviews, or the checkboxes below for quick actions:
Walkthrough

Restructures E2E nodepool tests (new Describe blocks, BeforeEach/AfterEach using context, updated payloads and nodepool naming), adds Kubernetes client methods to list ConfigMaps by labels and to return a unique ConfigMap, and introduces Helper.VerifyConfigMap to validate ConfigMap labels/annotations. Renames helper AdapterNameToClusterConditionType → AdapterNameToConditionType and updates callers. Also updates testdata (adapter CEL expression and payload label changes) and a Helm value key in a deploy script. Final-state and adapter-specific validations were added to tests.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant Helper as Helper.VerifyConfigMap
    participant Client as K8s Client
    participant APIServer as Kubernetes API Server
    TestRunner->>Helper: VerifyConfigMap(ctx, namespace, expectedLabels, expectedAnnotations)
    Helper->>Client: GetUniqueConfigMapByLabels(ctx, namespace, labels)
    Client->>APIServer: List ConfigMaps?namespace=...&labelSelector=...
    APIServer-->>Client: []ConfigMap
    Client-->>Helper: []ConfigMap
    Helper->>Helper: ensure single result, validate annotations
    Helper-->>TestRunner: success / error
```
```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant HyperfleetAPI as Hyperfleet API
    participant Adapters as Adapters
    participant ConditionHelper as AdapterNameToConditionType
    TestRunner->>HyperfleetAPI: POST clusters/{cid}/nodepools (create)
    HyperfleetAPI-->>TestRunner: 202 Accepted (nodepool id)
    loop poll status
        TestRunner->>HyperfleetAPI: GET clusters/{cid}/nodepools/{nid}
        HyperfleetAPI->>Adapters: trigger adapters (configmap, job, etc.)
        Adapters-->>HyperfleetAPI: per-adapter status updates
        HyperfleetAPI-->>TestRunner: nodepool status with per-adapter conditions
    end
    TestRunner->>ConditionHelper: AdapterNameToConditionType(adapterName)
    ConditionHelper-->>TestRunner: condition type
    TestRunner->>HyperfleetAPI: final GET clusters/{cid}/nodepools/{nid} (verify observedGeneration, Ready/Available)
    HyperfleetAPI-->>TestRunner: final state
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 2
🧹 Nitpick comments (2)
e2e/nodepool/creation.go (1)
186-227: Test will fail if new nodepool adapters are configured without adding verifiers.

The test explicitly fails when an adapter in `h.Cfg.Adapters.NodePool` lacks a corresponding verifier (lines 215-217). This is intentional fail-fast behavior, but consider documenting this requirement in the test comments or adding a note that new adapters require verifier additions here. Currently only `"np-configmap"` has a verifier defined; if additional nodepool adapters are configured in the future, this test will fail until verifiers are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/nodepool/creation.go` around lines 186 - 227: The test will fail if h.Cfg.Adapters.NodePool contains an adapter without a verifier in adapterResourceVerifiers; add a clear comment above the adapterResourceVerifiers map explaining that every nodepool adapter listed in h.Cfg.Adapters.NodePool must have a corresponding entry in adapterResourceVerifiers (or tests will ginkgo.Fail when the verifier is missing), and include guidance to update adapterResourceVerifiers when adding new adapters (referencing adapterResourceVerifiers, h.Cfg.Adapters.NodePool, and the ginkgo.Fail check that enforces this).

test-design/testcases/nodepool.md (1)
54-64: Initial-state assertion is too strict and may be flaky.

As written, this requires both `Ready=False` and `Available=False` on the first observed GET. In fast environments, conditions can transition before the first poll and cause false failures. Prefer asserting "not yet terminal success" with bounded polling/timing semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 54 - 64, Step 1's initial assertion in "Step 1: Verify initial status of nodepool" is too strict (requires Ready=False and Available=False on the first GET); change it to perform bounded polling of the GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} with a short retry loop and timeout and assert only that the nodepool is not yet in a terminal-success state (i.e., do not require both Ready=False and Available=False on first observation — instead fail only if Ready=True && Available=True immediately or if the terminal-success state is observed persistently after the timeout); update the expected-result text to describe the polling behavior and the relaxed condition.
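The relaxed check described above can be sketched as a small predicate: fail only when the nodepool already reports terminal success, rather than requiring both conditions to be False on the first GET. This is a minimal sketch; `is_terminal_success` and the sample payloads are hypothetical stand-ins for what the real GET would return, not part of the test suite.

```shell
# Hedged sketch: fail the initial-state check only when the nodepool is
# already terminal-success (Ready=True AND Available=True).
# The JSON strings below are stand-ins for the real API response.
is_terminal_success() {
  # True only when both Ready and Available report status True.
  echo "$1" | grep -q '"type":"Ready","status":"True"' &&
    echo "$1" | grep -q '"type":"Available","status":"True"'
}

fresh='[{"type":"Ready","status":"False"},{"type":"Available","status":"False"}]'
fast='[{"type":"Ready","status":"True"},{"type":"Available","status":"True"}]'

is_terminal_success "$fresh" && R1=terminal || R1=in-progress
is_terminal_success "$fast" && R2=terminal || R2=in-progress
echo "$R1 $R2"
```

With this shape, a fast environment where adapters converge before the first poll no longer produces a false failure on intermediate `False` states.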
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 88-90: The document uses two different config paths
(`configs/config.yaml` vs `config.yaml`) causing ambiguity; pick one canonical
path (e.g., `configs/config.yaml`) and update every occurrence in this file to
match it, including the note that mentions `adapters.nodepool` and the
environment variable `HYPERFLEET_ADAPTERS_NODEPOOL`, and also update the
references around lines noted (174-175) so all mentions are identical and
consistent.
- Around line 214-218: The "Wait for cluster Ready" step can hang indefinitely;
add an explicit timeout and fallback so namespace deletion proceeds even if
Ready isn't reached: change the wait logic that blocks before running "kubectl
delete namespace {cluster_id}" to use a bounded wait (e.g., kubectl wait
--for=condition=Ready --timeout=<reasonable-duration> or equivalent polling with
a deadline) and if the timeout is reached log a warning and continue with a
best-effort cleanup by executing the namespace deletion; update the instructions
for the "Wait for cluster Ready" step and the subsequent "kubectl delete
namespace {cluster_id}" step to reflect the timeout and fallback behavior.
---
Nitpick comments:
In `@e2e/nodepool/creation.go`:
- Around line 186-227: The test will fail if h.Cfg.Adapters.NodePool contains an
adapter without a verifier in adapterResourceVerifiers; add a clear comment
above the adapterResourceVerifiers map explaining that every nodepool adapter
listed in h.Cfg.Adapters.NodePool must have a corresponding entry in
adapterResourceVerifiers (or tests will ginkgo.Fail when the verifier is
missing), and include guidance to update adapterResourceVerifiers when adding
new adapters (referencing adapterResourceVerifiers, h.Cfg.Adapters.NodePool, and
the ginkgo.Fail check that enforces this).
In `@test-design/testcases/nodepool.md`:
- Around line 54-64: Step 1's initial assertion in "Step 1: Verify initial
status of nodepool" is too strict (requires Ready=False and Available=False on
the first GET); change it to perform bounded polling of the GET
${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} with
a short retry loop and timeout and assert only that the nodepool is not yet in a
terminal-success state (i.e., do not require both Ready=False and
Available=False on first observation — instead fail only if Ready=True &&
Available=True immediately or if the terminal-success state is observed
persistently after the timeout); update the expected-result text to describe the
polling behavior and the relaxed condition.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
* e2e/cluster/creation.go
* e2e/nodepool/creation.go
* pkg/client/kubernetes/client.go
* pkg/helper/k8s.go
* pkg/helper/validation.go
* test-design/testcases/nodepool.md
* testdata/adapter-configs/clusters-cl-job/adapter-task-config.yaml
* testdata/payloads/clusters/cluster-request.json
* testdata/payloads/nodepools/nodepool-request.json
Actionable comments posted: 2
♻️ Duplicate comments (2)
test-design/testcases/nodepool.md (2)
112-115: ⚠️ Potential issue | 🟠 Major

Add bounded/best-effort cleanup in workflow test teardown.
Line 114 can still block teardown. Align this scenario with the timeout/fallback cleanup strategy used later in the file.
Suggested doc update
```diff
-kubectl delete namespace {cluster_id}
+kubectl delete namespace {cluster_id} --wait=false
+# Optionally: poll namespace deletion with a bounded timeout and log warning on timeout
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 112 - 115, The current teardown issues with a blocking "kubectl delete namespace {cluster_id}" call; change it to a bounded/fallback cleanup like the later workflow: initiate deletion (kubectl delete namespace {cluster_id} --wait=false), poll for namespace termination with a timeout (e.g., 60s), and if the namespace still exists after timeout perform a forced fallback (kubectl delete namespace {cluster_id} --grace-period=0 --force or remove finalizers via kubectl patch) and treat failures as non-fatal so teardown proceeds; update the teardown section replacing the single blocking delete with this timeout+fallback pattern referencing the existing "kubectl delete namespace {cluster_id}" command and the test teardown logic.
219-223: ⚠️ Potential issue | 🟡 Minor

Cleanup command snippet is incomplete.
Line 219 through Line 223 describes timeout behavior but does not provide the actual wait command, making the procedure non-actionable.
Suggested doc update
```diff
 # Wait for cluster Ready with timeout (best-effort)
+kubectl wait --for=condition=Ready "cluster/{cluster_id}" --timeout=5m || \
+  echo "WARN: cluster not Ready within timeout; continuing with best-effort cleanup"
 # Delete namespace regardless of Ready state
 kubectl delete namespace {cluster_id}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 219 - 223, The doc has a placeholder "Wait for cluster Ready with timeout (best-effort)" but no concrete wait command; update the section so it shows the actual kubectl wait invocation (e.g., use kubectl wait --for=condition=Ready --timeout=<duration> on the appropriate resource) and then clarify the follow-up delete step so it explicitly runs kubectl delete namespace {cluster_id} regardless of wait result; reference and update the "Wait for cluster Ready with timeout (best-effort)" heading and the subsequent kubectl delete namespace {cluster_id} line to include the wait command, timeout parameter, and a note that delete runs irrespective of the wait outcome.
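The timeout-plus-fallback teardown asked for above can be sketched generically. In this sketch `KUBECTL` is a stub that fails the wait (simulating a cluster that never reaches Ready) so the fallback path is visible without a live cluster; in the real doc these would be plain `kubectl` invocations, and `best_effort_cleanup` is a hypothetical name.

```shell
# Hedged sketch of bounded wait + unconditional cleanup.
KUBECTL() {
  case "$1" in
    wait) return 1 ;;            # simulate: Ready never reached
    *) echo "kubectl $*" ;;      # echo other commands instead of running them
  esac
}

best_effort_cleanup() {
  cluster_id=$1
  # Bounded wait; never abort teardown if Ready is not reached.
  if ! KUBECTL wait --for=condition=Ready "cluster/${cluster_id}" --timeout=5m; then
    echo "WARN: cluster ${cluster_id} not Ready within timeout; continuing cleanup" >&2
  fi
  # Delete runs regardless of the wait outcome.
  KUBECTL delete namespace "${cluster_id}"
}

OUT=$(best_effort_cleanup my-cluster 2>/dev/null)
echo "$OUT"
```

The key property is that the delete step executes even when the wait times out, so teardown can never hang or skip cleanup.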
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 62-63: The tests assert brittle intermediate `False` for NodePool
conditions: change the checks against NodePool `Ready` and `Available`
(currently asserting `status: False` at lines referencing the initial state and
the transition at lines ~102-103) to resilient assertions that wait for the
desired state or assert idempotent transitions; specifically, replace exact
initial `status: False` assertions with a wait-for-condition helper (e.g.,
waitForNodePoolCondition or assertEventuallyTrue/assertNotTrue) that polls the
NodePool's `Ready` and `Available` conditions and accepts either already-True or
a False->True transition, and update the tests that check the explicit `False ->
True` transition to assert that the condition becomes True within a timeout
rather than requiring the intermediate False.
- Around line 67-71: The doc currently issues a single GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses but
then assumes adapter conditions are fully converged (lines 75–87); change the
test steps to use bounded polling: repeatedly GET that endpoint with a fixed
interval and max timeout (e.g., poll every 5s up to 2–5 minutes or X retries)
and stop when the expected adapter conditions are observed or the timeout is
reached, failing the test if not converged; update the nodepool.md instructions
to explicitly document the retry interval, max retries/timeout, convergence
criterion, and an example curl-based loop or reference to the test harness retry
utility so the check is deterministic.
---
Duplicate comments:
In `@test-design/testcases/nodepool.md`:
- Around line 112-115: The current teardown issues with a blocking "kubectl
delete namespace {cluster_id}" call; change it to a bounded/fallback cleanup
like the later workflow: initiate deletion (kubectl delete namespace
{cluster_id} --wait=false), poll for namespace termination with a timeout (e.g.,
60s), and if the namespace still exists after timeout perform a forced fallback
(kubectl delete namespace {cluster_id} --grace-period=0 --force or remove
finalizers via kubectl patch) and treat failures as non-fatal so teardown
proceeds; update the teardown section replacing the single blocking delete with
this timeout+fallback pattern referencing the existing "kubectl delete namespace
{cluster_id}" command and the test teardown logic.
- Around line 219-223: The doc has a placeholder "Wait for cluster Ready with
timeout (best-effort)" but no concrete wait command; update the section so it
shows the actual kubectl wait invocation (e.g., use kubectl wait
--for=condition=Ready --timeout=<duration> on the appropriate resource) and then
clarify the follow-up delete step so it explicitly runs kubectl delete namespace
{cluster_id} regardless of wait result; reference and update the "Wait for
cluster Ready with timeout (best-effort)" heading and the subsequent kubectl
delete namespace {cluster_id} line to include the wait command, timeout
parameter, and a note that delete runs irrespective of the wait outcome.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test-design/testcases/nodepool.md (1)
68-71: ⚠️ Potential issue | 🟠 Major

Polling step still lacks concrete interval/timeout mechanics.
Line 68 says “poll … or timeout,” but the example is a single GET. Please include an explicit bounded loop (interval + max duration + fail condition) so execution is deterministic.
Proposed doc patch
````diff
-Poll adapter statuses until all required adapters report `Applied/Available/Health=True` or timeout:
+Poll adapter statuses until all required adapters report `Applied/Available/Health=True` or timeout (e.g., every 10s for up to 5m):
 ```bash
-curl -X GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses
+timeout 300 bash -c '
+  while true; do
+    resp=$(curl -s -X GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses)
+    # assert all required adapters have Applied/Available/Health == True
+    # break when converged; otherwise sleep and retry
+    sleep 10
+  done
+' || { echo "Timed out waiting for adapter convergence"; exit 1; }
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 68 - 71: The polling doc step
uses a single GET and lacks deterministic retry mechanics; update the nodepool
status polling example (the curl GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses) to
show a bounded loop that retries at a fixed interval (e.g., INTERVAL seconds) up
to a maximum timeout (e.g., MAX_DURATION seconds), inspects the JSON response
for each required adapter having Applied/Available/Health == true, breaks when
all are true, and returns a non-zero failure (with an explanatory message) if
the max duration is exceeded; explicitly name the variables INTERVAL and
MAX_DURATION in the example and show the sleep/retry and fail condition so
execution is deterministic.

🧹 Nitpick comments (1)

test-design/testcases/nodepool.md (1)

174-175: Adapter source should be documented consistently across both test cases.

This section references only `configs/config.yaml`; consider also mentioning `HYPERFLEET_ADAPTERS_NODEPOOL` here (as in the workflow test) to avoid config-source ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 174 - 175: Update the
documentation in nodepool.md to consistently cite both possible adapter
configuration sources by adding a mention that adapters can be defined either in
configs/config.yaml under adapters.nodepool or via the
HYPERFLEET_ADAPTERS_NODEPOOL environment variable (mirror the wording used in
the workflow test), and adjust the sentence describing the np-configmap adapter
to reference both sources so readers know where to look for adapter
configuration.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Around line 219-222: Add a runnable kubectl wait command before the kubectl
delete namespace line so the doc actually performs a best-effort wait with
timeout; specifically replace or augment the placeholder with an explicit
command such as "kubectl wait --for=condition=Ready namespace/{cluster_id}
--timeout=5m" (adjust timeout as needed) and then keep the existing "kubectl
delete namespace {cluster_id}" line so implementers can wait for Ready with a
timeout and still clean up.
Duplicate comments:
In `@test-design/testcases/nodepool.md`:
- Around line 68-71: The polling doc step uses a single GET and lacks
deterministic retry mechanics; update the nodepool status polling example (the
curl GET to
/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses) to
show a bounded loop that retries at a fixed interval (e.g., INTERVAL seconds) up
to a maximum timeout (e.g., MAX_DURATION seconds), inspects the JSON response
for each required adapter having Applied/Available/Health == true, breaks when
all are true, and returns a non-zero failure (with an explanatory message) if
the max duration is exceeded; explicitly name the variables INTERVAL and
MAX_DURATION in the example and show the sleep/retry and fail condition so
execution is deterministic.
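The `INTERVAL`/`MAX_DURATION` loop the prompt describes can be sketched as a generic bounded-poll helper. The `converged()` stub below succeeds on its third call, standing in for the real curl/jq convergence check against the `/statuses` endpoint; `wait_for` and the knob values are illustrative assumptions, not the project's actual harness.

```shell
# Hedged sketch of a deterministic bounded-poll loop.
INTERVAL=1      # seconds between attempts (e.g. 10 in real runs)
MAX_DURATION=5  # overall deadline in seconds (e.g. 300 in real runs)

wait_for() {
  # Re-run the given command until it succeeds or the deadline passes.
  waited=0
  until "$@"; do
    waited=$((waited + INTERVAL))
    if [ "$waited" -ge "$MAX_DURATION" ]; then
      echo "Timed out after ${MAX_DURATION}s waiting for: $*" >&2
      return 1
    fi
    sleep "$INTERVAL"
  done
}

# Stand-in predicate: succeeds on the third attempt, simulating adapter
# convergence. In the real test this would inspect the /statuses response.
attempts=0
converged() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

wait_for converged && RESULT=converged || RESULT=timeout
echo "$RESULT attempts=$attempts"
```

Naming the interval and deadline as explicit variables makes the loop's failure behavior deterministic and easy to document in the test case.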
Nitpick comments:
In@test-design/testcases/nodepool.md:
- Around line 174-175: Update the documentation in nodepool.md to consistently
cite both possible adapter configuration sources by adding a mention that
adapters can be defined either in configs/config.yaml under adapters.nodepool or
via the HYPERFLEET_ADAPTERS_NODEPOOL environment variable (mirror the wording
used in the workflow test), and adjust the sentence describing the np-configmap
adapter to reference both sources so readers know where to look for adapter
configuration.

ℹ️ Review info

Configuration used: Organization UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 709f5c499cece968eaa7fc82baa88f2c4a00f39f and 46adca05dc7a6ef20017e6d19f3905ec1b8fd209.

📒 Files selected for processing (2)

* deploy-scripts/lib/sentinel.sh
* test-design/testcases/nodepool.md

🚧 Files skipped from review as they are similar to previous changes (1)

* deploy-scripts/lib/sentinel.sh
Actionable comments posted: 1
🧹 Nitpick comments (1)
test-design/testcases/nodepool.md (1)
95-99: Add explicit timeout/poll criteria for the Ready wait in Step 3.

Line 95 says "wait for transition" but doesn't define bounds. Please specify polling interval + max timeout so automation can fail deterministically instead of hanging.
Suggested doc patch
```diff
-- Wait for nodepool Ready condition to transition to True
+- Poll nodepool `Ready` (and `Available`) every 10s for up to 5m; fail the test on timeout
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-design/testcases/nodepool.md` around lines 95 - 99, Update Step 3's "wait for nodepool Ready condition to transition to True" to include explicit polling criteria: specify a polling interval (e.g., every 10s) and a maximum timeout (e.g., 10 minutes) and state the exact success condition to check (nodepool status Ready == True); also indicate the failure behavior when the timeout is exceeded so automation can fail deterministically. Reference the Step 3 wait for Ready condition text in nodepool.md for where to add these polling interval/timeout details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-design/testcases/nodepool.md`:
- Line 26: The document's metadata "Updated" date in the top section currently
reads 2026-02-28 and should be aligned with the rest of the PR (2026-03-02);
update the first section's Updated table cell (the row containing "**Updated** |
2026-02-28") to 2026-03-02 so it matches the later usage (e.g., the other
section that shows 2026-03-02).
---
Nitpick comments:
In `@test-design/testcases/nodepool.md`:
- Around line 95-99: Update Step 3's "wait for nodepool Ready condition to
transition to True" to include explicit polling criteria: specify a polling
interval (e.g., every 10s) and a maximum timeout (e.g., 10 minutes) and state
the exact success condition to check (nodepool status Ready == True); also
indicate the failure behavior when the timeout is exceeded so automation can
fail deterministically. Reference the Step 3 wait for Ready condition text in
nodepool.md for where to add these polling interval/timeout details.
… and update with coderabbit review comment
🧹 Nitpick comments (1)
test-design/testcases/nodepool.md (1)
109-124: Inconsistent cleanup approach between the two test cases.

This cleanup (Step 4) deletes the namespace without waiting for cluster Ready, while the second test's cleanup (Step 3, lines 214-224) includes a `kubectl wait --for=condition=Ready --timeout=5m` to "prevent namespace deletion conflicts." If the wait is needed to prevent conflicts in the second test, consider applying the same safeguard here for consistency and reliability.
📝 Suggested alignment
````diff
 #### Step 4: Cleanup resources (AfterEach)
 **Action:**
-- Delete the namespace created for this cluster:
+- Wait for cluster Ready condition with timeout (best-effort), then delete the namespace:
 ```bash
+# Wait for cluster Ready with timeout (best-effort)
+kubectl wait --for=condition=Ready --timeout=5m cluster/{cluster_id} || \
+  echo "WARN: cluster did not become Ready within timeout; proceeding with cleanup"
+
+# Delete namespace regardless of Ready state
 kubectl delete namespace {cluster_id}
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-design/testcases/nodepool.md` around lines 109 - 124: Step 4's cleanup
currently runs "kubectl delete namespace {cluster_id}" immediately; align it
with the other test by adding a best-effort wait for cluster readiness before
deleting the namespace: insert a "kubectl wait --for=condition=Ready
--timeout=5m cluster/{cluster_id} || echo 'WARN: cluster did not become Ready
within timeout; proceeding with cleanup'" immediately before the existing
"kubectl delete namespace {cluster_id}" so the deletion uses the same safeguard
as the second test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test-design/testcases/nodepool.md`:
- Around line 109-124: Step 4's cleanup currently runs "kubectl delete namespace
{cluster_id}" immediately; align it with the other test by adding a best-effort
wait for cluster readiness before deleting the namespace: insert a "kubectl wait
--for=condition=Ready --timeout=5m cluster/{cluster_id} || echo 'WARN: cluster
did not become Ready within timeout; proceeding with cleanup'" immediately
before the existing "kubectl delete namespace {cluster_id}" so the deletion uses
the same safeguard as the second test.</details> --- <details> <summary>ℹ️ Review info</summary> **Configuration used**: Organization UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 2e5be0f1a975304fc4a697d8f4717e2e3a80927d and d0045a6e3635cbe8438d990bcf323490b26f9926. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `deploy-scripts/lib/sentinel.sh` * `test-design/testcases/nodepool.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
```go
	openapi.ResourceConditionStatusTrue,
	h.Cfg.Timeouts.Cluster.Ready,
)
Expect(err).NotTo(HaveOccurred(), "cluster Ready condition should transition to True before cleanup")
```
The Expect on line 247 will skip CleanupTestCluster if the cluster never
reaches Ready. In Ginkgo v2, a failed Expect panics within the closure, so
line 250 never runs — leaving orphaned namespaces behind.
The test-design doc already describes the right approach (best-effort
cleanup), but the code contradicts it. Suggestion:
```diff
-Expect(err).NotTo(HaveOccurred(), "cluster Ready condition should transition to True before cleanup")
+if err != nil {
+	ginkgo.GinkgoWriter.Printf("WARNING: cluster %s did not reach Ready state before cleanup: %v\n", clusterID, err)
+}
```
This way cleanup always runs regardless of cluster state.
```go
ginkgo.By("Verify initial status of nodepool")
// Verify initial conditions are False, indicating workflow has not completed yet
// This ensures the nodepool starts in the correct initial state
np, err := h.Client.GetNodePool(ctx, clusterID, nodepoolID)
```
The initial state check (lines 53-65) does a single GetNodePool without an Eventually wrapper. Two failure modes:

- Conditions not yet populated: if the API returns before setting initial conditions, HasResourceCondition returns false — but the error message blames the condition value, not its absence.
- Race with fast adapters: in a fast environment, conditions could already be True.

Consider wrapping this in an Eventually with a short timeout, or at minimum, first check that the conditions slice is non-empty before asserting their values:
```go
Eventually(func(g Gomega) {
	np, err := h.Client.GetNodePool(ctx, clusterID, nodepoolID)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(np.Status).NotTo(BeNil())
	g.Expect(np.Status.Conditions).NotTo(BeEmpty(), "conditions should be populated")
	hasReadyFalse := h.HasResourceCondition(np.Status.Conditions,
		client.ConditionTypeReady, openapi.ResourceConditionStatusFalse)
	g.Expect(hasReadyFalse).To(BeTrue(), "initial Ready should be False")
}, "10s", h.Cfg.Polling.Interval).Should(Succeed())
```
testdata/README.md still references nodepools/gcp.json in 3 places (lines 28,
- `hyperfleet.io/resource-type`: "configmap"
- ConfigMap has correct annotations:
  - `hyperfleet.io/generation`: "1" (for new creation request)
- ConfigMap contains expected data fields from template (nodepool.id, nodepool.name, cluster.id, etc.)
The expected result for Step 1 of the K8s Resource Check test case says:
ConfigMap contains expected data fields from template (nodepool.id, nodepool.name, cluster.id, etc.)
But VerifyConfigMap only checks labels and annotations — it doesn't inspect .data at all. Two options:
- Add data field verification to VerifyConfigMap (or a new helper) and call it from the test
- Remove the claim from the test-design doc if data verification is out of scope for now
Either way, the doc and code should match.
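If the first option is taken, the data-field check can be sketched as a simple required-keys comparison. Here `required_keys` and `missing_keys` are hypothetical helpers, and the "present" list is a stub; in the suite the actual keys would come from the typed client's `cm.Data` (or `kubectl get cm ... -o jsonpath='{.data}'`).

```shell
# Hedged sketch: verify a ConfigMap carries the expected data keys.
required_keys="nodepool.id nodepool.name cluster.id"

missing_keys() {
  # $1: whitespace-separated list of keys actually present in the ConfigMap
  present=" $1 "
  missing=""
  for k in $required_keys; do
    case "$present" in
      *" $k "*) ;;                  # key found
      *) missing="$missing $k" ;;   # key absent
    esac
  done
  echo "$missing"
}

# Simulate a ConfigMap that is missing nodepool.name.
MISSING=$(missing_keys "nodepool.id cluster.id")
echo "missing:${MISSING}"
```

Reporting the full set of missing keys (rather than failing on the first) makes test diagnostics easier when a template changes.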
```go
	"adapter %s should have valid created_time", adapter.Adapter)
g.Expect(adapter.LastReportTime).NotTo(BeZero(),
	"adapter %s should have valid last_report_time", adapter.Adapter)
g.Expect(adapter.ObservedGeneration).To(Equal(int32(1)),
```
I think we can put the fixed generation value into constants.go too, like the condition types; that would help future maintenance.
```go
hasAvailable := h.HasResourceCondition(finalNodePool.Status.Conditions,
	client.ConditionTypeAvailable, openapi.ResourceConditionStatusTrue)
Expect(hasAvailable).To(BeTrue(), "nodepool should have Available=True condition")
```
At the nodepool API endpoint level, are only these two condition types expected? I think we should add a checkpoint to verify that only the expected conditions (Ready and Available) appear in the list. Same for the cluster case.
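The checkpoint proposed here amounts to an exact-set comparison of condition types. A minimal sketch, assuming the expected set is {Ready, Available}; `condition_types_match` is a hypothetical helper, and in the real test the input would be the types extracted from `finalNodePool.Status.Conditions`.

```shell
# Hedged sketch: assert the reported condition types are exactly the
# expected set, so any unexpected type surfaces as a failure.
expected_types="Available Ready"

condition_types_match() {
  # $1: whitespace-separated condition types reported by the API
  got=$(printf '%s\n' $1 | sort | tr '\n' ' ')
  want=$(printf '%s\n' $expected_types | sort | tr '\n' ' ')
  [ "$got" = "$want" ]
}

condition_types_match "Ready Available" && R1=exact || R1=unexpected
condition_types_match "Ready Available Degraded" && R2=exact || R2=unexpected
echo "$R1 $R2"
```

Sorting before comparing makes the check order-independent, which matters because the API does not guarantee condition ordering.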
Summary
Testing
All 5 cases (2 new + 3 existing) passed with Angel’s Sentinel PR deployed.
This PR depends on Angel's PR being merged.
Summary by CodeRabbit
Tests
Chores
Documentation