feat: add nodegroup resource limits#5461
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds per-queue, per-nodegroup resource limits to the nodegroup scheduler plugin using a queue annotation, with supporting tests and documentation.
Changes:
- Parse
volcano.sh/nodegroup-resource-limitsfrom queue annotations and enforce limits during predicate evaluation. - Track per-(queue,nodegroup) allocated resources across session open and allocation/deallocation events.
- Add unit tests and user-guide documentation for the new annotation behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/scheduler/plugins/nodegroup/nodegroup.go | Implements annotation parsing, allocation accounting, and predicate-time enforcement of per-nodegroup limits. |
| pkg/scheduler/plugins/nodegroup/errors.go | Adds new scheduling failure reasons for invalid/exceeded resource limits. |
| pkg/scheduler/plugins/nodegroup/nodegroup_test.go | Adds tests for limit enforcement and allocation event updates. |
| docs/user-guide/how_to_use_nodegroup_plugin.md | Documents the new queue annotation format and scheduling semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return fmt.Errorf("job %s not found in session", task.Job) | ||
| } | ||
| attr := np.queueAttrs[job.Queue] | ||
| if attr.resourceLimitErr != nil { | ||
| return newFitErr(task, node, errNodeGroupResourceLimitInvalid) | ||
| } |
There was a problem hiding this comment.
Fixed in the latest push by making the resourceLimitErr check nil-safe before reading it.
| err := ssn.PredicateFn(targetTask, node) | ||
| code := api.Success | ||
| if err != nil { | ||
| code = err.(*api.FitError).Status[0].Code | ||
| } |
There was a problem hiding this comment.
Fixed in the latest push by using errors.As and failing the test with the unexpected error type.
| for _, job := range ssn.Jobs { | ||
| for _, task := range job.Tasks { | ||
| if task.Name == test.targetTaskName { | ||
| targetTask = task | ||
| break | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Fixed in the latest push with a labeled break once the target task is found.
There was a problem hiding this comment.
Code Review
This pull request introduces a feature to configure queue resource limits for each nodegroup via the volcano.sh/nodegroup-resource-limits annotation, updating the documentation, scheduler plugin logic, and adding corresponding unit tests. The review feedback highlights a potential nil pointer dereference in nodegroup.go where attr.resourceLimitErr is accessed without a nil check on attr.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return fmt.Errorf("job %s not found in session", task.Job) | ||
| } | ||
| attr := np.queueAttrs[job.Queue] | ||
| if attr.resourceLimitErr != nil { |
There was a problem hiding this comment.
A nil pointer dereference will occur here if attr is nil. In Volcano, job.Queue might not always be present in np.queueAttrs (for example, if the queue is not found in the session), which is why other parts of this plugin explicitly check for attr == nil (e.g., line 421).
Please add a nil check for attr before accessing attr.resourceLimitErr.
| if attr.resourceLimitErr != nil { | |
| if attr != nil && attr.resourceLimitErr != nil { |
There was a problem hiding this comment.
Fixed in the latest push by guarding attr before checking resourceLimitErr.
ab2708f to
9e0926b
Compare
| limit := attr.nodeGroupResourceLimits[group] | ||
| if limit == nil { | ||
| return nil | ||
| } | ||
|
|
||
| allocated := attr.nodeGroupAllocated[group] | ||
| if allocated == nil { | ||
| allocated = api.EmptyResource() | ||
| } | ||
| futureUsed := allocated.Clone().Add(task.Resreq) | ||
| if ok, resourceNames := futureUsed.LessEqualWithDimensionAndResourcesName(limit, task.Resreq); !ok { | ||
| return fmt.Errorf("nodegroup %s resource limit exceeded, insufficient resources: %v", group, resourceNames) | ||
| } |
There was a problem hiding this comment.
Fixed in the latest push by preserving the resource keys explicitly configured for each nodegroup and comparing only those dimensions. Added CPU-only limit tests for omitted memory and CPU enforcement.
| if !errors.As(err, &fitErr) { | ||
| t.Fatalf("expected *api.FitError, got %T: %v", err, err) | ||
| } | ||
| return fitErr.Status[0].Code |
There was a problem hiding this comment.
Fixed in the latest push by checking that FitError.Status is non-empty before reading the first status code.
9e0926b to
34a4c29
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
34a4c29 to
75f9ffb
Compare
hajnalmt
left a comment
There was a problem hiding this comment.
/area scheduling
Thanks for the PR, cool feature, this is a nice addition to the nodegroup plugin. The limit accounting path is now much clearer, the edge cases from the previous bot comments look covered, and the docs/tests make the new Queue annotation behavior easy to follow. I left one API-surface comment about where the public volcano.sh/* keys should live before this gets finalized.
Great work!
| NodeGroupNameKey = "volcano.sh/nodegroup-name" | ||
| rootQueueID = "root" | ||
|
|
||
| nodeGroupResourceLimitsAnnotationKey = "volcano.sh/nodegroup-resource-limits" |
There was a problem hiding this comment.
Since both of these are user-facing volcano.sh/* keys, can we move them to the official scheduling API constants under staging/src/volcano.sh/apis/pkg/apis/scheduling/v1beta1/labels.go and reference them from here? NodeGroupNameKey is already used as an external node label key, and volcano.sh/nodegroup-resource-limits is now a documented Queue annotation, so keeping the literals private to the plugin makes the public API surface harder to discover and easier to duplicate later.
There was a problem hiding this comment.
Yes export this as a const in the staging/src/volcano.sh/apis just as mate said will be better
There was a problem hiding this comment.
Moved both user-facing keys to scheduling v1beta1 API constants in labels.go and updated the nodegroup plugin/tests to reference schedulingv1.NodeGroupNameKey and schedulingv1.NodeGroupResourceLimitsAnnotationKey.
|
/cc Will take a look in recent days |
|
@JesseStutler: GitHub didn't allow me to request PR reviews from the following users: in, recent, days, Will, take, a, look. Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: ruanwenjun <wenjun@apache.org>
75f9ffb to
f3a1bf1
Compare
Summary
volcano.sh/nodegroup-resource-limitsqueue annotation to cap queue usage per nodegroup.queue + nodegroupin the nodegroup plugin and reject nodes when future usage would exceed the configured limit.Closes #4954
Test Plan
go test ./pkg/scheduler/plugins/nodegroup