Skip to content

fix(capacity): copy parent ancestors to avoid slice aliasing in hiera…#5502

Open
WHOIM1205 wants to merge 1 commit into
volcano-sh:masterfrom
WHOIM1205:fix/capacity-ancestors-aliasing
Open

fix(capacity): copy parent ancestors to avoid slice aliasing in hiera…#5502
WHOIM1205 wants to merge 1 commit into
volcano-sh:masterfrom
WHOIM1205:fix/capacity-ancestors-aliasing

Conversation

@WHOIM1205

Copy link
Copy Markdown
Contributor

Summary

Fix a slice-aliasing issue in hierarchical queue ancestor construction that could corrupt ancestor chains in deep queue hierarchies, leading to incorrect quota enforcement and resource accounting.

Problem

updateAncestors() built a child queue's ancestor chain using:

append(parent.ancestors, parentInfo.UID)

When the parent ancestor slice had spare capacity, Go reused the same backing array. As a result, sibling subtrees could unintentionally share memory, allowing later updates to overwrite ancestor entries belonging to other queues.

This could cause:

  • Incorrect hierarchical quota validation
  • Wrong enqueue admission decisions
  • Resource accounting being attributed to the wrong ancestor queue
  • Potential quota bypass or starvation scenarios in deep hierarchical queue trees

Root Cause

The ancestor list stored in queueOpts was reused via append(), which allowed sibling queues to alias the same backing array.

Since hierarchical capacity and enqueue checks rely on ancestor chains, corrupted ancestry could propagate into scheduling decisions.

Fix

Ancestor Construction

  • Replace direct append() usage in updateAncestors() with explicit slice allocation and copy.
  • Ensure every queue receives an independent ancestor chain.

Defensive Hardening

Avoid mutating stored ancestor slices in:

  • checkQueueAllocatableHierarchically()
  • checkJobEnqueueableHierarchically()

Both paths now construct temporary lists using fresh slices before appending queue IDs.

Testing

Added regression test:

Test_updateAncestors_noSliceAliasing

The test:

  1. Creates a deep hierarchical queue tree with multiple sibling branches.
  2. Builds ancestor chains using the scheduler logic.
  3. Verifies every queue receives the expected ancestor path.
  4. Ensures sibling branches cannot overwrite each other's ancestry.

The test reproduces the bug on the old implementation and passes with the fix.

Verification

  • ✅ Regression test passes
  • ✅ Regression test fails on the pre-fix implementation
  • ✅ Full capacity plugin test suite passes
  • go vet passes successfully

Impact

This change guarantees that ancestor chains remain isolated and immutable across sibling subtrees, restoring correct hierarchical quota enforcement, enqueue validation, and resource accounting for deep queue hierarchies while preserving existing behavior for shallow hierarchies.

…rchical queues

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Copilot AI review requested due to automatic review settings June 24, 2026 19:54
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hajnalmt for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 24, 2026
@WHOIM1205

Copy link
Copy Markdown
Contributor Author

Hi @hajnalmt , this PR includes a fix for the hierarchical queue ancestor slice-aliasing issue along with a regression test. All tests are passing. I'd appreciate a review when you have time. Thanks!

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a slice-aliasing bug in the capacity scheduler plugin where appending to a parent's ancestors slice could corrupt the ancestor chain in deep hierarchies due to shared backing arrays. The fix introduces manual slice allocation and copying to guarantee fresh backing arrays. The reviewer suggests using idiomatic 3-index slice expressions (slice[max:len:cap]) instead of verbose manual allocations and copies to simplify the code while still preventing slice-aliasing issues.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1488 to +1492
parentAncestors := cp.queueOpts[parentInfo.UID].ancestors
ancestors := make([]api.QueueID, len(parentAncestors)+1)
copy(ancestors, parentAncestors)
ancestors[len(ancestors)-1] = parentInfo.UID
cp.queueOpts[queue.UID].ancestors = ancestors

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a 3-index slice expression (parentAncestors[:len(parentAncestors):len(parentAncestors)]) is a highly idiomatic and concise way in Go to append to a slice while guaranteeing that the original slice's backing array is never mutated. This completely avoids the need for manual slice allocation, copying, and indexing.

Suggested change
parentAncestors := cp.queueOpts[parentInfo.UID].ancestors
ancestors := make([]api.QueueID, len(parentAncestors)+1)
copy(ancestors, parentAncestors)
ancestors[len(ancestors)-1] = parentInfo.UID
cp.queueOpts[queue.UID].ancestors = ancestors
parentAncestors := cp.queueOpts[parentInfo.UID].ancestors
cp.queueOpts[queue.UID].ancestors = append(parentAncestors[:len(parentAncestors):len(parentAncestors)], parentInfo.UID)

Comment on lines +1692 to +1695
ancestors := cp.queueOpts[queue.UID].ancestors
list := make([]api.QueueID, 0, len(ancestors)+1)
list = append(list, ancestors...)
list = append(list, queue.UID)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a 3-index slice expression (ancestors[:len(ancestors):len(ancestors)]) is a highly idiomatic and concise way in Go to append to a slice while guaranteeing that the original slice's backing array is never mutated. This simplifies the verbose make and double append boilerplate into a single, clean operation.

Suggested change
ancestors := cp.queueOpts[queue.UID].ancestors
list := make([]api.QueueID, 0, len(ancestors)+1)
list = append(list, ancestors...)
list = append(list, queue.UID)
ancestors := cp.queueOpts[queue.UID].ancestors
list := append(ancestors[:len(ancestors):len(ancestors)], queue.UID)

Comment on lines +1743 to +1746
ancestors := cp.queueOpts[queue.UID].ancestors
list := make([]api.QueueID, 0, len(ancestors)+1)
list = append(list, ancestors...)
list = append(list, queue.UID)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a 3-index slice expression (ancestors[:len(ancestors):len(ancestors)]) is a highly idiomatic and concise way in Go to append to a slice while guaranteeing that the original slice's backing array is never mutated. This simplifies the verbose make and double append boilerplate into a single, clean operation.

Suggested change
ancestors := cp.queueOpts[queue.UID].ancestors
list := make([]api.QueueID, 0, len(ancestors)+1)
list = append(list, ancestors...)
list = append(list, queue.UID)
ancestors := cp.queueOpts[queue.UID].ancestors
list := append(ancestors[:len(ancestors):len(ancestors)], queue.UID)

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Go slice-aliasing bug in the capacity plugin’s hierarchical queue ancestor construction, preventing corruption of ancestor chains in deep queue trees that could lead to incorrect hierarchical quota enforcement and resource accounting.

Changes:

  • Rebuild ancestor chains in updateAncestors() using a fresh slice allocation + copy to avoid backing-array aliasing between sibling subtrees.
  • Avoid appending onto stored ancestors slices in hierarchical allocatable/enqueueable checks by building a separate list before traversal.
  • Add a regression test that constructs a deep, branching queue hierarchy and validates exact ancestor paths for each queue.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pkg/scheduler/plugins/capacity/capacity.go Eliminates ancestor-slice backing-array aliasing and hardens hierarchical checks against accidental mutation.
pkg/scheduler/plugins/capacity/capacity_test.go Adds regression coverage for deep hierarchical queue ancestry correctness across sibling branches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1692 to +1695
ancestors := cp.queueOpts[queue.UID].ancestors
list := make([]api.QueueID, 0, len(ancestors)+1)
list = append(list, ancestors...)
list = append(list, queue.UID)
Comment on lines 1739 to +1743
func (cp *capacityPlugin) checkJobEnqueueableHierarchically(ssn *framework.Session, queue *api.QueueInfo, job *api.JobInfo) bool {
// If hierarchical queue is not enabled, list will only contain the queue itself.
list := append(cp.queueOpts[queue.UID].ancestors, queue.UID)
// Do not append onto the stored ancestors slice: that could mutate its shared
// backing array. Copy into a fresh slice instead.
ancestors := cp.queueOpts[queue.UID].ancestors
Comment on lines +2136 to +2140
cp.queueOpts[uid] = cp.newQueueAttr(ssn.Queues[uid])
if err := cp.updateAncestors(ssn.Queues[uid], ssn, map[api.QueueID]struct{}{}); err != nil {
t.Fatalf("updateAncestors(%s) returned error: %v", s.name, err)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants