fix(controller): handle multi-container pod exit codes #5460

Open
avinxshKD wants to merge 1 commit into volcano-sh:master from avinxshKD:fix/vcjob-exitcode-multicontainer
@avinxshKD

Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

vcjob pod failure handling only used ContainerStatuses[0] when setting the exit code for lifecycle policy matching.

That breaks multi-container pods when the first container exits successfully and another regular container fails with the configured exitCode.

This PR scans all terminated regular container statuses, records non-zero exit codes, and lets LifecyclePolicy.ExitCode match any of them while keeping the existing policy order.

Init containers are intentionally not included here, per the issue discussion.
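
As a rough illustration of the scan described above, here is a minimal sketch. The types are hypothetical, trimmed-down stand-ins for the Kubernetes container status types, and `nonZeroExitCodes` is an illustrative name, not necessarily the helper this PR adds. It assumes the input holds only regular container statuses, so init containers never enter the result.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes container status types,
// reduced to the fields this sketch needs.
type ContainerStateTerminated struct{ ExitCode int32 }

type ContainerState struct{ Terminated *ContainerStateTerminated }

type ContainerStatus struct {
	Name  string
	State ContainerState
}

// nonZeroExitCodes walks every regular container status and returns the
// exit codes of containers that terminated with a non-zero code, in
// status order. Containers that are still running or exited 0 are skipped.
func nonZeroExitCodes(statuses []ContainerStatus) []int32 {
	var codes []int32
	for _, s := range statuses {
		t := s.State.Terminated
		if t == nil || t.ExitCode == 0 {
			continue
		}
		codes = append(codes, t.ExitCode)
	}
	return codes
}

func main() {
	statuses := []ContainerStatus{
		// The first container succeeds, which is the case that
		// ContainerStatuses[0]-only handling misses.
		{Name: "sidecar", State: ContainerState{Terminated: &ContainerStateTerminated{ExitCode: 0}}},
		{Name: "worker", State: ContainerState{Terminated: &ContainerStateTerminated{ExitCode: 3}}},
		{Name: "logger"}, // not terminated
	}
	fmt.Println(nonZeroExitCodes(statuses))
}
```

With the first container exiting 0 and the second exiting 3, only the second container's code is collected.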

Which issue(s) this PR fixes:

Fixes #5452

Special notes for your reviewer:

Kept the existing ExitCode field so current request behavior is unchanged, and added internal multi-code matching without changing the public API.
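
One way to picture "match any code while keeping policy order" is the sketch below. `lifecyclePolicy` and `firstMatchingAction` are hypothetical names used only for illustration, and the action strings are example values. The point is that the outer loop runs over policies, so the first listed policy wins even when several collected exit codes match different policies.

```go
package main

import "fmt"

// lifecyclePolicy is a hypothetical, reduced stand-in for a vcjob
// lifecycle policy: an action plus an optional exit code to match.
type lifecyclePolicy struct {
	Action   string
	ExitCode *int32
}

// firstMatchingAction iterates policies in their declared order and
// returns the action of the first policy whose ExitCode equals any of
// the collected container exit codes.
func firstMatchingAction(policies []lifecyclePolicy, codes []int32) (string, bool) {
	for _, p := range policies {
		if p.ExitCode == nil {
			continue // this policy is not keyed on an exit code
		}
		for _, c := range codes {
			if c == *p.ExitCode {
				return p.Action, true
			}
		}
	}
	return "", false
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	policies := []lifecyclePolicy{
		{Action: "RestartJob", ExitCode: int32Ptr(137)},
		{Action: "AbortJob", ExitCode: int32Ptr(3)},
	}
	// Both codes match some policy; the policy listed first decides.
	action, ok := firstMatchingAction(policies, []int32{3, 137})
	fmt.Println(action, ok) // RestartJob true
}
```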

Does this PR introduce a user-facing change?

Fix vcjob lifecycle exitCode policy matching for failed multi-container pods.

Signed-off-by: Avinash Kumar Deepak <avinash8655279@gmail.com>
@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2026
@volcano-sh-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 17, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for handling multiple non-zero exit codes from container statuses within a pod by adding an ExitCodes field to the Request struct and updating policy matching logic. The reviewer's feedback consistently recommends representing these exit codes as a slice of integers ([]int32) rather than a comma-separated string. This change would make the implementation more type-safe, efficient, and idiomatic in Go, eliminating unnecessary string allocations and parsing overhead across the controller logic and unit tests.


Comment thread pkg/controllers/apis/request.go
Comment thread pkg/controllers/job/job_controller_handler.go
Comment thread pkg/controllers/job/job_controller_handler.go
Comment thread pkg/controllers/job/job_controller_util.go
Comment thread pkg/controllers/job/job_controller_util_test.go
Comment thread pkg/controllers/job/job_controller_util_test.go
@avinxshKD

Contributor Author

Hey @JesseStutler, please take a look. I kept the fix scoped to regular containers only, as discussed earlier. ExitCodes is kept as an internal string because apis.Request is used as a workqueue item and must stay comparable.
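
To illustrate the comparability point: in Go, a struct with a slice field cannot be compared with `==` or used as a map key, while a struct containing only strings and integers can. The sketch below uses a hypothetical `request` type and hypothetical helper names (`encodeExitCodes`, `hasExitCode`); it assumes a comma-separated encoding, as the review summary describes, and is not the PR's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// request is a hypothetical stand-in for apis.Request. Every field is a
// comparable type, so the struct works with == and as a map key.
// Swapping ExitCodes for []int32 would make the map below a compile error.
type request struct {
	JobName   string
	ExitCode  int32
	ExitCodes string // comma-separated, e.g. "3,137"
}

// encodeExitCodes joins exit codes into a comma-separated string.
func encodeExitCodes(codes []int32) string {
	parts := make([]string, 0, len(codes))
	for _, c := range codes {
		parts = append(parts, strconv.FormatInt(int64(c), 10))
	}
	return strings.Join(parts, ",")
}

// hasExitCode reports whether want appears in the encoded list.
func hasExitCode(encoded string, want int32) bool {
	if encoded == "" {
		return false
	}
	for _, p := range strings.Split(encoded, ",") {
		n, err := strconv.ParseInt(p, 10, 32)
		if err == nil && int32(n) == want {
			return true
		}
	}
	return false
}

func main() {
	a := request{JobName: "job-a", ExitCode: 3, ExitCodes: encodeExitCodes([]int32{3, 137})}
	b := request{JobName: "job-a", ExitCode: 3, ExitCodes: "3,137"}

	// Equal requests collapse to one map key, which is the property a
	// deduplicating queue relies on.
	seen := map[request]bool{a: true}
	fmt.Println(a == b, seen[b], hasExitCode(a.ExitCodes, 137))
}
```

The trade-off is the one the review raised: the string keeps the struct comparable at the cost of encoding and parsing on each match.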

cc @hzxuzhonghu @kingeasternsun

Labels

kind/bug Categorizes issue or PR as related to a bug.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

vcjob exitCode policy only checks the first container

2 participants