Skip to content

Fix cronjob : return actual error when UpdateStatus fails in sync()#5454

Open
ManojLamani wants to merge 3 commits into
volcano-sh:masterfrom
ManojLamani:fix-cronjob-swallowed-status-error
Open

Fix cronjob : return actual error when UpdateStatus fails in sync()#5454
ManojLamani wants to merge 3 commits into
volcano-sh:masterfrom
ManojLamani:fix-cronjob-swallowed-status-error

Conversation

@ManojLamani

Copy link
Copy Markdown

What type of PR is this?
/kind bug

What this PR does / why we need it:
In sync() inside cronjob_controller.go, after syncCronJob() succeeds, an UpdateStatus failure incorrectly returned syncErr instead of the actual err.

Because syncErr is guaranteed to be nil at that point — a non-nil syncErr causes an early return on line 206 — the caller always received (nil, nil) on status update failure. This silently swallowed the error: no retry was triggered, no backoff occurred, and the work queue treated the sync as successful even when the CronJob status was not updated.

Before:

return nil, syncErr // syncErr is nil here — error is lost
After:

return nil, err // actual UpdateStatus error is returned
Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
This is a single-variable-name change. syncErr is provably nil at the point of the return statement because the block at lines 204–207 exits early if it is not. The bug caused every UpdateStatus failure to be silently swallowed.

Does this PR introduce a user-facing change?
Fixed CronJob status update failures being silently swallowed in the
controller sync loop. A failed UpdateStatus call now returns the actual
error so the work queue retries correctly.

Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
In the sync() function, after syncCronJob() succeeds (syncErr is nil),
an UpdateStatus failure returned syncErr instead of the actual err.
Since syncErr is guaranteed nil at that point, the caller received
nil, nil and had no way to detect the status update failure.

Fixes: return nil, err so the real error is propagated to the caller.

Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
Copilot AI review requested due to automatic review settings June 15, 2026 18:25
@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 15, 2026
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lowang-bh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 15, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improves error handling across admission webhooks, scheduler cache informer wiring, and CronJob status updates to avoid silently ignoring failures.

Changes:

  • Propagate createPatch(...) errors in Pod/Job mutating webhooks instead of discarding them.
  • Capture and handle errors returned by AddEventHandler(...) when registering informer handlers.
  • Return the actual UpdateStatus(...) error from CronJob sync when status update fails.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/webhooks/admission/pods/mutate/mutate_pod.go Stop ignoring patch creation failures and return an admission error response.
pkg/webhooks/admission/jobs/mutate/mutate_job.go Stop ignoring patch creation failures and return an admission error response.
pkg/scheduler/cache/cache.go Check informer handler registration errors to avoid silently running without handlers.
pkg/controllers/cronjob/cronjob_controller.go Return the correct error when UpdateStatus fails.
pkg/agentscheduler/cache/cache.go Check informer handler registration errors to avoid silently running without handlers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

func (sc *SchedulerCache) addEventHandler() {
handlers := make(map[string]cache.ResourceEventHandlerRegistration, 10)
var handlerRegistration cache.ResourceEventHandlerRegistration
var err error
// create informer for node information
sc.nodeInformer = informerFactory.Core().V1().Nodes()
handlerRegistration, _ = sc.nodeInformer.Informer().AddEventHandler(
handlerRegistration, err = sc.nodeInformer.Informer().AddEventHandler(
Comment on lines +677 to +679
if err != nil {
klog.Fatalf("Failed to add event handler for node informer: %v", err)
}
func (sc *SchedulerCache) addEventHandler() {
handlers := make(map[string]cache.ResourceEventHandlerRegistration, 10)
var handlerRegistration cache.ResourceEventHandlerRegistration
var err error
// create informer for node information
sc.nodeInformer = informerFactory.Core().V1().Nodes()
handlerRegistration, _ = sc.nodeInformer.Informer().AddEventHandler(
handlerRegistration, err = sc.nodeInformer.Informer().AddEventHandler(
Comment on lines +478 to +480
if err != nil {
klog.Fatalf("Failed to add event handler for node informer: %v", err)
}

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves error handling across several components. It captures and handles errors returned by AddEventHandler on various informers in both the agent scheduler and scheduler caches, calling klog.Fatalf upon failure. It also fixes a bug in the cronjob controller where an incorrect error variable (syncErr instead of err) was returned during status updates. Additionally, it ensures that errors from createPatch are properly handled and returned as admission responses in the job and pod mutation webhooks. As there are no review comments, I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants