Fix cronjob : return actual error when UpdateStatus fails in sync()#5454
Fix cronjob : return actual error when UpdateStatus fails in sync()#5454ManojLamani wants to merge 3 commits into
Conversation
Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
In the sync() function, after syncCronJob() succeeds (syncErr is nil), an UpdateStatus failure returned syncErr instead of the actual err. Since syncErr is guaranteed nil at that point, the caller received nil, nil and had no way to detect the status update failure. Fixes: return nil, err so the real error is propagated to the caller. Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Improves error handling across admission webhooks, scheduler cache informer wiring, and CronJob status updates to avoid silently ignoring failures.
Changes:
- Propagate
createPatch(...)errors in Pod/Job mutating webhooks instead of discarding them. - Capture and handle errors returned by
AddEventHandler(...)when registering informer handlers. - Return the actual
UpdateStatus(...)error from CronJob sync when status update fails.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/webhooks/admission/pods/mutate/mutate_pod.go | Stop ignoring patch creation failures and return an admission error response. |
| pkg/webhooks/admission/jobs/mutate/mutate_job.go | Stop ignoring patch creation failures and return an admission error response. |
| pkg/scheduler/cache/cache.go | Check informer handler registration errors to avoid silently running without handlers. |
| pkg/controllers/cronjob/cronjob_controller.go | Return the correct error when UpdateStatus fails. |
| pkg/agentscheduler/cache/cache.go | Check informer handler registration errors to avoid silently running without handlers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (sc *SchedulerCache) addEventHandler() { | ||
| handlers := make(map[string]cache.ResourceEventHandlerRegistration, 10) | ||
| var handlerRegistration cache.ResourceEventHandlerRegistration | ||
| var err error |
| // create informer for node information | ||
| sc.nodeInformer = informerFactory.Core().V1().Nodes() | ||
| handlerRegistration, _ = sc.nodeInformer.Informer().AddEventHandler( | ||
| handlerRegistration, err = sc.nodeInformer.Informer().AddEventHandler( |
| if err != nil { | ||
| klog.Fatalf("Failed to add event handler for node informer: %v", err) | ||
| } |
| func (sc *SchedulerCache) addEventHandler() { | ||
| handlers := make(map[string]cache.ResourceEventHandlerRegistration, 10) | ||
| var handlerRegistration cache.ResourceEventHandlerRegistration | ||
| var err error |
| // create informer for node information | ||
| sc.nodeInformer = informerFactory.Core().V1().Nodes() | ||
| handlerRegistration, _ = sc.nodeInformer.Informer().AddEventHandler( | ||
| handlerRegistration, err = sc.nodeInformer.Informer().AddEventHandler( |
| if err != nil { | ||
| klog.Fatalf("Failed to add event handler for node informer: %v", err) | ||
| } |
There was a problem hiding this comment.
Code Review
This pull request improves error handling across several components. It captures and handles errors returned by AddEventHandler on various informers in both the agent scheduler and scheduler caches, calling klog.Fatalf upon failure. It also fixes a bug in the cronjob controller where an incorrect error variable (syncErr instead of err) was returned during status updates. Additionally, it ensures that errors from createPatch are properly handled and returned as admission responses in the job and pod mutation webhooks. As there are no review comments, I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
What type of PR is this?
/kind bug
What this PR does / why we need it:
In sync() inside cronjob_controller.go, after syncCronJob() succeeds, an UpdateStatus failure incorrectly returned syncErr instead of the actual err.
Because syncErr is guaranteed to be nil at that point — a non-nil syncErr causes an early return on line 206 — the caller always received (nil, nil) on status update failure. This silently swallowed the error: no retry was triggered, no backoff occurred, and the work queue treated the sync as successful even when the CronJob status was not updated.
Before:
return nil, syncErr // syncErr is nil here — error is lost
After:
return nil, err // actual UpdateStatus error is returned
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a single-variable-name change. syncErr is provably nil at the point of the return statement because the block at lines 204–207 exits early if it is not. The bug caused every UpdateStatus failure to be silently swallowed.
Does this PR introduce a user-facing change?
Fixed CronJob status update failures being silently swallowed in the
controller sync loop. A failed UpdateStatus call now returns the actual
error so the work queue retries correctly.