fix(workflows): make task failures non-blocking #555

Merged — kelvinkipruto merged 6 commits into main from fix/non-blocking-workflow-tasks, Mar 6, 2026

Conversation

@kelvinkipruto
Contributor

Description

Wrap each workflow task in a shared runTask helper so that a single task failure no longer aborts the workflow.
Previously, an unhandled task error would mark the workflow job as failed in payload-jobs, blocking all subsequent scheduled runs until the table was cleared manually. Now each task catches its own error, logs it to Sentry, and the workflow always completes — allowing the next cron cycle to start cleanly.
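A minimal sketch of what such a helper could look like (illustrative only — `reportError` stands in for the real Sentry reporting, and the merged helper lives in `src/workflows/utils.ts`):

```typescript
// Hypothetical stand-in for the real Sentry reporting call.
function reportError(taskName: string, error: unknown): void {
  console.error(`[workflow] task "${taskName}" failed:`, error);
}

// Run a single workflow task, swallowing its error so the workflow as a
// whole always completes. Returns true on success, false on failure.
async function runTask(
  taskName: string,
  task: () => Promise<void>,
): Promise<boolean> {
  try {
    await task();
    return true;
  } catch (error) {
    reportError(taskName, error);
    return false;
  }
}
```

Because the helper never rethrows, the workflow handler finishes normally and the payload-jobs row is marked complete even when individual tasks fail.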

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

@vercel

vercel bot commented Mar 6, 2026

The latest updates on your projects.

Project: PromiseTracker · Deployment: Ready · Status: Ready · Actions: Preview, Comment · Updated (UTC): Mar 6, 2026 2:21pm


@kelvinkipruto
Contributor Author

@claude Review

@kelvinkipruto
Contributor Author

@codex review

@claude

claude bot commented Mar 6, 2026

Claude finished @kelvinkipruto's task in 1m 8s


Code Review

  • Gather context
  • Read changed files
  • Analyze diff
  • Post review feedback

The change solves the immediate problem (blocked cron cycles due to unhandled task errors) with a clean, minimal helper. Overall the approach is solid, but there are a few things worth discussing.


Issues

1. Silent failure propagation through dependent tasks

src/workflows/airtableWorkflow.ts — The 7 tasks appear to form a pipeline with real data dependencies:

createTenantFromAirtable → createPoliticalEntity → fetchAirtableDocuments
  → downloadDocuments → extractDocuments → extractPromises → uploadToMeedan

With runTask swallowing all errors, if fetchAirtableDocuments fails (e.g. Airtable is down), downloadDocuments, extractDocuments, extractPromises, and uploadToMeedan will still run against stale or missing data. Each will also fail and be silently swallowed. The workflow will report as "complete" in payload-jobs, but no data was actually processed. This is arguably harder to debug than the original blocking failure.

The same concern applies to src/workflows/meedanWorkflow.ts: if fetchPromiseStatuses fails, updatePromiseStatus runs with no fresh data.

One option is to have runTask return a boolean and abort the remaining tasks when a critical upstream one fails.
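Such a bail-out could look like the following sketch (hypothetical shape — `runTask` is stubbed minimally and the real task bodies are omitted):

```typescript
// Minimal runTask stub: true on success, false on failure.
async function runTask(
  name: string,
  task: () => Promise<void>,
): Promise<boolean> {
  try {
    await task();
    return true;
  } catch (error) {
    console.error(`task "${name}" failed:`, error); // real code: Sentry
    return false;
  }
}

// Hypothetical pipeline: skip downstream tasks when a critical upstream
// task fails, instead of running them against stale or missing data.
async function runAirtablePipeline(
  fetchDocs: () => Promise<void>,
  downloadDocs: () => Promise<void>,
): Promise<boolean> {
  const fetched = await runTask("fetchAirtableDocuments", fetchDocs);
  if (!fetched) return false; // bail out: downstream depends on this data
  return runTask("downloadDocuments", downloadDocs);
}
```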

2. withWorkflowErrorCapture is now effectively dead code for task errors

src/workflows/utils.ts:164–193 — withWorkflowErrorCapture wraps the handler to catch and report errors. Since runTask now swallows all task errors before they can bubble up, the outer catch in withWorkflowErrorCapture will never fire for task failures. It now only guards against errors thrown by the handler function itself (e.g. a bug in the handler setup code), which is a very narrow case. This is worth documenting or acknowledging to avoid confusion.

3. Duplicate Sentry reporting

src/workflows/utils.ts:13–19 — Both Sentry.logger.error and Sentry.captureException are called for every failure. Depending on whether Sentry.logger is configured to auto-capture as exceptions, this could create two Sentry issues per failure. Consider whether both are needed, or if captureException alone (with a breadcrumb/log attachment) is sufficient.

Additionally, Sentry.logger.error includes only error.message (line 16), discarding the stack trace. The full exception detail is preserved by captureException, but losing the stack in the log context can make log-based debugging harder.
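A sketch of the consolidated reporting suggested here (Sentry is stubbed with a recording stand-in so the example is self-contained; the real code uses the Sentry SDK):

```typescript
// Stand-in for the Sentry SDK: records what was captured.
const captured: unknown[] = [];
const Sentry = {
  captureException(error: unknown): void {
    captured.push(error); // the real SDK preserves the full stack trace
  },
};

async function runTask(
  name: string,
  task: () => Promise<void>,
): Promise<boolean> {
  try {
    await task();
    return true;
  } catch (error) {
    // A single captureException call yields one Sentry issue per failure
    // with the stack trace intact, instead of pairing it with a
    // logger.error that only carried error.message.
    Sentry.captureException(error);
    return false;
  }
}
```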


Minor Suggestions

runTask return value (src/workflows/utils.ts:5) — The function currently returns Promise<void> (implicitly). If tasks ever need to return values, or if the short-circuit pattern above is adopted, making the return type explicit as Promise<boolean> would be a cheap improvement in expressiveness.

workflow parameter is redundant — The workflow name is available in the Sentry span/transaction context set up by withWorkflowContext. Passing it explicitly into every runTask call at the call site is slightly noisy. This is minor — just worth considering if the helper is extended.


Summary

The fix correctly addresses the cron-blocking regression with minimal code. The main risk is that the pipeline now silently completes even when all tasks fail mid-way — operators need to rely on Sentry alerts to detect problems. If the tasks are truly independent (idempotent and not pipelined), this is fine as-is. If they form a data dependency chain, consider adding a bail-out mechanism for upstream failures.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9aa6e2681

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

kelvinkipruto and others added 3 commits March 6, 2026 17:01
Task errors are now swallowed by runTask before they can bubble up, so
withWorkflowErrorCapture no longer fires for task failures. Add a
comment acknowledging this so it is not mistaken for dead code — it
remains as a safety net for errors thrown in the handler setup code
itself, outside any runTask call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaced Sentry.logger.error + captureException with captureException
alone. Tasks using withTaskTracing already capture exceptions before
re-throwing, so the previous two-call pattern risked triple-reporting
the same error. captureException also preserves the full stack trace,
which Sentry.logger.error discarded by only passing error.message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Drop the workflow parameter — the Sentry span context from
  withWorkflowContext already carries the workflow name, so passing it
  explicitly at every call site was redundant noise.
- Add explicit Promise<boolean> return type: true on success, false on
  failure, making the helper's intent clearer if callers ever need to
  act on the result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kelvinkipruto
Copy link
Contributor Author

@claude Review.
On

  1. Silent failure propagation through dependent tasks

The intended approach is for each task to run independently; a failure in one should not affect the other tasks.

@claude
Copy link

claude bot commented Mar 6, 2026

Claude finished @kelvinkipruto's task in 1m 0s


Code Review

  • Gather context
  • Read changed files
  • Analyze changes
  • Post review feedback

Thanks for the clarification — understood that tasks are designed to be independent. That resolves concern #1. Here's the updated review focusing on what remains.


The approach looks solid

The runTask helper is clean and minimal. Wrapping each task individually, catching errors, reporting to Sentry, and letting the workflow always complete is a well-reasoned design for independent tasks. The comment added at src/workflows/utils.ts:162–165 correctly documents the narrowed scope of withWorkflowErrorCapture — that was a helpful addition.


Remaining points

1. runTask return value goes unused

runTask already returns Promise<boolean> (true/false), but neither airtableWorkflow.ts nor meedanWorkflow.ts uses the value. This is fine since tasks are independent, but if you ever want to add task-level observability (e.g., logging a summary of which tasks succeeded/failed per run), the return values are there to use. No change needed — just noting it's available.
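If such a summary were ever wanted, it could be as simple as the following sketch (hypothetical names, minimal `runTask` stub):

```typescript
// Minimal runTask stub: true on success, false on failure.
async function runTask(
  name: string,
  task: () => Promise<void>,
): Promise<boolean> {
  try { await task(); return true; } catch { return false; }
}

// Collect per-task outcomes so the workflow can log a summary at the end.
async function runWithSummary(
  tasks: Record<string, () => Promise<void>>,
): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const [name, task] of Object.entries(tasks)) {
    results[name] = await runTask(name, task);
  }
  return results;
}
```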

2. Potential duplicate Sentry reporting in withWorkflowErrorCapture

src/workflows/utils.ts:181–191 — For handler-level errors (i.e., bugs outside any runTask call), withWorkflowErrorCapture calls both Sentry.logger.error and Sentry.captureException. Depending on Sentry SDK configuration, logger.error may also auto-submit as an exception, potentially creating two Sentry issues for one error. Also note that line 183 only logs error.message — the full stack is only preserved by captureException.

Since handler-level errors are now rare (only setup bugs would reach this path), the practical impact is low. But if you want to clean it up, keeping only the captureException call would suffice.

3. Sequential execution of independent tasks (minor)

All tasks in both workflows are awaited sequentially (await runTask(...) one at a time). Since the tasks are independent, Promise.allSettled would run them concurrently and reduce total wall-clock time per workflow run. This is a minor performance suggestion — sequential is simpler and avoids any risk of resource contention, so it's reasonable to keep as-is.
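The concurrent variant mentioned here could look like the sketch below (minimal `runTask` stub; since runTask never rejects, Promise.all would also work, but allSettled is shown as suggested):

```typescript
// Minimal runTask stub: true on success, false on failure.
async function runTask(
  name: string,
  task: () => Promise<void>,
): Promise<boolean> {
  try { await task(); return true; } catch { return false; }
}

// Run independent tasks concurrently instead of awaiting them one by one.
// runTask never rejects, so every allSettled result is "fulfilled"; the
// fallback to false is purely defensive.
async function runConcurrently(
  tasks: Array<[string, () => Promise<void>]>,
): Promise<boolean[]> {
  const settled = await Promise.allSettled(
    tasks.map(([name, task]) => runTask(name, task)),
  );
  return settled.map((r) => (r.status === "fulfilled" ? r.value : false));
}
```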


Summary

The fix is correct and well-scoped. The only actionable item worth addressing is the potential duplicate Sentry reporting in withWorkflowErrorCapture (point 2), but even that is low-priority given how rarely that path now fires. Otherwise this is good to merge.



Remove duplicate Sentry captureMessage call in task logger to avoid double reporting.
Remove Sentry.logger.error call in workflow error capture as Sentry.captureException already handles error reporting.
Change cleanup failed jobs schedule from daily to hourly to improve system maintenance.
@kelvinkipruto kelvinkipruto merged commit e5039b0 into main Mar 6, 2026
5 of 6 checks passed
@kelvinkipruto kelvinkipruto deleted the fix/non-blocking-workflow-tasks branch March 6, 2026 14:19