
Bug re-run #44

Merged
stefraynaud merged 3 commits into develop from bug_restart on Jun 16, 2026

@stefraynaud (Member)

Description

Fix: tasks restarted after a fully successful run (#28)

Problem

When all tasks had succeeded and `woom run` was called again without
`woom clean`, every task was re-submitted instead of being skipped.
Observed on Datarmor (PBS Pro).

Root causes

Bug 1 – wrong identity comparison
`status.name is JobStatus.SUCCESS` compared a `str` to an Enum member
via `is`, which is always `False`. Fixed to `status is JobStatus.SUCCESS`.
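A minimal illustration of Bug 1, using a hypothetical stand-in for `JobStatus` (woom's real enum may define different members or values). `Enum.name` is a plain `str`, so an identity test against the member can never succeed, whereas Enum members are singletons and `is` compares them reliably:

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical stand-in for woom's JobStatus; real members may differ.
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


status = JobStatus.SUCCESS

# Buggy check: status.name is the str "SUCCESS", never the Enum member itself.
buggy = status.name is JobStatus.SUCCESS
# Fixed check: Enum members are singletons, so identity comparison is reliable.
fixed = status is JobStatus.SUCCESS

print(buggy, fixed)  # False True
```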

Bug 2 – spurious sentinel PBS job
`load_job(..., append=True)` in `get_task_status()` populated
`jobmanager.jobs` with already-finished jobs, causing `submit_sentinel()`
to dispatch a new PBS job on every re-run even when nothing was submitted.
Fixed by gating the sentinel on `n_submitted > 0`.
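The sentinel guard can be sketched as follows. `JobManager`, its `sentinel_submissions` counter and the `rerun()` helper are hypothetical simplifications, not woom's actual classes. They only model the two facts described above: reloading finished jobs with `append=True` makes the job list non-empty, and the sentinel is dispatched whenever that list is non-empty.

```python
class JobManager:
    """Hypothetical stand-in for woom's job manager; names are illustrative."""

    def __init__(self):
        self.jobs = []
        self.sentinel_submissions = 0

    def load_job(self, job_id, append=False):
        # Reload a previously run job; append=True also registers it in self.jobs
        job = {"id": job_id, "status": "SUCCESS"}
        if append:
            self.jobs.append(job)
        return job

    def submit_sentinel(self):
        # Stands in for dispatching one extra scheduler job when jobs are tracked
        if self.jobs:
            self.sentinel_submissions += 1


def rerun(jobmanager, finished_job_ids, guard=True):
    """Re-run where every task already succeeded, so nothing is submitted."""
    n_submitted = 0
    for job_id in finished_job_ids:
        # The status lookup repopulates jobmanager.jobs even for skipped tasks
        jobmanager.load_job(job_id, append=True)
    # guard=False mimics the old behavior; guard=True is the fix
    if not guard or n_submitted > 0:
        jobmanager.submit_sentinel()
    return n_submitted


old, new = JobManager(), JobManager()
rerun(old, ["job1", "job2"], guard=False)
rerun(new, ["job1", "job2"], guard=True)
print(old.sentinel_submissions, new.sentinel_submissions)  # 1 0
```

Both managers end up tracking the two finished jobs; only the unguarded run dispatches a spurious sentinel.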

Changes

  • `woom/workflow.py`: Enum comparison fix and sentinel submission guard
  • `CHANGES.rst`: bug fixes documented under #28 (Job overview)
  • `mds/bug28_restart_on_success.md`: detailed root-cause report

Check list

  • Closes #28 (Job overview)
  • Tests added
  • User visible changes (including notable bug fixes) are documented in CHANGES.rst
  • New modules are listed in api.rst

Two bugs caused woom to restart all tasks on re-run when all jobs
had previously succeeded.

Bug 1 – wrong identity comparison in Workflow.run():
  `status.name is JobStatus.SUCCESS` compared a str to an Enum
  member using `is`, which is always False. Tasks that already
  succeeded were never skipped.
  Fixed: `status is JobStatus.SUCCESS`.

Bug 2 – spurious sentinel submission on re-run:
  `get_task_status()` called `load_job(..., append=True)`, adding every
  previously run job to `self.jobmanager.jobs`, even for skipped tasks.
  `submit_sentinel()` then saw a non-empty list and dispatched a new
  PBS job on every subsequent run, even when nothing was submitted.
  Fixed: `submit_sentinel()` is only called when `n_submitted > 0`.
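Taken together, the two fixes give the following re-run behavior. This is a simplified, hypothetical sketch of the skip/submit logic in `Workflow.run()`: the real method's signature, task objects and submission calls differ, and `submit`/`submit_sentinel` are passed in as plain callables purely for illustration.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical members; woom's real JobStatus may define others.
    SUCCESS = "success"
    FAILED = "failed"
    UNKNOWN = "unknown"


def run(task_statuses, submit, submit_sentinel):
    """Hypothetical sketch of the fixed skip/submit logic of Workflow.run()."""
    n_submitted = 0
    for task, status in task_statuses.items():
        # Fix 1: compare the Enum member itself, not its str name
        if status is JobStatus.SUCCESS:
            continue
        submit(task)
        n_submitted += 1
    # Fix 2: only add a sentinel job when something was actually submitted
    if n_submitted > 0:
        submit_sentinel()
    return n_submitted


# Re-run after a fully successful run: nothing should be dispatched
submitted, sentinels = [], []
n = run(
    {"prep": JobStatus.SUCCESS, "model": JobStatus.SUCCESS},
    submitted.append,
    lambda: sentinels.append("sentinel"),
)
print(n, submitted, sentinels)  # 0 [] []
```

If one task had failed instead, only that task would be re-submitted and a single sentinel would follow it.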
stefraynaud merged commit aad33f5 into develop on Jun 16, 2026
13 checks passed
stefraynaud deleted the bug_restart branch on June 16, 2026 at 09:49

Development

Successfully merging this pull request may close these issues.

Job overview

1 participant