Skip to content

fix(app): scheduled processes added via project update now register with the scheduler and respect run_on_start#493

Closed
makingaipractical wants to merge 2 commits into
F1bonacc1:mainfrom
makingaipractical:fix/scheduled-process-update-orphan
Closed

fix(app): scheduled processes added via project update now register with the scheduler and respect run_on_start#493
makingaipractical wants to merge 2 commits into
F1bonacc1:mainfrom
makingaipractical:fix/scheduled-process-update-orphan

Conversation

@makingaipractical

Copy link
Copy Markdown

Summary

Three closely related bugs in UpdateProject (CLI: process-compose project update, REST: POST /project) cause scheduled processes to misbehave when added, modified, or compared as part of a running-project update:

  1. A new scheduled process added via project update fires IMMEDIATELY, ignoring run_on_start: false.
  2. The same new scheduled process is never registered with the schedulernext_run_time stays nil and the cron is silently dropped forever.
  3. Changing only a scheduled process's cron/interval/run_on_start/timezone is detected as no-change by ProcessConfig.Compare, so the new schedule is silently ignored.

Live evidence from my setup (Mac, 16 daemons supervised, v1.110.0): I appended one scheduled-cron entry to the project YAML and ran process-compose project update -f <yaml>. The new entry fired immediately and got "orphaned" — next_run_time: null — so its 0 8 * * * cron would never have fired again until a full supervisor restart.

The bug

(1) Immediate fire on add — src/app/project_runner.go

UpdateProject routes new entries to addProcessAndRun:

func (p *ProjectRunner) addProcessAndRun(proc types.ProcessConfig) {
    ...
    p.initProcessLog(proc.ReplicaName)
    if !proc.IsDeferred() {
        p.runProcess(&proc)   // <-- unconditional spawn
    }
}

IsDeferred() returns true only for IsForeground || Disabled — it does not consider Schedule. The startup loop has the correct guard (lines 186-189) — addProcessAndRun is just missing the same one.

(2) Schedule orphaned — src/scheduler/scheduler.go + src/app/project_runner.go

processScheduler.AddProcess is called in exactly one place — the startup Run() loop at line 169. Neither UpdateProject, UpdateProcess, addProcessAndRun, StartProcess, nor RestartProcess register new entries with the scheduler. A scheduled process added at runtime is permanently orphaned from gocron.

Symmetrically, removeProcess never deregisters from the scheduler — there's no Scheduler.RemoveProcess method at all (only PauseProcess, which keeps the entry in s.schedules so ResumeProcess can re-arm it).

(3) Compare omits Schedule — src/types/process.go

ProcessConfig.Compare (lines 107-149) walks ~25 fields by direct compare or reflect.DeepEqual. Schedule is literally absent from the field list. The function predates scheduled processes (introduced in v1.87.0); the maintenance was never done.

If only Schedule.Cron changes between the running config and the new YAML, Compare returns true (equal) → UpdateProject classifies the process as up-to-date → the cron change is silently ignored.

The fix

Three minimal source changes + one new scheduler method:

src/scheduler/scheduler.go — add RemoveProcess (full removal, distinct from PauseProcess's preserve-for-resume semantics). Tolerates gocron.ErrJobNotFound (the underlying gocron job table is eventually-consistent against NewJob and may have already wiped the job on shutdown) and always cleans up the local tracking map:

func (s *Scheduler) RemoveProcess(name string) error {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    entry, ok := s.schedules[name]
    if !ok { return nil }
    var err error
    if entry.Job != nil {
        if rmErr := s.gocronScheduler.RemoveJob(entry.Job.ID()); rmErr != nil && !errors.Is(rmErr, gocron.ErrJobNotFound) {
            err = rmErr
        }
        entry.Job = nil
    }
    delete(s.schedules, name)
    return err
}

src/app/project_runner.go — patch addProcessAndRun to route scheduled processes to Scheduler.AddProcess and skip runProcess. Mirrors the startup-loop semantics. Patch removeProcess to call Scheduler.RemoveProcess for full deregistration.

src/types/process.go — add !reflect.DeepEqual(p.Schedule, another.Schedule) to Compare's DeepEqual chain.

Why patching addProcessAndRun (rather than branching in each call site): the helper is called from both UpdateProject (new procs) and UpdateProcess (the remove-then-re-add path). Both call sites should treat scheduled processes the same way. One change in the helper covers both, and means future call sites get the right behavior for free.

Minimal reproduction

mkdir -p /tmp/pc-bug-1801 && cd /tmp/pc-bug-1801
cat > project.yaml <<'EOF'
version: "0.5"
processes:
  ticker:
    command: echo "ticker would fire"
    namespace: scheduled-cron
    schedule:
      cron: "0 8 * * *"
      run_on_start: false
EOF

cat > project-after.yaml <<'EOF'
version: "0.5"
processes:
  ticker:
    command: echo "ticker would fire"
    namespace: scheduled-cron
    schedule:
      cron: "0 8 * * *"
      run_on_start: false
  newcomer:
    command: echo "this should NOT fire on update"
    namespace: scheduled-cron
    schedule:
      cron: "0 0 1 1 *"
      run_on_start: false
EOF

# Terminal 1: start supervisor with the smaller config
process-compose up -f project.yaml --tui=false &
SUPERVISOR=$!
sleep 2

# Terminal 1: append a new scheduled entry via project update
process-compose project update -f project-after.yaml

# Before fix: newcomer fired immediately ("this should NOT fire on update")
#             AND curl localhost:8080/process/newcomer | jq .next_run_time → null
# After fix:  newcomer is silent
#             AND curl localhost:8080/process/newcomer | jq .next_run_time → "2027-01-01T00:00:00Z"

curl -s localhost:8080/process/newcomer | python3 -c "import sys,json; p=json.load(sys.stdin); print(f\"next_run_time: {p.get('next_run_time')}\")"

kill $SUPERVISOR

Before / After (real environment)

I run a 16-process supervisor on macOS. I added one scheduled-cron entry to the YAML and ran project update. Captured curl localhost:8080/processes:

Before fix (v1.110.0):

test-entry: status=Completed, pid=69358, process_start_time=2026-05-23T00:53:54.907+02:00, next_run_time=null

→ Fired immediately at the update moment, orphaned from scheduler.

After fix (this PR):

test-entry: status=Scheduled, pid=0, process_start_time=null, next_run_time=2027-01-01T00:00:00+01:00

→ Quiet, scheduler-registered, will fire at the configured time.

Tests

Three test surfaces lock in the fix:

  • src/scheduler/scheduler_test.go — 4 cases for the new RemoveProcess (found, unknown, already-paused, gocron-shutdown-tolerance).
  • src/types/process_test.go — 4 cases extending the table-driven TestCompareProcessConfigs: different cron, different run_on_start, nil-vs-set Schedule, identical schedules equal.
  • src/app/system_test.goTestUpdateProject_ScheduledProcess end-to-end through UpdateProject: add → schedule-change → remove. Separate from the existing TestUpdateProject so the pre-fix behavior assertion in that test stays untouched.

make testrace passes locally. make lint clean. make ci's build-nix step skipped (non-Nix host).

There's one pre-existing failure in TestBuildProcessEnvironment/PC_PROC_NAME + PC_REPLICA_NUM on main (unrelated to this change — environment-variable test, no scheduler involvement). Confirmed by running go test ./src/app/ -run TestBuildProcessEnvironment against unmodified main.

Why this fix vs. alternatives

  • Branch in each call site instead of the helper. Considered. Would require duplicating the same logic in UpdateProject's new-procs loop and UpdateProcess's add-after-remove path. The helper is the canonical "add a process to the runtime" entry point — putting the schedule guard there means the next caller (e.g. a future scaling path) gets it for free.
  • Skip the Compare fix and just let UpdateProcess's second compare catch it. UpdateProcess does call AssignProcessExecutableAndArgs before its own Compare, but UpdateProject classifies entries via Compare before dispatching to UpdateProcess — and Schedule-only changes never make it past the first classification. Compare needs to know about Schedule directly.
  • Reuse PauseProcess instead of adding RemoveProcess. Pause keeps the entry in s.schedules so ResumeProcess can re-arm it. For a process being deleted from the project entirely, that's a leak — and a name collision risk if a later AddProcess reuses the same name. The two operations have distinct contracts.

Related

I have one more open question — a separate bouncing issue I observed on the live supervisor where two pre-existing scheduled processes were classified as "Updated" on a project update against a structurally-identical YAML, causing a full pulsar-postgres restart. The source-level explanation (UpdateProject calling Compare before AssignProcessExecutableAndArgs while the in-memory copy is already assigned) doesn't quite match the code in loader.Load which already runs assignExecutableAndArgs before returning. I'm filing that separately as an issue with debug-log evidence; happy to follow up with another PR if you confirm the diagnosis.

Thanks for process-compose — it's been a quiet workhorse for a 16-daemon Mac supervisor setup of mine for the past few weeks.

makingaipractical and others added 2 commits May 23, 2026 02:09
… on remove

UpdateProject's add/update/delete paths forgot scheduled processes exist.
Three independent oversights surface together when a user runs
`process-compose project update -f <yaml>` against a config that adds,
modifies, or removes a scheduled-cron / scheduled-interval entry:

  (a) addProcessAndRun unconditionally called runProcess for new entries,
      so scheduled processes fired IMMEDIATELY on add even when
      `run_on_start: false`. The Run() startup loop has the correct guard
      (skip scheduled processes — they're owned by the scheduler); this
      patch matches that behavior in addProcessAndRun.

  (b) processScheduler.AddProcess was only ever called inside Run()'s
      startup loop. So a scheduled process added at runtime via
      UpdateProject (or UpdateProcess's remove-then-re-add path) was
      never registered with gocron — its `next_run_time` stayed nil and
      its cron expression was silently dropped forever.

  (c) ProcessConfig.Compare's reflect.DeepEqual block omitted the
      Schedule field. So if a user changed only the cron expression on an
      existing scheduled process, UpdateProject treated it as up-to-date
      and silently ignored the change. Compare predates scheduled
      processes (v1.87.0) and was never updated.

The fix also introduces Scheduler.RemoveProcess for the symmetric cleanup
path — distinct from PauseProcess, which preserves the entry so
ResumeProcess can re-arm it. RemoveProcess tolerates
gocron.ErrJobNotFound (e.g. when the underlying job was already wiped by
a previous Stop or lost to a race with NewJob's eventually-consistent
commit), and always cleans up the scheduler's tracking map.

Tested live against a 16-process supervisor: a newly-added
scheduled-cron entry with `run_on_start: false` now has the expected
`next_run_time` and does not fire on `project update`. Previously the
same operation would fire the new process immediately AND leave its cron
schedule orphaned.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new test surfaces lock in the fix so it can't quietly regress
under future simplifications:

  - src/scheduler/scheduler_test.go: 4 cases for the new RemoveProcess
    (found, unknown, already-paused, gocron-shutdown-tolerance)

  - src/types/process_test.go: 4 cases extending the table-driven
    TestCompareProcessConfigs — different cron, different run_on_start,
    nil-vs-set Schedule, identical Schedule equal

  - src/app/system_test.go: TestUpdateProject_ScheduledProcess —
    end-to-end add → schedule-change → remove of a scheduled process
    through UpdateProject, asserting scheduler registration and proper
    cleanup. Separate from the existing TestUpdateProject so the
    pre-fix behavior assertion in that test stays untouched.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

@makingaipractical

Copy link
Copy Markdown
Author

Closing — withdrawing this PR. Apologies for the noise.

@makingaipractical makingaipractical deleted the fix/scheduled-process-update-orphan branch May 23, 2026 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No-downtime rollout

1 participant