
UpdateProject mis-classifies unchanged processes as 'Updated', bouncing always-on services on no-op updates #494

Description

@makingaipractical

Summary

On a structurally unchanged project update, UpdateProject sometimes classifies pre-existing processes as "Updated", so they are removed and re-added. That bounces always-on services that scheduled processes depend on. In my setup this restarted pulsar-postgres, which in turn caused four scheduled processes that fired inside the postgres-down window to fail with psycopg.OperationalError: ... FATAL: the database system is shutting down.

Environment

  • process-compose v1.110.0 (commit cd7f6af)
  • macOS Darwin 25.4.0, M1 Max
  • 17 processes under a single supervisor after the update (4 always-on, 8 scheduled-cron, 5 scheduled-interval); 16 before it
  • Supervisor managed by a LaunchAgent

Observed behavior

I appended a new scheduled-cron entry to my services.json source-of-truth, regenerated process-compose.yaml (a one-process diff), and ran process-compose project update -f <yaml>. The supervisor responded "Project updated successfully".

Captured curl localhost:8080/processes shortly after:

  • All 17 processes' process_start_time clustered in a 60-second window starting ~25s after my project update command.
  • Always-on services (corpus-rag, pulsar-postgres, both dashboards) had freshly restarted. Since they have restart: no, that is only possible if the update classified them as "Updated" and re-added them.
  • Two scheduled-cron processes (pulsar-briefing-render cron 0 18 * * * and pulsar-decay-sweep cron 0 4 * * *) ALSO had recent process_start_time values, plus exit_code: 1 and a postgres-connection-refused traceback in their logs — they had fired in the window where pulsar-postgres was mid-shutdown.

The expected behavior was: a project update whose only change is "add one new entry" leaves all 16 existing entries untouched (Compare returns true for each → "up to date").

Source hypothesis (uncertain)

I traced the diff classification in UpdateProject (project_runner.go:1370-1420) and the Compare function (process.go:107-149).

My initial hypothesis was: UpdateProject calls Compare before AssignProcessExecutableAndArgs on the inbound newProc. The currentProc in p.project.Processes has been AssignArgs'd. So Args differs between them (one populated, one empty) → Compare returns false → process classified as "Updated".

But reviewing the loader path more carefully, loader.Load() calls assignExecutableAndArgs as a mutator at src/loader/loader.go:83 before returning. So the inbound newProc SHOULD have Args populated.

I don't have a clean source-level explanation for why two scheduled processes' Compare-output flipped while most others didn't. Some unknown mutation is happening to either the in-memory currentProc or the inbound newProc between loader.Load() and UpdateProject's diff. Filing this as an issue rather than a PR so you can weigh in on what to instrument.
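To make the initial hypothesis concrete, here is a small model of it. It assumes Compare behaves like reflect.DeepEqual over the config struct, which I have not confirmed against process.go; procConfig and sameConfig are hypothetical stand-ins, not the real types. It also shows one kind of "unknown mutation" that would be enough to flip the result: a nil slice versus an empty non-nil slice, which DeepEqual treats as different.

```go
package main

import (
	"fmt"
	"reflect"
)

// procConfig is a hypothetical stand-in for the real process config type.
type procConfig struct {
	Command    string
	Executable string
	Args       []string
}

// sameConfig models a DeepEqual-style Compare. The real Compare may
// compare fields differently; this is an assumption for illustration.
func sameConfig(a, b procConfig) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	// Case 1: one side has had executable/args assigned, the other has not.
	inbound := procConfig{Command: "python job.py"}
	current := procConfig{
		Command:    "python job.py",
		Executable: "bash",
		Args:       []string{"-c", "python job.py"},
	}
	fmt.Println("assigned vs unassigned equal:", sameConfig(inbound, current))

	// Case 2: identical content, but nil slice vs empty non-nil slice.
	withNil := procConfig{Command: "x"}
	withEmpty := procConfig{Command: "x", Args: []string{}}
	fmt.Println("nil vs empty slice equal:", sameConfig(withNil, withEmpty))
}
```

Both comparisons report a difference. Case 2 is worth checking in the real code path because a YAML round trip or a defaulting step can turn a nil slice or map into an empty one without any visible config change.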

Repro plan I'm willing to run

Happy to run with PC_DEBUG_MODE=1 and capture the full Compare-debug output if there's a logging hook I should enable. If you want me to add a temporary log.Debug("Compare detail: ...") patch dumping the field-by-field deltas at the moment of classification, I can do that and rerun.

Why two of the eight cron jobs and not all eight?

This is the part I can't explain from the code. The cron jobs that fired all had string command: fields (vs. list-style entrypoint:). But several other jobs in the same config use the same shape and weren't bounced. I need debug logs to tell them apart.

Related

I have a workaround in place locally (a full supervisor restart instead of project update when adding scheduled entries), so this is not blocking. But the silent misclassification is a footgun for anyone using project update against a production schedule.
