Summary
On a structurally-unchanged project update, UpdateProject sometimes classifies pre-existing processes as "Updated" — causing them to be removed-and-re-added, which bounces the always-on processes they depend on. In my setup this caused pulsar-postgres to restart, which in turn caused four scheduled processes (currently firing inside the postgres-down window) to fail with psycopg.OperationalError: ... FATAL: the database system is shutting down.
Environment
- process-compose v1.110.0 (commit
cd7f6af)
- macOS Darwin 25.4.0, M1 Max
- 16 processes under a single supervisor (4 always-on, 8 scheduled-cron, 5 scheduled-interval)
- Supervisor managed by a LaunchAgent
Observed behavior
I appended a new scheduled-cron entry to my services.json source-of-truth, regenerated process-compose.yaml (a one-process diff), and ran process-compose project update -f <yaml>. The supervisor responded "Project updated successfully".
Captured curl localhost:8080/processes shortly after:
- All 17 processes'
process_start_time clustered in a 60-second window starting ~25s after my project update command.
- Always-on services (
corpus-rag, pulsar-postgres, both dashboards) had freshly restarted — only possible via UpdateProcess-as-Updated, since they have restart: no.
- Two scheduled-cron processes (
pulsar-briefing-render cron 0 18 * * * and pulsar-decay-sweep cron 0 4 * * *) ALSO had recent process_start_time values, plus exit_code: 1 and a postgres-connection-refused traceback in their logs — they had fired in the window where pulsar-postgres was mid-shutdown.
The expected behavior was: a project update whose only change is "add one new entry" leaves all 16 existing entries untouched (Compare returns true for each → "up to date").
Source hypothesis (uncertain)
I traced the diff classification in UpdateProject (project_runner.go:1370-1420) and the Compare function (process.go:107-149).
My initial hypothesis was: UpdateProject calls Compare before AssignProcessExecutableAndArgs on the inbound newProc. The currentProc in p.project.Processes has been AssignArgs'd. So Args differs between them (one populated, one empty) → Compare returns false → process classified as "Updated".
But reviewing the loader path more carefully, loader.Load() calls assignExecutableAndArgs as a mutator at src/loader/loader.go:83 before returning. So the inbound newProc SHOULD have Args populated.
I don't have a clean source-level explanation for why two scheduled processes' Compare-output flipped while most others didn't. Some unknown mutation is happening to either the in-memory currentProc or the inbound newProc between loader.Load() and UpdateProject's diff. Filing this as an issue rather than a PR so you can weigh in on what to instrument.
Repro plan I'm willing to run
Happy to run with PC_DEBUG_MODE=1 and capture the full Compare-debug output if there's a logging hook I should enable. If you want me to add a temporary log.Debug("Compare detail: ...") patch dumping the field-by-field deltas at the moment of classification, I can do that and rerun.
Why two of the eight cron jobs and not all eight?
This is the part I can't explain from the code. The four cron jobs that fired all had string command: fields (vs. list-style entrypoint:). But several other jobs in the same config use the same shape and weren't bounced. Need debug logs to differentiate.
Related
I have a workaround in place locally (full supervisor restart instead of project update when adding scheduled entries) so this is not blocking — but the silent classification surprise is a footgun for anyone using project update against a production schedule.
Summary
On a structurally-unchanged
project update,UpdateProjectsometimes classifies pre-existing processes as "Updated" — causing them to be removed-and-re-added, which bounces the always-on processes they depend on. In my setup this causedpulsar-postgresto restart, which in turn caused four scheduled processes (currently firing inside the postgres-down window) to fail withpsycopg.OperationalError: ... FATAL: the database system is shutting down.Environment
cd7f6af)Observed behavior
I appended a new scheduled-cron entry to my services.json source-of-truth, regenerated
process-compose.yaml(a one-process diff), and ranprocess-compose project update -f <yaml>. The supervisor responded "Project updated successfully".Captured
curl localhost:8080/processesshortly after:process_start_timeclustered in a 60-second window starting ~25s after myproject updatecommand.corpus-rag,pulsar-postgres, both dashboards) had freshly restarted — only possible viaUpdateProcess-as-Updated, since they haverestart: no.pulsar-briefing-rendercron0 18 * * *andpulsar-decay-sweepcron0 4 * * *) ALSO had recentprocess_start_timevalues, plusexit_code: 1and a postgres-connection-refused traceback in their logs — they had fired in the window wherepulsar-postgreswas mid-shutdown.The expected behavior was: a
project updatewhose only change is "add one new entry" leaves all 16 existing entries untouched (Compare returns true for each → "up to date").Source hypothesis (uncertain)
I traced the diff classification in
UpdateProject(project_runner.go:1370-1420) and the Compare function (process.go:107-149).My initial hypothesis was:
UpdateProjectcalls Compare beforeAssignProcessExecutableAndArgson the inbound newProc. The currentProc inp.project.Processeshas been AssignArgs'd. SoArgsdiffers between them (one populated, one empty) → Compare returns false → process classified as "Updated".But reviewing the loader path more carefully,
loader.Load()callsassignExecutableAndArgsas a mutator atsrc/loader/loader.go:83before returning. So the inbound newProc SHOULD haveArgspopulated.I don't have a clean source-level explanation for why two scheduled processes' Compare-output flipped while most others didn't. Some unknown mutation is happening to either the in-memory
currentProcor the inboundnewProcbetweenloader.Load()andUpdateProject's diff. Filing this as an issue rather than a PR so you can weigh in on what to instrument.Repro plan I'm willing to run
Happy to run with
PC_DEBUG_MODE=1and capture the full Compare-debug output if there's a logging hook I should enable. If you want me to add a temporarylog.Debug("Compare detail: ...")patch dumping the field-by-field deltas at the moment of classification, I can do that and rerun.Why two of the eight cron jobs and not all eight?
This is the part I can't explain from the code. The four cron jobs that fired all had string
command:fields (vs. list-styleentrypoint:). But several other jobs in the same config use the same shape and weren't bounced. Need debug logs to differentiate.Related
I have a workaround in place locally (full supervisor restart instead of
project updatewhen adding scheduled entries) so this is not blocking — but the silent classification surprise is a footgun for anyone usingproject updateagainst a production schedule.