Fix updated process readiness cascading to dependents#497
Conversation
F1bonacc1
left a comment
There was a problem hiding this comment.
Thanks for tracking this down. I'd split this into two parts.
Keep: the getDoneOrRunningProcess (prefer running over stale done) and addProcessAndRun (drop stale done) changes. They fix a real bug where dependents resolved a stale process object after update/restart, and they apply to all dependency conditions. Solid.
Push back on: the readyCancelFn() added in handleOutput, which makes ready_log_line satisfy a process_healthy dependency. These are two intentionally separate conditions:
process_healthywaits for a readiness/liveness probe to pass. A probe is continuous — it keeps checking and can go back to not-ready.process_log_readywaits for aready_log_lineto appear. That's a one-time event — once the line shows up, it's done forever.
Making ready_log_line also satisfy process_healthy blurs that line: process_healthy would no longer mean "the dependency is healthy right now," just "it printed a line once."
The validators currently keep these two worlds apart in both directions. A process_log_ready dependency requires the target to have a ready_log_line (validators.go:134), and a process is not allowed to have both a ready_log_line and a probe (validators.go:146). So today you cannot point a process_log_ready dependency at a probe-based process — it's rejected. This PR makes the opposite work (a process_healthy dependency pointing at a log-line process) without touching that rule. So one combination is forbidden and the equivalent one is now quietly allowed, which is inconsistent.
Two more concerns as written:
process_log_readyalready does exactly this and works today, so the change just adds a second, looser way to express the same thing.- It disagrees with the loader. A YAML config that puts
process_healthyon aready_log_line-only process is still rejected under--strict/--dry-run, and logs "no health check exists" in normal mode — even though it now works at runtime. The test only passes because the API path skips validation.
The actual bug being fixed is the hang: in non-strict mode the dependent waits on procReadyCtx forever. I'd fix that directly — make the waiter fail cleanly with the message the validator already prints — instead of redefining ready_log_line as a health signal. If you do want the lenient behavior, it should be done end-to-end (update the validator at :127 and the docs). What's the use case that led you to process_healthy here instead of process_log_ready?
8cd9766 to
6ffb745
Compare
|
Thanks, that distinction makes sense. I updated the PR to keep I kept the stale process fixes ( I also adjusted the regression test to cover both the stale running-process selection and the no-hang failure behavior. |
|



Summary
ready_log_lineas satisfyingprocess_healthywaiters by releasing the ready context when the line is observed.ready_log_lineshould unblock healthy dependents.Reproduction
proc1configured withready_log_line: old-readyand a command that printsold-readyand exits.proc1to complete, leaving it in the completed process map.proc1to printnew-readyand keep running, withready_log_line: new-ready.proc2withdepends_on.proc1.condition: process_healthy.proc2staysPendingbecause dependency lookup can use the stale completedproc1, andready_log_linedoes not releaseprocess_healthywaiters.proc2observes the updated runningproc1, thenew-readylog line releases the healthy dependency, andproc2starts.Verification
a92cb36with the same regression scenario.go test ./src/app -run TestUpdateProcessReadyLogLineReleasesHealthyDependent -count=1go test ./src/app -run 'TestSystem_TestReadyLineWithSkipped|TestUpdateProject|TestSystem_TestReadyLine' -count=1