Skip to content

Fix updated process readiness cascading to dependents#497

Merged
F1bonacc1 merged 1 commit into
F1bonacc1:mainfrom
hebo6:fix/update-readiness-cascade
May 30, 2026
Merged

Fix updated process readiness cascading to dependents#497
F1bonacc1 merged 1 commit into
F1bonacc1:mainfrom
hebo6:fix/update-readiness-cascade

Conversation

@hebo6

@hebo6 hebo6 commented May 25, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Prefer the currently running process over stale completed entries when resolving dependencies.
  • Remove stale completed process entries when re-adding an updated process.
  • Treat ready_log_line as satisfying process_healthy waiters by releasing the ready context when the line is observed.
  • Add a regression test for updated processes whose ready_log_line should unblock healthy dependents.

Reproduction

  1. Start a project with proc1 configured with ready_log_line: old-ready and a command that prints old-ready and exits.
  2. Wait for proc1 to complete, leaving it in the completed process map.
  3. Update proc1 to print new-ready and keep running, with ready_log_line: new-ready.
  4. Add or start proc2 with depends_on.proc1.condition: process_healthy.
  5. Before this fix, proc2 stays Pending because dependency lookup can use the stale completed proc1, and ready_log_line does not release process_healthy waiters.
  6. After this fix, proc2 observes the updated running proc1, the new-ready log line releases the healthy dependency, and proc2 starts.

Verification

  • Reproduced the failure on the parent commit a92cb36 with the same regression scenario.
  • Verified the regression is covered by:
    go test ./src/app -run TestUpdateProcessReadyLogLineReleasesHealthyDependent -count=1
  • Ran related existing tests:
    go test ./src/app -run 'TestSystem_TestReadyLineWithSkipped|TestUpdateProject|TestSystem_TestReadyLine' -count=1

@F1bonacc1 F1bonacc1 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking this down. I'd split this into two parts.

Keep: the getDoneOrRunningProcess (prefer running over stale done) and addProcessAndRun (drop stale done) changes. They fix a real bug where dependents resolved a stale process object after update/restart, and they apply to all dependency conditions. Solid.

Push back on: the readyCancelFn() added in handleOutput, which makes ready_log_line satisfy a process_healthy dependency. These are two intentionally separate conditions:

  • process_healthy waits for a readiness/liveness probe to pass. A probe is continuous — it keeps checking and can go back to not-ready.
  • process_log_ready waits for a ready_log_line to appear. That's a one-time event — once the line shows up, it's done forever.

Making ready_log_line also satisfy process_healthy blurs that line: process_healthy would no longer mean "the dependency is healthy right now," just "it printed a line once."

The validators currently keep these two worlds apart in both directions. A process_log_ready dependency requires the target to have a ready_log_line (validators.go:134), and a process is not allowed to have both a ready_log_line and a probe (validators.go:146). So today you cannot point a process_log_ready dependency at a probe-based process — it's rejected. This PR makes the opposite work (a process_healthy dependency pointing at a log-line process) without touching that rule. So one combination is forbidden and the equivalent one is now quietly allowed, which is inconsistent.

Two more concerns as written:

  1. process_log_ready already does exactly this and works today, so the change just adds a second, looser way to express the same thing.
  2. It disagrees with the loader. A YAML config that puts process_healthy on a ready_log_line-only process is still rejected under --strict/--dry-run, and logs "no health check exists" in normal mode — even though it now works at runtime. The test only passes because the API path skips validation.

The actual bug being fixed is the hang: in non-strict mode the dependent waits on procReadyCtx forever. I'd fix that directly — make the waiter fail cleanly with the message the validator already prints — instead of redefining ready_log_line as a health signal. If you do want the lenient behavior, it should be done end-to-end (update the validator at :127 and the docs). What's the use case that led you to process_healthy here instead of process_log_ready?

@hebo6 hebo6 force-pushed the fix/update-readiness-cascade branch from 8cd9766 to 6ffb745 Compare May 30, 2026 00:58
@hebo6

hebo6 commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

Thanks, that distinction makes sense.

I updated the PR to keep ready_log_line and process_healthy separate: the log-ready path no longer releases process_healthy waiters.

I kept the stale process fixes (getDoneOrRunningProcess preferring the running process and clearing stale doneProcesses entries on add/update). For the hang case, the runtime process_healthy dependency path now fails cleanly with the same "no health check exists" message when the target has no readiness/liveness probe, instead of waiting forever.

I also adjusted the regression test to cover both the stale running-process selection and the no-hang failure behavior.

@sonarqubecloud

Copy link
Copy Markdown

@F1bonacc1 F1bonacc1 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hebo6 for the PR and for making PC better for everyone else!

Cheers! 😄

@F1bonacc1 F1bonacc1 merged commit 85686e5 into F1bonacc1:main May 30, 2026
7 checks passed
@hebo6 hebo6 deleted the fix/update-readiness-cascade branch May 30, 2026 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants