native module refinement#1875
Conversation
Greptile SummaryThis PR refines the native-module lifecycle with a stop-lock to prevent re-entrant shutdown races, orphan-process prevention via Confidence Score: 5/5Safe to merge; all remaining findings are P2 style/reliability suggestions that do not block correct operation. Previously flagged P1 issues (TimeoutExpired after SIGKILL, re-entrant stop) are resolved. New findings are P2: a reduced shutdown timeout that may be too short for heavy-teardown processes, missing stderr in the build-failure exception, and a minor find_library edge case. None of these are blocking correctness bugs. dimos/core/native_module.py — shutdown_timeout reduction and build-error message regression deserve a second look before shipping to users with slow-teardown native processes. Important Files Changed
Sequence DiagramsequenceDiagram
participant Parent as NativeModule (parent)
participant Proc as Native Process (child)
participant WD as Watchdog Thread
Parent->>Proc: Popen(start_new_session=True, preexec_fn=PR_SET_PDEATHSIG)
Parent->>WD: Thread(target=_watch_process).start()
WD->>Proc: proc.wait()
alt Normal stop()
Parent->>Parent: _stop_lock: _stopping=True
Parent->>Proc: send_signal(SIGTERM)
Proc-->>Parent: exits within shutdown_timeout
Parent->>WD: watchdog.join(timeout)
WD-->>Parent: joined
Parent->>Parent: _stop_lock: _process=None, _watchdog=None
Parent->>Parent: super().stop()
else Process crashes
Proc-->>WD: proc.wait() returns (unexpected exit)
WD->>Parent: self.stop() [re-entrant, lock prevents double-stop]
else Parent dies unexpectedly
Proc->>Proc: SIGTERM via PR_SET_PDEATHSIG
end
Reviews (3): Last reviewed commit: "-" | Re-trigger Greptile |
Document all unaddressed review comments across these PRs: - PR #1875: 6 comments from leshy/paul-nechifor on native_module.py - PR #1791: 8 comments from leshy on nav stack architecture/naming - PR #1663: 3 comments from paul-nechifor on test_mcp_server.py - PR #1485: 4 greptile bot suggestions (3 fixed, 1 minor typing nit)
Keep both: dev's _WorkerState pattern and our branch's SIGINT ignore for clean coordinator-orchestrated shutdown. Also add pr_responses.yaml for PR #1875 review comments.
Replace the ad-hoc PROD env var check with a proper GlobalConfig field, readable via global_config.prod. pydantic-settings auto-reads from the PROD env var.
Co-authored-by: Paul Nechifor <paul@nechifor.net>
- Restore LogFormat enum and JSON log parsing in _read_log_stream
- Add auto_build config to gate expensive nix rebuilds per-module
- Replace PROD env var with global_config.prod
- Handle TimeoutExpired after SIGKILL in stop()
- Use find_library("c") instead of hardcoded libc.so.6
- Simplify _child_preexec_linux docstring
- Document shell=True trusted-source requirement
…tive_rebuild3 # Conflicts: # dimos/core/coordination/python_worker.py # dimos/core/global_config.py
d7404f9 to
6b07823
Compare
# Conflicts: # dimos/core/native_module.py
Native module refinement (rebuild logic and logging) and keyboard interrupt improvement