Submitted with the assistance of Sally Sonnet
Summary
pg0 info / pg0 list (and the running property on the Python Pg0 class) report a postgres instance as "running" based on process/port-alive state only — there's no real connectivity check (e.g. SELECT 1). This means a postgres backend that's technically alive in ps and bound to its port, but unable to actually serve any query, is reported as healthy indefinitely.
How this happens in practice
On Linux hosts where systemd-logind's default RemoveIPC=yes reaps a user's shared-memory segments when their last login session ends (and that user has no loginctl linger enabled), an embedded pg0 postgres instance can lose its shared memory while the OS process itself keeps running. Any subsequent real connection attempt fails:
FATAL: could not open shared memory segment "/PostgreSQL.NNNNNNNNNN": No such file or directory
But pg0 info --name <instance> and pg0 list continue to report the instance as running with a valid-looking connection URI, because the check never actually opens a connection — it appears to only check that the process exists and the port is listening.
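To illustrate why a port-level check cannot catch this state, here is a minimal sketch of such a check using only the Python standard library. It is not pg0's actual implementation (which was not inspected for this report); `port_is_listening` is a hypothetical helper name. In the failure described above the postmaster still accepts TCP connections, so a check of this shape would presumably keep returning True even though every real session fails during backend startup.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    This only proves that something accepts connections on the port.
    It says nothing about whether a PostgreSQL session can be
    established or a query can be executed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False
```

A True result here is necessary for a healthy instance but not sufficient, which is the gap this issue describes.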
Why this matters
For any application that calls pg0.info().running (or the CLI equivalent) to decide whether to skip startup/use an existing instance — as hindsight-api's embedded-Postgres manager does — a zombie instance like this is invisible. The application happily reuses (or tries to reuse) a connection to an instance that can never actually serve a query, and the resulting failure surfaces much later, in application-level code, with no indication that pg0 itself already "knew" the instance was unhealthy.
Verification performed
- Reproduced directly: removed a healthy instance's shared memory out from under it (via the systemd RemoveIPC interaction above), confirmed the process was still alive (ps), confirmed pg0 list still reported (running), and confirmed a direct psql connection failed with the shared-memory error shown above.
- After restarting the instance with pg0 stop + pg0 start (getting a fresh shared-memory segment), the same instance correctly served real queries again.
Suggested fix
Have pg0 info/pg0 list/the running property perform a lightweight real connectivity check (e.g. attempt a trivial query via the bundled psql, or open a raw libpq connection) rather than relying solely on process-alive + port-listening state.
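As a rough sketch of what such a check could look like from the Python side, the function below shells out to psql and runs SELECT 1 against the instance's connection URI. The name `can_serve_queries` and its parameters are hypothetical, and the `psql` argument is a placeholder for wherever the bundled binary actually lives; this is one possible shape under those assumptions, not a proposal for pg0's internals.

```python
import os
import subprocess


def can_serve_queries(uri: str, psql: str = "psql", timeout: float = 5.0) -> bool:
    """Return True only if `SELECT 1` succeeds against the given URI.

    `psql` is an assumed path to a psql binary (for example the one
    bundled with the instance); adjust it to the real location.
    """
    # PGCONNECT_TIMEOUT bounds the connection attempt inside libpq;
    # the subprocess timeout bounds the whole call.
    env = dict(os.environ, PGCONNECT_TIMEOUT=str(max(1, int(timeout))))
    try:
        result = subprocess.run(
            # -X: skip psqlrc, -w: never prompt for a password,
            # -At: unaligned, tuples-only output (just the value).
            [psql, "-X", "-w", "-At", "-d", uri, "-c", "SELECT 1"],
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )
    except (OSError, subprocess.TimeoutExpired):
        # psql missing or not executable, or the check hung.
        return False
    return result.returncode == 0 and result.stdout.strip() == "1"
```

An application could call this alongside the existing running flag and treat "running but cannot serve queries" as a reason to stop and restart the instance, as was done manually in the verification steps above.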
Environment
- pg0-embedded 0.14.2 (Python SDK), pg0 CLI 0.14.2
- PostgreSQL 18.1.0 (bundled)
- Host: Debian 12 bookworm, x86_64