Skip to content

feat: track restore stage in _pgro.restore_info#25

Merged
passcod merged 2 commits intomainfrom
feat/restore-stage-tracking
Apr 23, 2026
Merged

feat: track restore stage in _pgro.restore_info#25
passcod merged 2 commits intomainfrom
feat/restore-stage-tracking

Conversation

@passcod
Copy link
Copy Markdown
Member

@passcod passcod commented Apr 23, 2026

Adds stage and last_transition_time columns to _pgro.restore_info.Init script sets stage to restored when a post-restore reindex ispending, or ready otherwise. The background reindex block transitionsto reindexing before starting and ready after completion.

🦸 Review Hero

  • Run Review Hero
  • Auto-fix review suggestions
  • Auto-fix CI failures

passcod and others added 2 commits April 22, 2026 18:34
Adds `stage` and `last_transition_time` columns to `_pgro.restore_info`.
Init script sets stage to `restored` when a post-restore reindex is
pending, or `ready` otherwise. The background reindex block transitions
to `reindexing` before starting and `ready` after completion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@passcod passcod changed the title move to AGENTS.md feat: track restore stage in _pgro.restore_info Apr 23, 2026
Comment thread src/controllers/restore/builders.rs
@review-hero
Copy link
Copy Markdown

review-hero Bot commented Apr 23, 2026

🦸 Review Hero Summary
6 agents reviewed this PR | 1 critical | 0 suggestions | 0 nitpicks | Filtering: consensus 3 voters, 2 below threshold

Below consensus threshold (2 unique issues not confirmed by majority)
Location Agent Severity Comment
src/controllers/restore/builders.rs:988 Bugs & Correctness critical The rm -f /pgdata/needs-reindex executes on line 988 before the SET stage = 'ready' psql call on line 989. If the process is killed in that window, the needs-reindex file is gone but the stag...
src/controllers/restore/tests.rs:378 Bugs & Correctness nitpick script.find("REINDEX INDEX").unwrap() (without .expect(...)) will panic with a generic index-out-of-bounds message if the string is absent, making test failures hard to diagnose. Use `.expect("...
Local fix prompt (copy to your coding agent)

Fix these issues identified on the pull request. One commit per issue fixed.


src/controllers/restore/builders.rs:989: The stage = 'ready' UPDATE fires unconditionally after the reindex loop, even when individual REINDEX commands fail. Lines 982 and 984 use || true to suppress failures, so a corrupted/invalid index is silently ignored and the stage still reports 'ready'. A client reading stage='ready' would believe the database is fully usable when it may have invalid indexes. Fix: track failures with a flag variable (e.g. REINDEX_OK=1, clear it on failure) and only update to 'ready' if the flag is still set; otherwise update to a distinct error state or leave as 'reindexing' so the problem is visible.

restored_at = now();
restored_at = now(),
stage = EXCLUDED.stage,
last_transition_time = now();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bugs & Correctness] suggestion

The ON CONFLICT clause always resets last_transition_time = now() regardless of whether stage actually changed. If the init container re-runs on a pod restart after a restore has already completed (e.g. the same stage value is upserted), last_transition_time gets bumped to the restart time, making it look like a stage transition just occurred. Fix: last_transition_time = CASE WHEN _pgro.restore_info.stage IS DISTINCT FROM EXCLUDED.stage THEN now() ELSE _pgro.restore_info.last_transition_time END

PG_MAJOR=$(cat /pgdata/pgdata/PG_VERSION)
(
while ! pg_isready -q -U postgres -d postgres; do sleep 2; done
psql -U postgres -d postgres -c "UPDATE _pgro.restore_info SET stage = 'reindexing', last_transition_time = now() WHERE id = 1;"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bugs & Correctness] suggestion

The 'reindexing' and 'ready' UPDATE statements in the postgres container silently affect 0 rows if the _pgro.restore_info row is missing (e.g. the init container's psql heredoc failed partway through — psql does not abort on statement error by default). Stage tracking is then permanently lost with no indication of failure. Consider adding || echo 'WARNING: stage update affected 0 rows' after each UPDATE, or asserting row count via a DO block.

@review-hero
Copy link
Copy Markdown

review-hero Bot commented Apr 23, 2026

🦸 Review Hero Summary
6 agents reviewed this PR | 0 critical | 2 suggestions | 0 nitpicks | Filtering: consensus 3 voters, 2 below threshold

Below consensus threshold (2 unique issues not confirmed by majority)
Location Agent Severity Comment
src/controllers/restore/builders.rs:989 Bugs & Correctness suggestion Every REINDEX call uses || true (lines 982/984) so individual failures are silently swallowed, yet stage = 'ready' is set unconditionally after the loop. A database with partially-failed reinde...
src/controllers/restore/tests.rs:346 Bugs & Correctness suggestion The assertion script.contains("PGRO_STAGE=restored") && script.contains("PGRO_STAGE=ready") only checks that both strings exist somewhere in the script — it does not verify which branch sets whic...
Local fix prompt (copy to your coding agent)

Fix these issues identified on the pull request. One commit per issue fixed.


src/controllers/restore/builders.rs:814: The ON CONFLICT clause always resets last_transition_time = now() regardless of whether stage actually changed. If the init container re-runs on a pod restart after a restore has already completed (e.g. the same stage value is upserted), last_transition_time gets bumped to the restart time, making it look like a stage transition just occurred. Fix: last_transition_time = CASE WHEN _pgro.restore_info.stage IS DISTINCT FROM EXCLUDED.stage THEN now() ELSE _pgro.restore_info.last_transition_time END


src/controllers/restore/builders.rs:966: The 'reindexing' and 'ready' UPDATE statements in the postgres container silently affect 0 rows if the _pgro.restore_info row is missing (e.g. the init container's psql heredoc failed partway through — psql does not abort on statement error by default). Stage tracking is then permanently lost with no indication of failure. Consider adding || echo 'WARNING: stage update affected 0 rows' after each UPDATE, or asserting row count via a DO block.

@passcod passcod merged commit 39d589c into main Apr 23, 2026
33 checks passed
@passcod passcod deleted the feat/restore-stage-tracking branch April 23, 2026 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant