feat: postgres database resource #292
jason-lynch wants to merge 1 commit into feat/PLAT-417/resource-type-dependencies from
Conversation
📝 Walkthrough

This pull request introduces a new PostgresDatabaseResource type to manage per-database lifecycle operations, refactors resource dependencies to reference specific databases rather than just nodes, simplifies instance initialization by deferring database-specific setup, and extends the orchestration framework to track database ownership and names across nodes and resources. Multiple resources now carry DatabaseName fields, and resource dependencies are updated to reflect database-scoped relationships.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 7
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
server/internal/postgres/roles.go (1)
98-111: ⚠️ Potential issue | 🔴 Critical

Copy-paste bug: condition checks the wrong role name.

The conditional statement at line 102 checks for 'pgedge_application' but creates 'pgedge_application_read_only' at line 105. This is inconsistent and will fail to create the read-only role if pgedge_application already exists.

🐛 Proposed fix
```diff
 func CreateApplicationReadOnlyRole(opts BuiltinRoleOptions) Statements {
     statements := Statements{
         ConditionalStatement{
             If: Query[bool]{
-                SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application');",
+                SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application_read_only');",
             },
             Then: Statement{
                 SQL: "CREATE ROLE pgedge_application_read_only WITH NOLOGIN;",
             },
         },
     }
     return statements
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/postgres/roles.go` around lines 98 - 111, The condition in CreateApplicationReadOnlyRole uses the wrong role name; update the Query inside the ConditionalStatement so it checks for NOT EXISTS (... WHERE rolname = 'pgedge_application_read_only') to match the role created by the Statement (pgedge_application_read_only), ensuring the conditional creates the read-only role when it doesn't already exist.

server/internal/database/sync_event_resource.go (1)
23-27: ⚠️ Potential issue | 🟠 Major

Wire DatabaseName into identity and connection selection.

Right now this new field is only stored. Identifier() still keys the resource by providerNode + subscriberNode, and Refresh() still connects with provider.Spec.DatabaseName. That means database-scoped sync events will collide for the same node pair and can still run against the wrong database.

Suggested fix
```diff
-func SyncEventResourceIdentifier(providerNode, subscriberNode string) resource.Identifier {
+func SyncEventResourceIdentifier(providerNode, subscriberNode, databaseName string) resource.Identifier {
     return resource.Identifier{
-        ID:   providerNode + subscriberNode,
+        ID:   providerNode + subscriberNode + "-" + databaseName,
         Type: ResourceTypeSyncEvent,
     }
 }
@@
 func (r *SyncEventResource) Identifier() resource.Identifier {
-    return SyncEventResourceIdentifier(r.ProviderNode, r.SubscriberNode)
+    return SyncEventResourceIdentifier(r.ProviderNode, r.SubscriberNode, r.DatabaseName)
 }
@@
-    providerConn, err := provider.Connection(ctx, rc, provider.Spec.DatabaseName)
+    providerConn, err := provider.Connection(ctx, rc, r.DatabaseName)
     if err != nil {
-        return fmt.Errorf("failed to connect to provider database %q: %w", provider.Spec.DatabaseName, err)
+        return fmt.Errorf("failed to connect to provider database %q: %w", r.DatabaseName, err)
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/sync_event_resource.go` around lines 23 - 27, The resource's identity and connection selection must include the new DatabaseName field: update Identifier() to incorporate DatabaseName (e.g., include r.DatabaseName in the composite key returned so keys are unique per providerNode+subscriberNode+database), and update Refresh() to use the resource's DatabaseName when selecting/creating the provider connection instead of using provider.Spec.DatabaseName (ensure any connection lookup or call uses r.DatabaseName). Also update any equality/hash logic that builds the identity string in Identifier() to reflect the added field so database-scoped sync events no longer collide.

server/internal/database/wait_for_sync_event_resource.go (1)
19-24: ⚠️ Potential issue | 🟠 Major

DatabaseName still isn't part of this resource's identity or execution path.

The new field is stored on the struct, but the identifier, dependency lookup, and connection target still key off only the node pair. Two databases replicated between the same provider/subscriber nodes will collapse into one WaitForSyncEventResource, and Refresh() can wait on the wrong database by using subscriber.Spec.DatabaseName instead of r.DatabaseName.

Also applies to: 44-52, 59-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/wait_for_sync_event_resource.go` around lines 19 - 24, The resource identity and execution must include DatabaseName: change WaitForSyncEventResourceIdentifier to incorporate the database name into the ID (e.g., providerNode + subscriberNode + ":" + databaseName) and update any places that build or compare resource identifiers (including the other functions mentioned around the 44-52 and 59-68 diffs) to use this new identifier format; in Refresh() and any dependency lookup or connection logic replace uses of subscriber.Spec.DatabaseName with r.DatabaseName so the wait targets the resource's own DatabaseName rather than the subscriber spec.

server/internal/workflows/plan_restore.go (1)
92-103: ⚠️ Potential issue | 🟠 Major

Pass node-level restore config to the activity, not the workflow-level config.

The node parameter already carries node.RestoreConfig, but line 102 forwards the workflow-level restoreConfig to GetRestoreResourcesInput. Since the activity uses input.RestoreConfig directly (line 57 of get_restore_resources.go), node-level restore overrides are lost. Change RestoreConfig: restoreConfig, to RestoreConfig: node.RestoreConfig,.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/workflows/plan_restore.go` around lines 92 - 103, The workflow currently passes the workflow-level restoreConfig into the activity, causing node-level overrides to be ignored; in the loop that constructs the GetRestoreResourcesInput (when creating in := &activities.GetRestoreResourcesInput{...}) replace RestoreConfig: restoreConfig with RestoreConfig: node.RestoreConfig so the activity (used by GetRestoreResources) receives the node-specific restore settings; locate this in the block that builds nodeRestore (NodeRestoreResources) and the inner loop over node.Instances where in is created using taskID and instance.

server/internal/orchestrator/swarm/service_user_role.go (1)
118-155: ⚠️ Potential issue | 🔴 Critical

Execute schema and table grants against r.DatabaseName, not the postgres database.

connectToPrimary() connects to the postgres database (line 257), so lines 147–149 grant USAGE and SELECT on the public schema within postgres, not within r.DatabaseName. This leaves the service role with CONNECT permission on r.DatabaseName but no ability to access tables there. Open a second connection targeting r.DatabaseName for the schema/table grant statements.

Additionally, line 149's ALTER DEFAULT PRIVILEGES lacks FOR ROLE, so it only affects future objects created by the executing admin role. If the database owner creates tables, the default privileges will not apply to them. Add FOR ROLE <owner> if future owner-created tables should inherit these privileges.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/service_user_role.go` around lines 118 - 155, The schema/table GRANTs are being executed against the connection returned by connectToPrimary (which targets the postgres DB) so USAGE/SELECT/ALTER DEFAULT PRIVILEGES run on postgres instead of r.DatabaseName; open a second connection targeting r.DatabaseName (e.g., add a new conn2 via a connect-to-database call or extend connectToPrimary to accept a database name) and execute the grants variable (the statements that currently build GRANT USAGE/SELECT and ALTER DEFAULT PRIVILEGES) against that connection, ensuring you Close() it; also change the ALTER DEFAULT PRIVILEGES statement to include FOR ROLE <owner> (use the actual database owner variable or query the DB owner) so it affects objects created by the database owner rather than only the admin executing the command.

server/internal/database/operations/update_nodes_test.go (1)
153-153: ⚠️ Potential issue | 🟡 Minor

Bug: t.Name() should be tc.name for sub-test naming.

Using t.Name() returns the parent test function name (TestUpdateNode), not the individual test case name. This means all sub-tests will have the same name, making test output confusing and potentially causing issues with test filtering/reporting.

Note: Line 377 in TestRollingUpdateNodes has the same issue, while line 593 in TestConcurrentUpdateNodes correctly uses tc.name.

Proposed fix

```diff
-    t.Run(t.Name(), func(t *testing.T) {
+    t.Run(tc.name, func(t *testing.T) {
```

And similarly at line 377:

```diff
-    t.Run(t.Name(), func(t *testing.T) {
+    t.Run(tc.name, func(t *testing.T) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/operations/update_nodes_test.go` at line 153, Sub-tests inside TestUpdateNode (and the similar case in TestRollingUpdateNodes) are using t.Run(t.Name(), ...) which yields the parent test name for every case; change those calls to use the table case identifier tc.name (i.e., replace t.Run(t.Name(), func... ) with t.Run(tc.name, func... ) ) so each sub-test is uniquely named. Verify TestConcurrentUpdateNodes already uses tc.name as reference.

server/internal/database/operations/populate_nodes_test.go (1)
166-176: ⚠️ Potential issue | 🟡 Minor

Bug: Test uses t.Name() instead of tc.name for subtest naming.

Using t.Name() returns the parent test's name, causing all subtests to have the same name. This makes it harder to identify which test case failed and may cause test filtering issues.

🐛 Proposed fix

```diff
-    t.Run(t.Name(), func(t *testing.T) {
+    t.Run(tc.name, func(t *testing.T) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/operations/populate_nodes_test.go` around lines 166 - 176, The subtests in populate_nodes_test.go call t.Run(t.Name(), ...) so every case gets the parent test name; change the subtest name to use the test-case identifier (tc.name) in the t.Run call so each subtest is uniquely named, e.g. t.Run(tc.name, func(t *testing.T) { ... }), ensuring you reference the table-driven test variable tc and its name field around the call to operations.PopulateNode.
🧹 Nitpick comments (5)
docker/control-plane-dev/config.json (1)
4-8: Consider whether the "error" level is appropriate for a dev environment.

Setting the api_server log level to "error" will suppress warnings, info, and debug messages. Since this is a development configuration, you may want to use a more verbose level (e.g., "debug" or "info") to aid in troubleshooting and development.

If this is intentional to reduce log noise, consider documenting the rationale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/control-plane-dev/config.json` around lines 4 - 8, The current dev config sets the api_server log level to "error" which will hide warnings/info/debug; update the "logging.component_levels.api_server" value to a more verbose level (e.g., "info" or "debug") in the config and, if keeping "error" intentionally, add a brief comment or documentation note explaining the rationale; locate the "logging" -> "component_levels" -> "api_server" key in the JSON and change the value or add accompanying documentation where config is defined.

server/internal/postgres/roles.go (1)
152-162: Consider extracting shared schema logic.

BuiltinRolePrivilegeOptions.Schemas() (lines 157-162) is identical to BuiltinRoleOptions.Schemas() (lines 66-71). Consider extracting an embedded struct or a shared helper to reduce duplication.

♻️ Possible refactor using embedded struct

```diff
+type SchemaConfig struct {
+    ExtraSchemas []string
+}
+
+func (s SchemaConfig) Schemas() []string {
+    var schemas []string
+    schemas = append(schemas, defaultSchemas...)
+    schemas = append(schemas, s.ExtraSchemas...)
+    return schemas
+}
+
 type BuiltinRoleOptions struct {
-    PGVersion    string
-    ExtraSchemas []string
-}
-
-func (o BuiltinRoleOptions) Schemas() []string {
-    var schemas []string
-    schemas = append(schemas, defaultSchemas...)
-    schemas = append(schemas, o.ExtraSchemas...)
-    return schemas
+    PGVersion string
+    SchemaConfig
 }

 type BuiltinRolePrivilegeOptions struct {
-    DBName       string
-    ExtraSchemas []string
-}
-
-func (o BuiltinRolePrivilegeOptions) Schemas() []string {
-    var schemas []string
-    schemas = append(schemas, defaultSchemas...)
-    schemas = append(schemas, o.ExtraSchemas...)
-    return schemas
+    DBName string
+    SchemaConfig
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/postgres/roles.go` around lines 152 - 162, BuiltinRolePrivilegeOptions.Schemas() duplicates BuiltinRoleOptions.Schemas(). Extract the shared logic into a single helper or embedded type and have both structs reuse it; for example create a small SchemaOptions struct (holding ExtraSchemas) with a Schemas() method that appends defaultSchemas and ExtraSchemas, or add a package-level function SchemasFor(extra []string) []string that returns append(defaultSchemas, extra...), then replace the duplicate implementations in BuiltinRolePrivilegeOptions and BuiltinRoleOptions to call the shared Schemas implementation and remove the duplicated method bodies.

server/internal/database/instance_resource.go (1)
137-143: Consider updating the error message for consistency.

Similar to the initializeInstance method, the Connection method connects to dbName (line 138) but the error message (line 142) references r.Spec.DatabaseName, which may not match the actual database being connected to.

Proposed fix

```diff
 conn, err := ConnectToInstance(ctx, &ConnectionOptions{
     DSN: r.ConnectionInfo.AdminDSN(dbName),
     TLS: tlsCfg,
 })
 if err != nil {
-    return nil, fmt.Errorf("failed to connect to database %q: %w", r.Spec.DatabaseName, err)
+    return nil, fmt.Errorf("failed to connect to database %q: %w", dbName, err)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/instance_resource.go` around lines 137 - 143, The error message in the Connection method uses r.Spec.DatabaseName while the attempted connection is to the local variable dbName; update the fmt.Errorf call to reference dbName (the actual database passed to ConnectToInstance) for consistency and accurate diagnostics. Locate the ConnectToInstance call and the fmt.Errorf that returns "failed to connect to database" and replace the r.Spec.DatabaseName reference with dbName.

server/internal/database/operations/helpers_test.go (1)
69-82: Add one fixture that exercises non-nil databaseDependencies.

This helper always passes nil for the new constructor argument, so the planner tests in this file still never exercise the extra dependency layer introduced by NewInstanceResources. A regression that drops PostgresDatabase ordering would still pass here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/operations/helpers_test.go` around lines 69 - 82, The test helper always calls NewInstanceResources with a nil third argument, so it never exercises non-nil databaseDependencies; update the fixture(s) that call NewInstanceResources in helpers_test.go to include at least one case where the third parameter is a non-nil dependencies slice (e.g., a realistic databaseDependencies value or a minimal list containing a PostgresDatabase dependency) so tests hit the extra dependency layer introduced by NewInstanceResources (locate calls to NewInstanceResources in this file and add a non-nil dependencies argument to one fixture).

server/internal/database/replication_slot_advance_from_cts_resource.go (1)
18-24: Identifier does not include DatabaseName - potential collision with multi-database support.

The ReplicationSlotAdvanceFromCTSResourceIdentifier only uses providerNode + subscriberNode, but the resource now has a DatabaseName field. If multiple databases are ever created per node (as mentioned in the PR objectives for future multi-database support), resources for different databases would have colliding identifiers.

The PR summary mentions "database name was not incorporated into resource identities yet," so this may be intentional for now, but it's worth tracking for when multi-database support is fully enabled.
Future consideration: Include DatabaseName in identifier
```diff
-func ReplicationSlotAdvanceFromCTSResourceIdentifier(providerNode, subscriberNode string) resource.Identifier {
+func ReplicationSlotAdvanceFromCTSResourceIdentifier(databaseName, providerNode, subscriberNode string) resource.Identifier {
     return resource.Identifier{
         Type: ResourceTypeReplicationSlotAdvanceFromCTS,
-        ID:   providerNode + subscriberNode,
+        ID:   databaseName + "-" + providerNode + "-" + subscriberNode,
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/replication_slot_advance_from_cts_resource.go` around lines 18 - 24, The identifier function ReplicationSlotAdvanceFromCTSResourceIdentifier currently builds IDs from providerNode + subscriberNode which can collide across databases; update this function to incorporate the resource's DatabaseName (e.g., concatenate providerNode, subscriberNode and DatabaseName with a stable separator) so IDs are unique per-database; ensure you use the ResourceTypeReplicationSlotAdvanceFromCTS as before and pick a separator that cannot appear in node or db names (or escape values) so existing consumers of resource.Identifier remain correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/database/instance_resource.go`:
- Around line 183-189: The error message incorrectly references
r.Spec.DatabaseName even though ConnectToInstance is connecting to the
"postgres" admin DB via r.ConnectionInfo.AdminDSN("postgres"); update the error
to reference the actual target (e.g., "postgres" or derive the admin DB name
from r.ConnectionInfo.AdminDSN call) so the fmt.Errorf in the ConnectToInstance
error branch accurately names the database being connected to and not
r.Spec.DatabaseName.
In `@server/internal/database/lag_tracker_commit_ts_resource.go`:
- Line 29: The resource's DatabaseName field is not being used for identity or
connection, causing different databases on the same node pair to collide; update
the Identifier() method to include DatabaseName in the key (alongside
origin/receiver), update the resource constructor/path so DatabaseName is
threaded into the created lag-tracker resource (replace any use of
instance.Spec.DatabaseName in constructors with the resource's DatabaseName),
and modify Refresh() (and any place that dials/connects, e.g., the code
currently using instance.Spec.DatabaseName) to use r.DatabaseName as the
connection target; ensure the same change is applied where the
constructor/initialization happens (lines around the existing ctor usage noted
at 60-61).
In `@server/internal/database/operations/common.go`:
- Around line 83-86: The error message is misleading: when state.AddResource(db)
fails we're adding a PostgresDatabaseResource (db), so update the fmt.Errorf
message in the failure branch to explicitly reference the
PostgresDatabaseResource or database resource (e.g., "failed to add
PostgresDatabaseResource to state" or "failed to add database resource to
state") so logs correctly reflect the operation performed (locate the call to
state.AddResource(db) and change the error text returned alongside the wrapped
err).
In `@server/internal/database/replication_slot_resource.go`:
- Line 29: The DatabaseName field was added but not threaded into resource
identity and deletion: update the Identifier() method to include the
DatabaseName (so the resource key includes provider, subscriber and
DatabaseName) and update Delete() to use the resource's DatabaseName (not
provider.Spec.DatabaseName) when looking up and deleting the replication
slot/state; ensure any state keys, comparisons and List/Delete calls in
Identifier(), Delete(), and any helper that builds the replication-slot key use
the DatabaseName field so multiple DBs on the same provider/subscriber are kept
distinct.
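Several of the findings above converge on the same fix: thread DatabaseName into each resource's composite identifier. A minimal sketch of that pattern is below; the helper names (resourceID, escape) and the "/" separator are illustrative assumptions, not the PR's actual code, which the review notes must pick a separator (or escaping scheme) that cannot collide with node or database names.

```go
package main

import (
	"fmt"
	"strings"
)

// escape makes "/" safe to use as a field separator by escaping any
// occurrence inside a part, so "a/b"+"c" and "a"+"b/c" cannot collide.
func escape(part string) string {
	part = strings.ReplaceAll(part, `\`, `\\`)
	return strings.ReplaceAll(part, "/", `\/`)
}

// resourceID builds a database-scoped identifier from the node pair
// plus the database name, keeping IDs unique per database.
func resourceID(providerNode, subscriberNode, databaseName string) string {
	return escape(providerNode) + "/" + escape(subscriberNode) + "/" + escape(databaseName)
}

func main() {
	// Distinct databases on the same node pair now yield distinct IDs.
	fmt.Println(resourceID("n1", "n2", "appdb"))   // n1/n2/appdb
	fmt.Println(resourceID("n1", "n2", "otherdb")) // n1/n2/otherdb
}
```

The escaping step is what distinguishes this from plain concatenation: without it, ("a/b", "c") and ("a", "b/c") would produce the same key.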
In `@server/internal/database/spec.go`:
- Around line 587-593: NodeInstances() currently picks the first user with
DBOwner==true from DatabaseUsers (setting DatabaseOwner) without validation,
which makes ownership order-dependent or empty; add validation in
validateDatabaseSpec() (or equivalent service-layer validation) to assert
exactly one DatabaseUsers entry has DBOwner==true and return an error if count
!= 1 so NodeInstances() can assume a single owner; update any callers/tests to
handle the validation error and reference DatabaseUsers, DBOwner, DatabaseOwner,
NodeInstances(), and validateDatabaseSpec() when implementing the check.
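The owner validation suggested above can be sketched as follows. The DatabaseUser shape and the validateSingleOwner name are assumptions for illustration; the PR's actual types live in spec.go and may differ.

```go
package main

import "fmt"

// DatabaseUser approximates the shape implied by the review comment;
// the field names here are assumptions, not the PR's definitions.
type DatabaseUser struct {
	Username string
	DBOwner  bool
}

// validateSingleOwner enforces that exactly one user is flagged as the
// database owner, so NodeInstances-style code can rely on a unique owner
// instead of silently taking the first match in slice order.
func validateSingleOwner(users []DatabaseUser) (string, error) {
	owner := ""
	count := 0
	for _, u := range users {
		if u.DBOwner {
			owner = u.Username
			count++
		}
	}
	if count != 1 {
		return "", fmt.Errorf("expected exactly one database owner, found %d", count)
	}
	return owner, nil
}

func main() {
	users := []DatabaseUser{
		{Username: "app", DBOwner: false},
		{Username: "admin", DBOwner: true},
	}
	owner, err := validateSingleOwner(users)
	fmt.Println(owner, err)
}
```

Running this check once at spec-validation time keeps the error close to the bad input, rather than surfacing later as an empty or order-dependent DatabaseOwner.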
In `@server/internal/postgres/dsn.go`:
- Around line 35-66: The iteration over d.Extra in DSN.Fields() makes
DSN.String() nondeterministic; before the for loop that calls addField for each
entry in d.Extra, collect the map keys, sort them (e.g., using sort.Strings),
and then iterate the sorted keys to append fields in a stable order; update
references in DSN.Fields() (and any other place that serializes d.Extra) so
addField is invoked in key-sorted order to ensure deterministic DSN.String()
output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a84ae357-8fd4-49e5-877f-563e790fe1cc
📒 Files selected for processing (64)
docker/control-plane-dev/config.json
server/internal/database/instance_resource.go
server/internal/database/lag_tracker_commit_ts_resource.go
server/internal/database/node_resource.go
server/internal/database/operations/add_nodes.go
server/internal/database/operations/add_nodes_test.go
server/internal/database/operations/common.go
server/internal/database/operations/end.go
server/internal/database/operations/golden_test/TestRestoreDatabase/restore_all_nodes_in_two-node_db.json
server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore.json
server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db.json
server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db_with_replica.json
server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency.json
server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency_with_forced_update.json
server/internal/database/operations/golden_test/TestUpdateDatabase/add_update_and_remove_node.json
server/internal/database/operations/golden_test/TestUpdateDatabase/adding_a_replica.json
server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_concurrent.json
server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_rolling.json
server/internal/database/operations/golden_test/TestUpdateDatabase/delete_two_node_database.json
server/internal/database/operations/golden_test/TestUpdateDatabase/forced_update.json
server/internal/database/operations/golden_test/TestUpdateDatabase/multiple_nodes_from_empty.json
server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_default_source_node.json
server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate.json
server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate_and_replica.json
server/internal/database/operations/golden_test/TestUpdateDatabase/remove_node.json
server/internal/database/operations/golden_test/TestUpdateDatabase/remove_one_node_from_three_node_database.json
server/internal/database/operations/golden_test/TestUpdateDatabase/remove_two_nodes_from_three_node_database.json
server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_from_empty.json
server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_with_service_from_empty.json
server/internal/database/operations/golden_test/TestUpdateDatabase/two_nodes_to_three_nodes_with_populate.json
server/internal/database/operations/golden_test/TestUpdateDatabase/update_database_node_with_unchanged_service.json
server/internal/database/operations/helpers_test.go
server/internal/database/operations/populate_nodes.go
server/internal/database/operations/populate_nodes_test.go
server/internal/database/operations/restore_database.go
server/internal/database/operations/restore_database_test.go
server/internal/database/operations/update_database_test.go
server/internal/database/operations/update_nodes.go
server/internal/database/operations/update_nodes_test.go
server/internal/database/orchestrator.go
server/internal/database/postgres_database.go
server/internal/database/replication_slot_advance_from_cts_resource.go
server/internal/database/replication_slot_create_resource.go
server/internal/database/replication_slot_resource.go
server/internal/database/resources.go
server/internal/database/spec.go
server/internal/database/subscription_resource.go
server/internal/database/switchover_resource.go
server/internal/database/sync_event_resource.go
server/internal/database/wait_for_sync_event_resource.go
server/internal/monitor/instance_monitor_resource.go
server/internal/orchestrator/swarm/orchestrator.go
server/internal/orchestrator/swarm/service_user_role.go
server/internal/postgres/create_db.go
server/internal/postgres/dsn.go
server/internal/postgres/errors.go
se
rver/internal/postgres/roles.goserver/internal/postgres/roles_test.goserver/internal/resource/event_test.goserver/internal/resource/resource.goserver/internal/workflows/activities/apply_event.goserver/internal/workflows/activities/executor.goserver/internal/workflows/common.goserver/internal/workflows/plan_restore.go
💤 Files with no reviewable changes (1)
- server/internal/postgres/roles_test.go
conn, err := ConnectToInstance(ctx, &ConnectionOptions{
-	DSN: r.ConnectionInfo.AdminDSN(r.Spec.DatabaseName),
+	DSN: r.ConnectionInfo.AdminDSN("postgres"),
	TLS: tlsCfg,
})
if err != nil {
	return fmt.Errorf("failed to connect to database %q: %w", r.Spec.DatabaseName, err)
}
Error message references wrong database name.
The connection is made to the "postgres" database (line 184), but the error message at line 188 still references r.Spec.DatabaseName. This could mislead debugging efforts.
Proposed fix

conn, err := ConnectToInstance(ctx, &ConnectionOptions{
	DSN: r.ConnectionInfo.AdminDSN("postgres"),
	TLS: tlsCfg,
})
if err != nil {
-	return fmt.Errorf("failed to connect to database %q: %w", r.Spec.DatabaseName, err)
+	return fmt.Errorf("failed to connect to postgres database: %w", err)
}
err := state.AddResource(db)
if err != nil {
	return nil, fmt.Errorf("failed to add node resources to state: %w", err)
}
Misleading error message.
The error message says "failed to add node resources" but the code is adding a PostgresDatabaseResource. This could confuse debugging efforts.
📝 Proposed fix

err := state.AddResource(db)
if err != nil {
-	return nil, fmt.Errorf("failed to add node resources to state: %w", err)
+	return nil, fmt.Errorf("failed to add database resource to state: %w", err)
}
// ensures the corresponding replication slot is dropped on the provider,
// preventing orphaned slots from accumulating WAL.
type ReplicationSlotResource struct {
	DatabaseName string `json:"database_name"`
Finish threading DatabaseName through replication-slot identity and delete.
Lines 29 and 52 make this resource database-scoped, but Identifier() still collapses every database on the same provider/subscriber pair into a single resource, and Delete() (lines 82 and 88) still reads provider.Spec.DatabaseName. With more than one database, cleanup can target the wrong slot and the wrong state entry.
Also applies to: 50-53
Lines 29 and 52 make this resource database-scoped, but Identifier() still collapses every database on the same provider/subscriber pair to one resource
I covered this in both the commit message and the PR description. I intentionally did not do this because changing these identities will have a destructive effect on existing databases. We'll need a migration process to do this safely, and since our API contract still only allows one Postgres database per control plane database, we can defer that work. For this PR, I'm only doing the necessary work for the feature I'm building.
Delete() still uses Lines 82 and 88's provider.Spec.DatabaseName. With more than one database, cleanup can target the wrong slot and the wrong state entry.
I will push a fix for this because it's a small change.
@jason-lynch, thanks for the context — that makes sense for the Identifier() point. I've recorded the rationale below so it doesn't get flagged again in future reviews.
For the Delete() fix, sounds good — looking forward to the push.
✏️ Learnings added
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 292
File: server/internal/database/replication_slot_resource.go:29-29
Timestamp: 2026-03-11T18:03:07.926Z
Learning: In `server/internal/database/replication_slot_resource.go` (Go), `ReplicationSlotResource.Identifier()` intentionally does NOT include `DatabaseName` in the resource key (it only uses provider+subscriber). Including `DatabaseName` would change existing resource identities and have a destructive effect on existing databases, requiring a migration process. This is safe to defer because the current API contract allows only one Postgres database per control plane database. Do not flag the absence of `DatabaseName` from `Identifier()` as a bug until a migration path is in place.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 292
File: server/internal/database/postgres_database.go:122-135
Timestamp: 2026-03-11T17:52:39.498Z
Learning: In `server/internal/database/postgres_database.go` (Go), the `isFirstTimeSetup()` method intentionally uses resource presence in state (rather than `NeedsRecreate`) as the restore-complete signal. This is a deliberate design choice: `NeedsRecreate` cannot be used because it is set both for failed creates AND for resources removed outside the system's view, making it unsafe to trigger the destructive `postRestore` path on retry. The post-restore flow (rename/drop/reinitialize Spock) is inherently non-idempotent; recovery from a failed restore is achieved by deleting the instance and retrying the restore, which will be covered in documentation.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 266
File: server/internal/workflows/update_database.go:122-153
Timestamp: 2026-02-09T21:47:46.904Z
Learning: In server/internal/workflows/update_database.go and related Go workflow files (e.g., provision_services.go), implement idempotent provisioning semantics: re-running ProvisionServices on already-provisioned service instances should be safe and non-destructive. Upsert operations (e.g., StoreServiceInstance) may temporarily set state to 'creating' but should recover back to 'running' via monitoring. Use SQL-level idempotence for user/service roles (e.g., CREATE ROLE IF NOT EXISTS / ALTER ROLE) and ensure deployment diffs compare desired vs. current state. These patterns support recovery when a user retries with corrected specs after a failure, and should be considered in similar Go workflow files across the repository.
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 290
File: server/internal/database/service.go:564-574
Timestamp: 2026-03-11T14:26:49.745Z
Learning: In pgEdge/control-plane server/internal/database/service.go, a release failure after a successful persist in ReconcileInstanceSpec is considered non-critical by design. The only consequence is that the released port will not be a candidate for future random assignment, which is very low risk. No additional complexity is needed to guard against this failure mode.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 266
File: server/internal/orchestrator/swarm/service_user_role.go:85-181
Timestamp: 2026-02-09T23:31:07.993Z
Learning: In server/internal/orchestrator/swarm/service_user_role.go (Go): The Delete method distinguishes between etcd errors (GetDatabase) and Postgres connection errors. GetDatabase errors should be propagated because they represent transient infrastructure failures. However, connection errors when attempting to DROP ROLE (GetInstanceConnectionInfo, PostgresUserTLS, conn.Exec) should be swallowed with logging, because during database deletion, Postgres instances may be concurrently shutting down and connection failures are expected, not retriable issues.
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 252
File: server/internal/database/wait_for_sync_event_resource.go:76-90
Timestamp: 2026-01-20T13:20:29.985Z
Learning: In the pgEdge/control-plane codebase, any file under server/internal/database/ that implements the resource.Resource interface (e.g., *WaitForSyncEventResource) and runs within workflow contexts (not connected directly to API endpoints) should not rely on Goa error mapping in server/internal/api/apiv1/errors.go. Treat domain-specific errors from these workflow resources as internal to the workflow layer and avoid mapping them to API-level Goa errors. This pattern applies to all relevant Go files in server/internal/database/, not just the one example.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 280
File: server/internal/orchestrator/swarm/mcp_config.go:95-97
Timestamp: 2026-03-02T20:17:36.231Z
Learning: In the pgEdge/control-plane repository, internal (unexported) functions that are called through controlled paths by the resource framework should assume validated input from their invocation boundaries. Do not add defensive nil guards inside such internal helpers; rely on validation at the resource construction/invocation boundaries (e.g., MCPConfigResource.Create/Update) to guarantee data integrity. This guideline applies to all Go files under server/internal where internal helpers serve as implementation details behind controlled APIs.
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 290
File: server/internal/database/service.go:564-574
Timestamp: 2026-03-11T14:26:30.249Z
Learning: In the pgEdge/control-plane repo, the port allocator (ports.Service) acts as a full-occupancy tracker for all host ports, not just randomly allocated ones. Releasing a port (ReleasePortIfDefined / ReleasePort) marks it as eligible for future random assignment. User-pinned ports are tracked as well, and intentionally released via portsSvc on instance deletion or spec changes, since the allocator checks OS availability before returning any port. Reviewers should verify that the port allocator maintains a global occupancy view, that ReleasePort* functions release ports back into the pool rather than permanently freeing them, and that pinned ports remain tracked until the host OS confirms availability at allocation time. Applies to files under server/internal/database (and related ports/allocator code) in Go.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 287
File: server/internal/orchestrator/swarm/service_user_role.go:59-60
Timestamp: 2026-03-11T16:05:27.239Z
Learning: In server/internal/orchestrator/swarm/service_user_role.go, treat a ResourceVersion bump (e.g., 2 -> 3) as a documentation/safety marker only. When the resource Identifier key changes (e.g., ServiceInstanceID to ServiceID), the system will automatically recreate resources due to the Type/ID key mismatch. Do not flag a ResourceVersion bump as a missing recreate trigger if an Identifier change is already causing a full recreate for that resource. This guidance is file-specific and should be applied when reviewing changes to service_user_role.go.
var owner string
for _, user := range s.DatabaseUsers {
	if user.DBOwner {
		owner = user.Username
		break
	}
}
🧩 Analysis chain

(Collapsed: roughly thirty verification scripts — rg and sed searches across server/internal covering DBOwner/db_owner references, DatabaseUsers handling in spec.go, NodeInstances() call sites, validateDatabaseSpec() in the API layer, and the spec/workflow validation paths — found no upstream check on the number of DBOwner users.)
Add validation to enforce exactly one DBOwner user before NodeInstances() runs.
NodeInstances() silently takes the first user with DBOwner == true and stamps it onto every node. There is no upstream validation enforcing exactly one owner or rejecting zero owners. If multiple DBOwner users are present, ownership becomes input-order dependent; if none exist, DatabaseOwner becomes an empty string. Validate in validateDatabaseSpec() or at service layer that exactly one user has DBOwner == true.
Also applies to: 645-646
func (d *DSN) Fields() []string {
	var fields []string
	addField := func(key, value string) {
		if value == "" {
			return
		}

		var buf strings.Builder
		buf.WriteString(key)
		buf.WriteString("=")
		buf.WriteString(value)

		fields = append(fields, buf.String())
	}

	addField("host", d.Host())
	addField("port", d.Port())
	addField("user", d.User)
	addField("password", d.Password)
	addField("dbname", d.DBName)
	addField("sslcert", d.SSLCert)
	addField("sslkey", d.SSLKey)
	addField("sslrootcert", d.SSLRootCert)
	addField("service", d.Service)
	addField("application_name", d.ApplicationName)
	addField("target_session_attrs", d.TargetSessionAttrs)

	for key, value := range d.Extra {
		addField(key, value)
	}

	return fields
Sort Extra keys before serializing the DSN.
Iterating d.Extra directly makes String() nondeterministic when that map has multiple entries. Any code comparing DSN strings will then see false diffs and can churn subscriptions/interfaces even when nothing changed.
Suggested fix
import (
+ "sort"
"strconv"
"strings"
)
@@
- for key, value := range d.Extra {
- addField(key, value)
- }
+ keys := make([]string, 0, len(d.Extra))
+ for key := range d.Extra {
+ keys = append(keys, key)
+ }
+ sort.Strings(keys)
+ for _, key := range keys {
+ addField(key, d.Extra[key])
+ }

Also applies to: 69-70
Adds a new `PostgresDatabaseResource` that shifts the Postgres database creation logic to happen later in the database creation process. Now, the `InstanceResource` is only responsible for creating the database users, and the `PostgresDatabase` resource is responsible for creating the Postgres database, granting role privileges to the new database, and initializing the Spock node. This change is necessary because we need to use `pg_service.conf` for SystemD support, and we need that file to exist when we create the Spock node. This change shifts the Spock node initialization until after all of the instances are created and all nodes are ready, so we're able to construct the DSNs for all nodes before we initialize Spock. This change has an added benefit that it opens up the possibility of support for multiple Postgres databases per Control Plane database since we're able to make multiple databases/spock nodes per `NodeResource`/Patroni cluster. I've added the `DatabaseName` field to most, if not all, of the resources that would need to change to accommodate this. However, I did stop short of incorporating the database name into these resource's identities, which will be necessary to complete that multi-database support. PLAT-417
Summary

Adds a new `PostgresDatabaseResource` that shifts the Postgres database creation logic to happen later in the database creation process. Now, the `InstanceResource` is only responsible for creating the database users, and the `PostgresDatabase` resource is responsible for creating the Postgres database, granting role privileges to the new database, and initializing the Spock node.

This change is necessary because we need to use `pg_service.conf` for SystemD support, and we need that file to exist when we create the Spock node. This change shifts the Spock node initialization until after all of the instances are created and all nodes are ready, so we're able to construct the DSNs for all nodes before we initialize Spock.

This change has an added benefit: it opens up the possibility of supporting multiple Postgres databases per Control Plane database, since we're able to make multiple databases/Spock nodes per `NodeResource`/Patroni cluster. I've added the `DatabaseName` field to most, if not all, of the resources that would need to change to accommodate this. However, I stopped short of incorporating the database name into these resources' identities, which will be necessary to complete that multi-database support.

Testing

There should be no difference in user-facing functionality.

PLAT-417
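For context on the `pg_service.conf` mechanism the description relies on: libpq resolves a `service=<name>` connection parameter against the sections of that file. A per-node entry might look like the following; the section name and values are illustrative only, not taken from this implementation:

```ini
; pg_service.conf — each section can be referenced via service=<section-name>
[n1-app_db]
host=node1.internal
port=5432
dbname=app_db
user=spock_repl
sslmode=verify-full
```

A DSN such as `service=n1-app_db` then expands to these parameters, which is why the file must exist before Spock nodes are created: the node DSNs can reference service entries instead of embedding host details directly.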