
K8SPG-1057: Allow using etcd as patroni DCS #1647

Open
yoav-katz wants to merge 24 commits into percona:main from yoav-katz:etcd-dcs

Conversation

@yoav-katz

@yoav-katz yoav-katz commented Jun 18, 2026


CHANGE DESCRIPTION

Problem:
Patroni supports multiple DCS backends, but the operator hardcodes Kubernetes Endpoints as the only option. This blocks clusters on managed Kubernetes platforms where workloads cannot reach the control plane API.

Cause:
The kubernetes: stanza was hardcoded in the generated Patroni config with no mechanism to select a different backend.
Several other pieces of the operator also assumed k8s DCS: RBAC rules unconditionally granted Endpoints permissions, the primary service routed through Patroni-managed Endpoints objects, and pod role labels/annotations were expected to be set by Patroni itself (which only happens with k8s DCS).

Solution:
Add a spec.patroni.dcs field (type: kubernetes default, type: etcd alternative). The field is immutable after cluster creation, enforced by a CEL validation rule on the CRD.
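To illustrate the immutability rule, here is a minimal sketch of how a CEL transition rule is commonly attached to a Go API type through a kubebuilder marker, together with a plain-Go analogue of the check. The type name, field layout, `validateDCSUpdate` helper, and the rule text are assumptions made for illustration; the actual definitions live in patroni_types.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// PatroniDCSType names a DCS backend; the two values come from the PR description.
type PatroniDCSType string

const (
	PatroniDCSTypeKubernetes PatroniDCSType = "kubernetes"
	PatroniDCSTypeEtcd       PatroniDCSType = "etcd"
)

// PatroniDCS is a hypothetical stand-in for the spec.patroni.dcs field.
type PatroniDCS struct {
	// The marker below is how controller-gen emits an x-kubernetes-validations
	// entry. The rule text is an assumption, not copied from the PR.
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="dcs.type is immutable"
	Type PatroniDCSType `json:"type,omitempty"`
}

var errDCSImmutable = errors.New("dcs.type is immutable")

// validateDCSUpdate mirrors in Go what the transition rule rejects:
// any update whose DCS type differs from the stored one.
func validateDCSUpdate(oldDCS, newDCS PatroniDCS) error {
	if oldDCS.Type != newDCS.Type {
		return errDCSImmutable
	}
	return nil
}

func main() {
	stored := PatroniDCS{Type: PatroniDCSTypeEtcd}
	fmt.Println(validateDCSUpdate(stored, PatroniDCS{Type: PatroniDCSTypeEtcd}))
	fmt.Println(validateDCSUpdate(stored, PatroniDCS{Type: PatroniDCSTypeKubernetes}))
}
```

In a real cluster the API server evaluates the rule at admission time, so the operator never sees the rejected update.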

When type: etcd, the operator:

  • Emits an etcd3: stanza in the generated Patroni config instead of kubernetes:, with optional TLS (cacert/cert/key) and auth credentials (PATRONI_ETCD3_USERNAME/PATRONI_ETCD3_PASSWORD) sourced from referenced Secrets.
  • Injects on_start and on_role_change Patroni callbacks pointing to a new patroni-role-change.sh script. Since Patroni does not set pod role labels or the status annotation when using etcd DCS, this script patches the pod via the k8s API on every role transition, restoring the label (role=primary|replica) and annotation ({"role":"primary"}) that the rest of the operator depends on for Service routing and primary detection.
  • Creates the primary Service with a label selector (role=primary) instead of the previous headless-Endpoints-to-Patroni-leader-ClusterIP indirection, which only works with k8s DCS.
  • Skips creating the Patroni leader lease Service and distributed configuration Service, which are k8s DCS artifacts.
  • Omits the Endpoints RBAC permissions from the postgres pod ServiceAccount, since they are not needed.
  • Validates that referenced TLS and auth Secrets exist and contain the required keys, surfacing issues as Warning events on the cluster.
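The config-generation step from the list above can be sketched roughly as follows. This is not the PR's code: `EtcdSettings`, `patroniDCSConfig`, the script path, the TLS mount directory, and the Secret key file names are all hypothetical. The `etcd3`, `hosts`, `protocol`, `cacert`, `cert`, `key`, `use_endpoints`, and `callbacks` keys are regular Patroni settings. JSON output is used only to keep the sketch dependency-free.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EtcdSettings is a hypothetical, trimmed-down view of spec.patroni.dcs.etcd.
type EtcdSettings struct {
	Endpoints []string // etcd client endpoints
	TLSDir    string   // assumed mount path of the TLS Secret; empty means no TLS
}

// roleChangeScript is an assumed install path for patroni-role-change.sh.
const roleChangeScript = "/opt/scripts/patroni-role-change.sh"

// patroniDCSConfig returns the DCS-dependent part of the Patroni config.
func patroniDCSConfig(dcsType string, etcd EtcdSettings) map[string]any {
	if dcsType != "etcd" {
		// Default path: Patroni keeps its state in Kubernetes Endpoints.
		return map[string]any{
			"kubernetes": map[string]any{"use_endpoints": true},
		}
	}

	etcd3 := map[string]any{"hosts": etcd.Endpoints}
	if etcd.TLSDir != "" {
		etcd3["protocol"] = "https"
		etcd3["cacert"] = etcd.TLSDir + "/ca.crt"
		etcd3["cert"] = etcd.TLSDir + "/tls.crt"
		etcd3["key"] = etcd.TLSDir + "/tls.key"
	}

	// Credentials are not written here; per the PR they arrive through the
	// PATRONI_ETCD3_USERNAME / PATRONI_ETCD3_PASSWORD environment variables.
	return map[string]any{
		"etcd3": etcd3,
		"postgresql": map[string]any{
			"callbacks": map[string]any{
				"on_start":       roleChangeScript,
				"on_role_change": roleChangeScript,
			},
		},
	}
}

func main() {
	cfg := patroniDCSConfig("etcd", EtcdSettings{
		Endpoints: []string{"etcd-0.etcd:2379"},
		TLSDir:    "/etc/patroni/etcd-tls",
	})
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```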

The Kubernetes DCS path is unchanged.
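The Secret validation mentioned in the list above boils down to a required-keys check. A minimal sketch, assuming the conventional `ca.crt`, `tls.crt`, and `tls.key` key names (the PR may require different ones) and a hypothetical `missingKeys` helper:

```go
package main

import (
	"fmt"
	"sort"
)

// missingKeys returns the required keys that are absent or empty in a
// Secret's data map, sorted so that event messages are stable.
func missingKeys(data map[string][]byte, required ...string) []string {
	var missing []string
	for _, key := range required {
		if len(data[key]) == 0 {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Stand-in for the Data field of a referenced TLS Secret.
	secretData := map[string][]byte{
		"ca.crt":  []byte("-----BEGIN CERTIFICATE-----"),
		"tls.crt": []byte("-----BEGIN CERTIFICATE-----"),
	}

	if missing := missingKeys(secretData, "ca.crt", "tls.crt", "tls.key"); len(missing) > 0 {
		// The operator would record a Warning event on the cluster here.
		fmt.Printf("Warning: etcd TLS Secret is missing keys: %v\n", missing)
	}
}
```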

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Helm Chart Merge Request
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

@it-percona-cla

it-percona-cla commented Jun 18, 2026


CLA assistant check
All committers have signed the CLA.

@yoav-katz yoav-katz marked this pull request as draft June 18, 2026 22:26
@yoav-katz

Author

I will wait for the Jira ticket before opening the PR in the Helm charts repo.

@yoav-katz

Author

And another note: this is my first OSS contribution, so be gentle 😄
If there is anything you think should be changed because of style or dependency considerations, I will be happy to fix it!

@egegunes egegunes changed the title from "feat(etcd)" to "K8SPG-1057: Allow using etcd as patroni DCS" Jun 19, 2026

@egegunes egegunes left a comment

Contributor

@yoav-katz the implementation looks good to me in general, but we definitely need an E2E test that deploys etcd and configures PerconaPGCluster to use it.

@DanBrima


would love to see this merged!

@yoav-katz

yoav-katz commented Jun 20, 2026

Author

QUESTIONS FOR REVIEWERS:

  1. e2e coverage approach: The new suite covers the etcd DCS happy path but does not run the existing full test suite (switchover, pgbackrest backup/restore, scale-up, upgrade, etc.) against etcd DCS. Should we: (a) add the most critical existing tests (switchover, backup) to this suite, or (b) parameterize the main suite to run with both DCS backends?
  2. Read-from-replica test: There is no step testing that replica pods are correctly labeled and that reads can be served through the replica service. Should that be added before merge?
  3. DCS immutability UX: The CEL rule prevents changing dcs.type after cluster creation. If a user needs to migrate between DCS backends, the only path is delete and recreate. Is this the right trade-off, or should we document a migration procedure?
  4. Routing: The current implementation routes primary/replica traffic via k8s Services with label selectors (role=primary, role=replica). The longer-term goal is to replace this with HAProxy, which would discover and health-check postgres pods directly via Patroni's REST API, removing the dependency on pod labels for routing entirely. Should HAProxy integration be implemented inside the operator, or should the operator, when etcd is the DCS, simply expose a headless Service covering all postgres pods and leave HAProxy configuration to the user?

@yoav-katz yoav-katz marked this pull request as ready for review June 20, 2026 16:10
@yoav-katz

Author

Operator-managed etcd (future consideration)

The current design requires users to supply an external etcd cluster via spec.patroni.dcs.etcd.endpoints. This is a reasonable first step, but it places a significant operational burden on users who don't already have etcd infrastructure. An alternative would be a managed sub-field on the etcd spec, e.g.:

spec:
  patroni:
    dcs:
      type: etcd
      etcd:
        managed:           # operator deploys etcd itself
          replicas: 3      # 1 for dev, 3 for production HA
          storage: 1Gi
          storageClass: standard
        # endpoints: omitted when managed: is set

The operator would create and reconcile an etcd StatefulSet (with PVCs) co-located with the PostgreSQL cluster. This raises a few design questions:
(a) Should this be scoped to this PR or tracked as a follow-up?
(b) If implemented, should it be a thin wrapper (the operator just creates a StatefulSet from a known etcd image) or should it delegate to an existing etcd operator (e.g., via an EtcdCluster CR)?

@yoav-katz yoav-katz requested a review from egegunes June 21, 2026 22:06
@egegunes

Contributor
  1. e2e coverage approach: The new suite covers the etcd DCS happy path but does not run the existing full test suite (switchover, pgbackrest backup/restore, scale-up, upgrade, etc.) against etcd DCS. Should we: (a) add the most critical existing tests (switchover, backup) to this suite, or (b) parameterize the main suite to run with both DCS backends?

Let's start with (a).

  2. Read-from-replica test: There is no step testing that replica pods are correctly labeled and that reads can be served through the replica service. Should that be added before merge?

I don't think it's crucial, but it would be a good addition.

  3. DCS immutability UX: The CEL rule prevents changing dcs.type after cluster creation. If a user needs to migrate between DCS backends, the only path is delete and recreate. Is this the right trade-off, or should we document a migration procedure?

To start with, I think it's better not to allow live migration. We can revisit this after receiving feedback.

  4. Routing: The current implementation routes primary/replica traffic via k8s Services with label selectors (role=primary, role=replica). The longer-term goal is to replace this with HAProxy, which would discover and health-check postgres pods directly via Patroni's REST API, removing the dependency on pod labels for routing entirely. Should HAProxy integration be implemented inside the operator, or should the operator, when etcd is the DCS, simply expose a headless Service covering all postgres pods and leave HAProxy configuration to the user?

Why is the longer-term goal to replace label-based routing with HAProxy? Also, the operator already creates a headless Service covering all postgres pods.

  5. Operator-managed etcd: The operator would create and reconcile an etcd StatefulSet (with PVCs) co-located with the PostgreSQL cluster. This raises a few design questions:
    (a) Should this be scoped to this PR or tracked as a follow-up?
    (b) If implemented, should it be a thin wrapper (the operator just creates a StatefulSet from a known etcd image) or should it delegate to an existing etcd operator (e.g., via an EtcdCluster CR)?

I don't think we should ever have etcd managed by the operator. It should be the user who configures and manages the etcd infrastructure.

Comment thread e2e-tests/tests/etcd-dcs/05-check-password-leak.yaml Outdated
Comment thread internal/controller/postgrescluster/patroni.go Outdated
@yoav-katz

Author

Why is the longer-term goal to replace label-based routing with HAProxy? Also, the operator already creates a headless Service covering all postgres pods.

The label-based routing still tightly couples failover to the Kubernetes API - Patroni's callback needs to patch pod labels on every role change. That's the same dependency we're trying to reduce by moving to external etcd. HAProxy querying Patroni's REST health endpoints directly removes that dependency entirely - failover is self-contained within Patroni and etcd, no K8s API writes in the hot path.

  ) error {
  	// With etcd DCS, Patroni stores distributed configuration in etcd, not k8s Endpoints.
- 	if dcs := cluster.Spec.Patroni.GetDCS(); dcs != nil && dcs.Type == v1beta1.PatroniDCSTypeEtcd {
+ 	if dcs := cluster.GetDCS(); dcs != nil && dcs.Type == v1beta1.PatroniDCSTypeEtcd {

Contributor

Since we repeat this check in many places, would it make sense to add something like cluster.IsDCSEtcd() and move these conditions into it?

Author

Instead of IsDCSEtcd() I went with a more general DCSType() method that normalizes a nil DCS to the default (kubernetes), so adding a new DCS type in the future doesn't require a function per type.
Do you think we should also add IsDCSEtcd() as a convenience on top of it?
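A helper along these lines might look like the sketch below. The struct layout is a simplified stand-in for the real API types, and `PatroniDCSTypeKubernetes` is an assumed constant name; only `PatroniDCSTypeEtcd` and the `DCSType()` method name appear in this thread.

```go
package main

import "fmt"

// PatroniDCSType names a DCS backend.
type PatroniDCSType string

const (
	PatroniDCSTypeKubernetes PatroniDCSType = "kubernetes" // assumed constant name
	PatroniDCSTypeEtcd       PatroniDCSType = "etcd"
)

// Simplified stand-ins for the cluster API types.
type PatroniDCS struct{ Type PatroniDCSType }
type PatroniSpec struct{ DCS *PatroniDCS }
type ClusterSpec struct{ Patroni *PatroniSpec }
type Cluster struct{ Spec ClusterSpec }

// DCSType returns the configured backend and treats every unset level
// (nil cluster, nil patroni, nil dcs, empty type) as the kubernetes default,
// so callers never need their own nil checks.
func (c *Cluster) DCSType() PatroniDCSType {
	if c == nil || c.Spec.Patroni == nil || c.Spec.Patroni.DCS == nil || c.Spec.Patroni.DCS.Type == "" {
		return PatroniDCSTypeKubernetes
	}
	return c.Spec.Patroni.DCS.Type
}

func main() {
	unset := &Cluster{}
	etcd := &Cluster{Spec: ClusterSpec{Patroni: &PatroniSpec{DCS: &PatroniDCS{Type: PatroniDCSTypeEtcd}}}}

	fmt.Println(unset.DCSType())
	fmt.Println(etcd.DCSType())

	// Call sites reduce to a single comparison.
	if etcd.DCSType() == PatroniDCSTypeEtcd {
		fmt.Println("skip k8s-DCS-specific artifacts")
	}
}
```

Because the method returns a value rather than a boolean, a future backend needs only a new constant, not a new `IsDCS...()` function.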

Contributor

I think the current version is good; I mostly wanted to clean up all those nil checks.

Copilot AI review requested due to automatic review settings June 25, 2026 15:29

Copilot AI left a comment

Contributor

Pull request overview

This PR adds support for using external etcd as Patroni’s DCS backend (in addition to the existing Kubernetes Endpoints DCS), enabling deployments where workloads cannot reach the Kubernetes control plane API. It introduces a new spec.patroni.dcs API, updates Patroni config generation, RBAC/service behavior, and adds validations plus unit/E2E coverage for the etcd path.

Changes:

  • Add spec.patroni.dcs (default kubernetes, optional etcd) with CEL immutability validation and generated deepcopy/CRD updates.
  • When etcd DCS is selected: generate etcd3: Patroni config, add role-change callbacks, adjust primary Service selection logic, and validate referenced TLS/auth Secrets (with Secret watch/indexing).
  • Add unit tests and a new KUTTL E2E scenario (etcd-dcs) to validate config, behavior, and immutability.

Reviewed changes

Copilot reviewed 41 out of 42 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
|---|---|
| pkg/apis/upstream.pgv2.percona.com/v1beta1/zz_generated.deepcopy.go | Generated deepcopy updates for new Patroni DCS types. |
| pkg/apis/upstream.pgv2.percona.com/v1beta1/patroni_types.go | Adds dcs API types, helpers, and CEL validations. |
| pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go | Adds etcd DCS validation and Secret indexer for reconciliation triggers. |
| pkg/apis/pgv2.percona.com/v2/perconapgcluster_types_test.go | Unit tests for Percona CR validation of etcd DCS endpoints. |
| percona/controller/pgcluster/patroni_etcd.go | Reconcile-time Secret presence/key validation + Warning events for etcd DCS Secrets. |
| percona/controller/pgcluster/patroni_etcd_test.go | Unit tests for etcd DCS Secret validation reconciliation behavior. |
| percona/controller/pgcluster/controller.go | Watches Secrets via multiple field indexes (envFrom + patroni etcd secrets) with dedupe. |
| internal/patroni/reconcile.go | Mount etcd TLS Secret and inject Patroni etcd auth env vars into instance Pods. |
| internal/patroni/rbac.go | Conditionalize Endpoints/Service create RBAC rules based on DCS type. |
| internal/patroni/config.go | Emit etcd3: config + callbacks for etcd DCS; keep Kubernetes DCS behavior unchanged. |
| internal/patroni/config_test.go | Adds unit coverage for etcd DCS Patroni YAML generation and env behavior. |
| internal/controller/postgrescluster/patroni.go | Skip k8s-DCS-specific artifacts/status reads when using etcd DCS. |
| internal/controller/postgrescluster/cluster.go | Use selector-based primary Service for etcd DCS; avoid applying Endpoints when not used. |
| e2e-tests/tests/etcd-dcs/00-deploy-operator.yaml | E2E setup step: deploy operator and client. |
| e2e-tests/tests/etcd-dcs/00-assert.yaml | E2E assertions for operator/CRD readiness. |
| e2e-tests/tests/etcd-dcs/01-etcd-setup.yaml | E2E step: deploy a single-node etcd StatefulSet for testing. |
| e2e-tests/tests/etcd-dcs/01-assert.yaml | E2E assertion: etcd is ready. |
| e2e-tests/tests/etcd-dcs/02-create-cluster.yaml | E2E step: create cluster configured for etcd DCS. |
| e2e-tests/tests/etcd-dcs/02-assert.yaml | E2E assertions: cluster reaches ready state with etcd DCS. |
| e2e-tests/tests/etcd-dcs/03-write-data.yaml | E2E: write data to primary via client. |
| e2e-tests/tests/etcd-dcs/04-read-from-primary.yaml | E2E: read data back from primary. |
| e2e-tests/tests/etcd-dcs/04-assert.yaml | E2E assertion: expected read result. |
| e2e-tests/tests/etcd-dcs/05-assert.yaml | E2E assertions for created resources. |
| e2e-tests/tests/etcd-dcs/06-check-patroni-config.yaml | E2E: verify Patroni config contains etcd3 + callbacks and omits kubernetes:. |
| e2e-tests/tests/etcd-dcs/07-check-patronictl.yaml | E2E: verify patronictl list shows running/leader. |
| e2e-tests/tests/etcd-dcs/08-check-etcd-keys.yaml | E2E: verify Patroni keys appear in etcd. |
| e2e-tests/tests/etcd-dcs/09-check-no-warning-events.yaml | E2E: ensure no unexpected Warning events for etcd secrets. |
| e2e-tests/tests/etcd-dcs/10-check-dcs-immutability.yaml | E2E: ensure DCS type immutability is enforced by admission/CEL. |
| e2e-tests/tests/etcd-dcs/11-check-pod-labels.yaml | E2E: verify role labels are present (callback executed). |
| e2e-tests/tests/etcd-dcs/99-remove-cluster-gracefully.yaml | E2E teardown: remove resources and validate operator stability. |
| e2e-tests/run-release.csv | Adds etcd-dcs to release E2E run list. |
| e2e-tests/run-pr.csv | Adds etcd-dcs to PR E2E run list. |
| deploy/cw-bundle.yaml | Bundle CRD updates to include new DCS schema/validations. |
| deploy/crd.yaml | Generated CRD updates for new DCS schema/validations. |
| deploy/bundle.yaml | Bundle CRD updates for new DCS schema/validations. |
| config/crd/bases/upstream.pgv2.percona.com_postgresclusters.yaml | Base CRD schema updates for DCS fields/validations. |
| config/crd/bases/pgv2.percona.com_perconapgclusters.yaml | Base CRD schema updates for DCS fields/validations. |
| cmd/postgres-operator/main.go | Adds field index registration for etcd DCS referenced Secrets. |
| build/postgres-operator/patroni-role-change.sh | New Patroni callback script to patch pod role label + status annotation. |
| build/postgres-operator/init-entrypoint.sh | Installs the new Patroni role-change script into runtime bindir. |
| build/postgres-operator/Dockerfile | Ships the new Patroni role-change script in the image. |
| build/crd/percona/generated/pgv2.percona.com_perconapgclusters.yaml | Generated Percona CRD output updated for new DCS schema/validations. |

Comment thread pkg/apis/upstream.pgv2.percona.com/v1beta1/patroni_types.go
Comment thread percona/controller/pgcluster/patroni_etcd_test.go
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 16:15

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 2 comments.

Comment thread internal/patroni/reconcile.go
Comment thread internal/controller/postgrescluster/patroni.go
@egegunes egegunes added this to the v3.1.0 milestone Jun 26, 2026
@JNKPercona

Collaborator
| Test Name | Result | Time |
|---|---|---|
| backup-enable-disable | passed | 00:00:00 |
| builtin-extensions | passed | 00:00:00 |
| cert-manager-tls | passed | 00:00:00 |
| custom-envs | passed | 00:00:00 |
| custom-tls | passed | 00:00:00 |
| database-init-sql | passed | 00:00:00 |
| demand-backup | passed | 00:35:24 |
| demand-backup-offline-snapshot | passed | 00:16:08 |
| dynamic-configuration | passed | 00:00:00 |
| finalizers | passed | 00:00:00 |
| init-deploy | passed | 00:00:00 |
| huge-pages | passed | 00:00:00 |
| major-upgrade-14-to-15 | passed | 00:00:00 |
| major-upgrade-15-to-16 | passed | 00:00:00 |
| major-upgrade-16-to-17 | passed | 00:00:00 |
| major-upgrade-17-to-18 | passed | 00:00:00 |
| ldap | passed | 00:00:00 |
| ldap-tls | passed | 00:00:00 |
| monitoring | passed | 00:00:00 |
| monitoring-pmm3 | passed | 00:00:00 |
| one-pod | passed | 00:00:00 |
| operator-self-healing | passed | 00:00:00 |
| pitr | passed | 00:00:00 |
| scaling | passed | 00:00:00 |
| scheduled-backup | passed | 00:00:00 |
| self-healing | passed | 00:00:00 |
| sidecars | passed | 00:00:00 |
| standby-pgbackrest | passed | 00:00:00 |
| standby-streaming | passed | 00:13:55 |
| start-from-backup | passed | 00:00:00 |
| tablespaces | passed | 00:00:00 |
| telemetry-transfer | passed | 00:00:00 |
| upgrade-consistency | passed | 00:00:00 |
| upgrade-minor | passed | 00:00:00 |
| users | passed | 00:00:00 |
| etcd-dcs | passed | 00:00:00 |

| Summary | Value |
|---|---|
| Tests Run | 36/36 |
| Job Duration | 00:52:15 |
| Total Test Time | 01:05:28 |

commit: 6cb833e
image: perconalab/percona-postgresql-operator:PR-1647-6cb833e6c


6 participants