Postgres replication patterns in 2026 — Patroni, managed services, and the failover story

There are essentially three ways to run highly available Postgres in 2026: managed Postgres on a hyperscaler, a self-managed Patroni cluster, or a custom replication script someone wrote in 2019. Option three is a trap.

We run options one and two extensively across customer engagements. Which one you should pick depends on how much operational capacity you have and whether you need extensions or RPO/RTO numbers the managed services can't hit.

Patroni — the failover semantics that matter

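bootstrap:
  dcs:
    ttl: 30                           # leader lock lifetime, in seconds
    loop_wait: 10                     # seconds between HA loop runs
    maximum_lag_on_failover: 1048576  # bytes (1 MiB)
    synchronous_mode: true
postgresql:
  use_pg_rewind: true                 # lets a former primary rejoin via pg_rewind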

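synchronous_mode: true targets RPO=0 on automatic failover at the cost of commit latency: Patroni only promotes a standby that it knows holds every acknowledged commit. maximum_lag_on_failover: 1048576 (bytes, so 1 MiB) means a replica that is further behind than that can't be promoted; the alternative is silent data loss in a real outage. These two settings make the failover trade-off explicit and tunable, which is where we find Patroni's failover story stronger than RDS Multi-AZ.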
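
To make the lag guard concrete, here is a minimal Python sketch of the eligibility rule that maximum_lag_on_failover implies. It illustrates the rule only and is not Patroni's actual implementation; the function names lsn_to_bytes and eligible_for_promotion are hypothetical. It assumes LSNs in PostgreSQL's textual X/Y form, where the 64-bit WAL position is the high hex word shifted left by 32 bits plus the low hex word.

```python
MAXIMUM_LAG_ON_FAILOVER = 1048576  # bytes (1 MiB), same value as in the config above


def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def eligible_for_promotion(leader_lsn: str, replica_lsn: str,
                           max_lag: int = MAXIMUM_LAG_ON_FAILOVER) -> bool:
    """Return True if the replica is at most max_lag bytes behind the leader position."""
    lag = lsn_to_bytes(leader_lsn) - lsn_to_bytes(replica_lsn)
    # A replica that is ahead of the recorded leader position counts as zero lag.
    return max(lag, 0) <= max_lag


if __name__ == "__main__":
    print(eligible_for_promotion("0/5000000", "0/4F80000"))  # True  (512 KiB behind)
    print(eligible_for_promotion("0/5000000", "0/4000000"))  # False (16 MiB behind)
```

The threshold is a plain byte count, so it should be sized against your WAL generation rate: on a write-heavy primary, 1 MiB can represent well under a second of changes.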

The full write-up covers:

Managed Postgres trade-offs (no superuser, parameter group limits, vendor-defined RPO)
Patroni cluster topology (3 Postgres + 3 etcd or K8s leases)
The 30-45 second failover sequence and how to tune ttl down
DCS (etcd) partition failures — the silent demote that confuses everyone
Patroni on Kubernetes via Zalando/Crunchy operators vs bare VMs
HAProxy / PgBouncer health checks against Patroni's REST API
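
As a taste of the health-check item, here is a rough Python sketch of the probe a load balancer performs against Patroni's REST API. It relies on two documented endpoints: GET /primary returns HTTP 200 only on the node holding the leader lock, and GET /replica returns 200 only on a running replica; other nodes answer with a non-200 status. Port 8008 is Patroni's default REST API port. The helper names http_status and classify are hypothetical, and the fetch parameter exists only so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request
from typing import Callable

PATRONI_PORT = 8008  # Patroni's default REST API port


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx answers (for example 503 when the role does not match) land here.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return 0


def classify(host: str, fetch: Callable[[str], int] = http_status) -> str:
    """Classify a node as 'primary', 'replica', or 'unavailable'."""
    base = f"http://{host}:{PATRONI_PORT}"
    if fetch(f"{base}/primary") == 200:
        return "primary"
    if fetch(f"{base}/replica") == 200:
        return "replica"
    return "unavailable"
```

Against a live cluster, classify("pg-node-1") (a hypothetical host name) would report which role that node currently holds. A proxy does the same thing on a timer and routes writes only to the node that answers 200 on /primary.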

For most workloads we recommend managed Postgres. Reach out if yours doesn't fit.