CloudNativePG Architecture on AWS EKS — Complete Guide for DBAs
Category: PostgreSQL | Series: PostgreSQL on Kubernetes (CloudNativePG), Post 1 of 12
Introduction
If you’ve spent years managing PostgreSQL on bare metal or VMs, running it on Kubernetes can feel like learning a new language. The concepts are familiar — primary, replica, failover — but the mechanics are completely different. Before you install a single operator or write a single YAML file, you need to understand how CloudNativePG thinks about PostgreSQL.
This is Post 1 of our 12-part series on running PostgreSQL on Kubernetes using CloudNativePG, the operator that EDB also distributes commercially as EDB Postgres AI for CloudNativePG Cluster. We’ll be testing everything on a real AWS EKS cluster and attaching screenshots after each step.
This post covers the architecture and core concepts — no installation yet. Think of it as the blueprint before the build.
What you’ll learn:
- The key Kubernetes and PostgreSQL terms you need to know before you start
- The two main deployment use cases CloudNativePG supports
- How CloudNativePG handles HA, replication, and service routing
- How to design your node topology for production on AWS EKS
- The difference between single-AZ and multi-AZ deployments
Prerequisites
Before you start, make sure you have:
- [ ] Basic familiarity with Kubernetes (Pods, Services, Namespaces, PVCs)
- [ ] Basic PostgreSQL knowledge (primary, replica, WAL)
- [ ] An AWS account with EKS access (we’ll install in Post 2)
- [ ] kubectl installed on your local machine
Lab Environment
We’ll use this environment throughout the entire series. Post 1 is conceptual — no commands to run yet — but here’s what we’re building toward:
| Component | Version / Details |
|---|---|
| Kubernetes | AWS EKS 1.31 |
| CloudNativePG Operator | v1.28.1 |
| PostgreSQL | 16.x |
| Worker Nodes | 3 × m5.xlarge (one per AZ) |
| Storage | AWS EBS gp3 (via EBS CSI Driver) |
| Region | us-east-1 (3 Availability Zones) |
| kubectl | v1.31+ |
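For orientation, the Storage row above usually translates into a StorageClass along these lines. Treat it as a sketch rather than the manifest used in this series: the name `gp3` and the individual settings are assumptions, while `ebs.csi.aws.com` is the provisioner name of the AWS EBS CSI driver.

```yaml
# Hypothetical StorageClass for the lab; name and settings are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3                               # assumed name, referenced by the Cluster later
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # create the volume in the AZ where the pod is scheduled
allowVolumeExpansion: true                # allows growing a PVC later
reclaimPolicy: Delete                     # assumption; Retain keeps the EBS volume after PVC deletion
```

`WaitForFirstConsumer` matters on a multi-AZ cluster because an EBS volume can only be attached to instances in its own availability zone.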
What is CloudNativePG?
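CloudNativePG is an open-source Kubernetes operator, originally created by EDB, that manages the full lifecycle of PostgreSQL clusters. EDB also ships a commercial distribution built on it, “EDB Postgres AI for CloudNativePG Cluster”. The operator is Kubernetes-native: it relies on the Kubernetes API for cluster state and failover decisions, so there is no external failover manager such as Patroni, no separate etcd cluster to operate, and no management sidecar next to PostgreSQL.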
The operator introduces a custom resource called Cluster that defines your entire PostgreSQL setup declaratively. You describe what you want; the operator makes it happen and keeps it that way.
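To make “declarative” concrete, here is a minimal sketch of what such a Cluster resource can look like. The names `my-pg-cluster` and `postgres` match the summary at the end of this post; the volume sizes and the `gp3` StorageClass name are assumptions. The API group shown is the community one; the EDB distribution uses `postgresql.k8s.enterprisedb.io/v1` instead.

```yaml
# Minimal sketch of a Cluster resource; sizes and StorageClass are illustrative.
apiVersion: postgresql.cnpg.io/v1   # community API group (assumption, see above)
kind: Cluster
metadata:
  name: my-pg-cluster
  namespace: postgres
spec:
  instances: 3            # 1 primary + 2 replicas
  storage:                # PGDATA volume, one per instance
    storageClass: gp3     # assumed StorageClass name
    size: 20Gi
  walStorage:             # optional separate volume for WAL files
    storageClass: gp3
    size: 5Gi
```

Everything else (pods, services, secrets, replication) is derived by the operator from this one object.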
Part 1: Terminology You Must Know
Before going further, let’s align on terms. CloudNativePG documentation uses both Kubernetes and PostgreSQL terminology, and they sometimes overlap in confusing ways.
Kubernetes Terms
| Term | What it means in CloudNativePG context |
|---|---|
| Node | A worker machine (EC2 instance on EKS) where PostgreSQL pods run |
| Postgres Node | A worker node dedicated to PostgreSQL workloads, labelled and tainted to prevent other workloads from landing on it |
| Pod | The smallest schedulable unit in Kubernetes; CloudNativePG runs exactly one PostgreSQL instance per pod |
| Service | A stable network endpoint that routes traffic to the correct pod (primary or replica) |
| Secret | Stores passwords, certificates, connection strings — used heavily by CloudNativePG |
| StorageClass | Defines the type of storage (e.g., AWS EBS gp3) including provisioner, reclaim policy, and expansion settings |
| PersistentVolume (PV) | The actual storage resource — on EKS, this is an EBS volume |
| PersistentVolumeClaim (PVC) | The request for storage made by a PostgreSQL pod |
| Namespace | Logical isolation boundary — we’ll deploy CloudNativePG in its own namespace |
| RBAC | Role-Based Access Control — CloudNativePG uses this to control what the operator can and cannot touch |
| CRD | Custom Resource Definition — CloudNativePG extends Kubernetes with a Cluster CRD |
| Operator | The controller that watches Cluster resources and ensures the actual state matches the desired state |
PostgreSQL Terms
| Term | What it means |
|---|---|
| Instance | A single running PostgreSQL server process |
| Primary | The instance accepting both reads and writes |
| Replica / Standby | An instance replicating from the primary via WAL streaming |
| Hot Standby | A replica that can serve read-only queries while staying in sync |
| Cluster | In CloudNativePG terms: one primary + N replicas, all managed as a single Cluster resource |
| Replica Cluster | A separate Cluster resource in another Kubernetes cluster, used for cross-region DR |
| Designated Primary | The instance in a replica cluster that receives WAL from the source cluster and feeds any local replicas; it becomes the primary when the replica cluster is promoted |
| WAL | Write-Ahead Log — the transaction log PostgreSQL uses for replication and crash recovery |
| PVC Group | The set of PVCs that belong to one PostgreSQL instance: one for PGDATA (storage) and, if configured, a separate one for WAL files (walStorage) |
| RTO | Recovery Time Objective — how fast you recover |
| RPO | Recovery Point Objective — how much data you can afford to lose |
Cloud Terms (AWS Context)
| Term | AWS equivalent |
|---|---|
| Region | AWS Region (e.g., us-east-1) |
| Availability Zone | AWS AZ (e.g., us-east-1a, us-east-1b, us-east-1c) |
Part 2: The Two CloudNativePG Use Cases
CloudNativePG supports two primary deployment patterns depending on where your application lives.
Use Case 1: Application Inside Kubernetes (Recommended)
This is the cloud-native pattern. Both your application and PostgreSQL live in the same Kubernetes cluster, typically in the same namespace.
┌─────────────────────────────────────────────────────────┐
│                     AWS EKS Cluster                     │
│                                                         │
│  ┌──────────────┐        ┌──────────────────────────┐   │
│  │ Application  │        │  CloudNativePG Cluster   │   │
│  │ (Deployment) │──rw──▶ │ ┌────────┐  ┌────────┐   │   │
│  │ Stateless    │        │ │Primary │  │Replica │   │   │
│  │ Multi-Pod    │        │ └────────┘  └────────┘   │   │
│  └──────────────┘        └──────────────────────────┘   │
│         │                                               │
│  LoadBalancer / Ingress (HTTPS)                         │
│         │                                               │
│     End Users                                           │
└─────────────────────────────────────────────────────────┘
The application connects to PostgreSQL through the -rw service (explained below). TLS is enabled on the PostgreSQL server by default. The application never needs to know which pod is the primary: the service handles routing automatically, even after a failover.
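As an illustration of this pattern, the sketch below shows an application Deployment that reads its connection string from the Secret the operator generates for the application user. The application name and image are placeholders, and the example assumes the Secret exposes a `uri` key, as recent CloudNativePG versions do.

```yaml
# Sketch: application Deployment consuming the operator-generated Secret.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                   # hypothetical application
  namespace: postgres
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0   # placeholder image
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: my-pg-cluster-app   # <cluster-name>-app
                  key: uri                  # assumed key: connection URI for the -rw service
```

Because the credentials come from the Secret, the application manifest never contains a password or a pod name.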
Use Case 2: Application Outside Kubernetes
Sometimes the database moves to Kubernetes before the application does. In this case, PostgreSQL is exposed via a LoadBalancer service type, and the application connects using a standard hostname and port — just like connecting to any PostgreSQL server. The application outside doesn’t need to know about Kubernetes at all.
┌─────────────────┐          ┌─────────────────────────┐
│   Application   │          │     AWS EKS Cluster     │
│   (VM / EC2)    │──TCP──▶  │  LoadBalancer Service   │
│                 │   5432   │            │            │
│                 │          │      ┌─────▼────┐       │
│                 │          │      │ Primary  │       │
└─────────────────┘          │      └──────────┘       │
                             └─────────────────────────┘
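One way to get such a LoadBalancer is to let the operator create an additional Service next to the default ones. The fragment below is a sketch under two assumptions: the operator version supports `.spec.managed.services.additional` (CloudNativePG 1.24 and later), and the Service name is a made-up example.

```yaml
# Fragment of a Cluster spec: extra LoadBalancer Service in front of the primary.
spec:
  managed:
    services:
      additional:
        - selectorType: rw                # route to the current primary
          serviceTemplate:
            metadata:
              name: my-pg-cluster-rw-lb   # hypothetical Service name
            spec:
              type: LoadBalancer
```

On EKS, the load balancer type and whether it is internal or internet-facing are controlled by Service annotations, which are deliberately left out here. For a database endpoint you will almost always want an internal load balancer.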
Part 3: CloudNativePG Architecture Deep Dive
How State is Managed
Unlike stateless apps where you just add more pods, PostgreSQL state must be carefully replicated. CloudNativePG uses application-level replication — specifically PostgreSQL’s own built-in WAL streaming replication — rather than storage-level replication (like DRBD or replicated EBS).
Why not storage-level replication? Because PostgreSQL already handles this better than any storage layer can. Storage replication adds latency, doesn’t understand the database, and complicates crash recovery. PostgreSQL WAL streaming is battle-tested, fast, and works across availability zones.
CloudNativePG supports:
- Synchronous streaming replication — zero data loss, small write latency impact
- Asynchronous streaming replication — no write latency impact, small chance of data loss on failure
- File-based WAL shipping (to S3/object store) — used as backup and fallback for replica clusters
The Three Service Types
This is one of the most important concepts to understand before you write any YAML. CloudNativePG automatically creates three Kubernetes services for every cluster:
| Service | Suffix | What it connects to | When to use |
|---|---|---|---|
| Read-Write | -rw | Always the current primary | Your application’s main connection |
| Read-Only | -ro | Any hot standby replica | Offload read queries (analytics, reports) |
| Read (any) | -r | Any instance including primary | When you just need any readable PostgreSQL |
Key behaviour: If a failover happens and a replica is promoted to primary, CloudNativePG automatically updates the -rw service to point to the new primary. Your application keeps using the same hostname. No manual DNS changes, no connection string updates.
cluster-name-rw ──▶ Primary Pod
cluster-name-ro ──▶ Replica Pod(s) only
cluster-name-r ──▶ Any Pod (primary or replica)
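The routing itself is plain Kubernetes: each Service selects pods by label, and the operator moves the role label when a new primary is promoted. The manifest below approximates what the generated read-write Service looks like. You never write it by hand, and the label keys are assumptions that can differ between operator versions and distributions, so check the real object with `kubectl get svc my-pg-cluster-rw -o yaml`.

```yaml
# Approximate shape of the generated -rw Service (created by the operator).
apiVersion: v1
kind: Service
metadata:
  name: my-pg-cluster-rw
  namespace: postgres
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
  selector:
    cnpg.io/cluster: my-pg-cluster    # assumed label key
    cnpg.io/instanceRole: primary     # assumed label key; follows the primary on failover
```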
Part 4: Node Topology for Production on AWS EKS
This is where many teams get it wrong. Running all PostgreSQL pods on the same node (or same AZ) defeats the purpose of HA.
Multi-AZ: The Recommended Setup
AWS EKS supports multi-AZ node groups. For CloudNativePG in production, you want one worker node per availability zone, with three AZs minimum:
  us-east-1a       us-east-1b       us-east-1c
┌────────────┐   ┌────────────┐   ┌────────────┐
│ pg-node-1  │   │ pg-node-2  │   │ pg-node-3  │
│            │   │            │   │            │
│ ┌────────┐ │   │ ┌────────┐ │   │ ┌────────┐ │
│ │Primary │ │   │ │Replica1│ │   │ │Replica2│ │
│ └────────┘ │   │ └────────┘ │   │ └────────┘ │
│  EBS gp3   │   │  EBS gp3   │   │  EBS gp3   │
└────────────┘   └────────────┘   └────────────┘
If us-east-1a goes down entirely, one of the replicas is automatically promoted to primary. No data loss (with synchronous replication). No manual intervention.
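Having nodes in three AZs is not enough on its own; the scheduler also has to be told to spread the instances. A sketch of the relevant Cluster fields follows. The values are illustrative choices, not the only valid ones.

```yaml
# Fragment of a Cluster spec: spread instances across availability zones.
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    topologyKey: topology.kubernetes.io/zone   # at most one instance per AZ
    podAntiAffinityType: required              # "preferred" allows co-location when an AZ has no capacity
```

With `required`, a pod stays Pending rather than sharing an AZ with another instance, which is stricter but makes the HA guarantee explicit.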
Reserving Nodes for PostgreSQL
In a shared EKS cluster, you don’t want application pods competing with PostgreSQL for CPU and memory. CloudNativePG recommends using node labels and taints to dedicate nodes to PostgreSQL:
Step 1: Label the node — tells the scheduler this node can run PostgreSQL
kubectl label node <NODE-NAME> node-role.kubernetes.io/postgres=
Step 2: Taint the node — prevents non-PostgreSQL pods from landing here
kubectl taint node <NODE-NAME> node-role.kubernetes.io/postgres=:NoSchedule
Step 3: Configure your Cluster resource to use these nodes:
spec:
  affinity:
    nodeSelector:
      node-role.kubernetes.io/postgres: ""
    tolerations:
      - key: node-role.kubernetes.io/postgres
        operator: Exists
        effect: NoSchedule
Rule of thumb: Deploy Postgres nodes in multiples of three — one per AZ. A 3-node setup gives you a 3-instance cluster (1 primary + 2 replicas) spread across all three AZs. This is the production-ready baseline.
Single-AZ: What It Means for HA
If your EKS cluster only has one AZ — common in dev/test or early migration projects — CloudNativePG still works, but HA is limited to node-level failure only. If the entire AZ goes down, your cluster is unavailable until you fail over to a separate Kubernetes cluster (the replica cluster feature). This is still much better than a single VM, but it’s important to understand the boundary.
Part 5: Cross-Cluster Disaster Recovery
For enterprise deployments, CloudNativePG supports a distributed PostgreSQL topology spanning multiple Kubernetes clusters. Think of this as your multi-region DR strategy:
AWS Region us-east-1 (Primary)         AWS Region us-west-2 (DR)
┌─────────────────────────┐            ┌─────────────────────────┐
│ EKS Cluster A           │            │ EKS Cluster B           │
│                         │            │                         │
│ Primary Cluster         │──WAL──▶    │ Replica Cluster         │
│ (read + write)          │  stream    │ (read only, standby)    │
│                         │            │                         │
│ primary + 2 replicas    │            │ designated primary      │
│                         │            │ + optional replicas     │
└─────────────────────────┘            └─────────────────────────┘
The replica cluster stays in continuous recovery mode, consuming WAL from the primary cluster via streaming replication or S3-based WAL shipping. When you need to fail over:
- Planned switchover: Demote the primary cluster first, then promote the replica cluster. The former primary rejoins as a replica — no re-cloning needed.
- Unplanned failover: Promote the replica cluster manually. Some data loss is possible (determined by your RPO).
Important: CloudNativePG cannot perform automated cross-cluster failover. The operator’s scope is a single Kubernetes cluster. Cross-cluster promotion must be done manually or via a higher-level GitOps/orchestration tool.
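For a feel of what the DR side looks like, here is a sketch of a replica cluster that streams from the primary cluster. It is modelled on the streaming example in the CloudNativePG documentation, with assumptions called out: the cluster name `my-pg-cluster-dr` is made up, the host is a placeholder you must fill in, and the TLS Secrets have to be copied from the primary Kubernetes cluster because Secrets do not cross cluster boundaries.

```yaml
# Sketch of a replica cluster in EKS Cluster B (names are illustrative).
apiVersion: postgresql.cnpg.io/v1   # community API group (assumption)
kind: Cluster
metadata:
  name: my-pg-cluster-dr            # hypothetical name
  namespace: postgres
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    pg_basebackup:
      source: my-pg-cluster         # initial copy taken from the primary cluster
  replica:
    enabled: true                   # stay in continuous recovery
    source: my-pg-cluster
  externalClusters:
    - name: my-pg-cluster
      connectionParameters:
        host: <PRIMARY-CLUSTER-ENDPOINT>   # placeholder: reachable address of the primary's -rw service
        user: streaming_replica
        sslmode: verify-full
        dbname: postgres
      sslKey:
        name: my-pg-cluster-replication    # Secret copied from the primary cluster
        key: tls.key
      sslCert:
        name: my-pg-cluster-replication
        key: tls.crt
      sslRootCert:
        name: my-pg-cluster-ca             # Secret copied from the primary cluster
        key: ca.crt
```

Promotion is then a change to the `replica` stanza of this manifest, applied by you or by your GitOps tooling; the exact fields involved depend on the operator version.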
Part 6: How Replication Works Inside a Cluster
Within a single Cluster resource, replication is fully managed:
Primary Pod ──WAL stream──▶ Replica Pod 1
            └──WAL stream──▶ Replica Pod 2
            └──WAL archive──▶ S3 (barman-cloud plugin)
- Default: asynchronous streaming replication (no write penalty, small RPO)
- Option: synchronous replication for zero RPO (at cost of write latency)
- WAL archive to S3: always recommended for point-in-time recovery (PITR)
In synchronous mode, the primary waits for at least one replica to confirm that it has received and flushed the WAL before acknowledging a commit. This gives you zero data loss when failing over to that replica, at the cost of extra latency on every commit: roughly 1-2 ms, depending on network latency between AZs.
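Switching a cluster to synchronous replication is a small change in the Cluster spec. The fragment below is a sketch that assumes CloudNativePG 1.24 or later, where the `.spec.postgresql.synchronous` stanza is available; older releases used `minSyncReplicas` and `maxSyncReplicas` instead.

```yaml
# Fragment of a Cluster spec: quorum-based synchronous replication.
spec:
  instances: 3
  postgresql:
    synchronous:
      method: any   # quorum: any of the standbys may confirm
      number: 1     # wait for at least one synchronous standby per commit
```

With three instances and `number: 1`, one replica can be down without blocking writes, because the remaining replica can still confirm commits.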
Architecture Summary
Here’s the complete picture of what we’ll build over this series:
AWS EKS us-east-1
│
├── Namespace: postgresql-operator-system
│   └── CloudNativePG Operator Pod
│
└── Namespace: postgres
    ├── Cluster: my-pg-cluster
    │   ├── Pod: my-pg-cluster-1 (Primary) → us-east-1a
    │   ├── Pod: my-pg-cluster-2 (Replica) → us-east-1b
    │   └── Pod: my-pg-cluster-3 (Replica) → us-east-1c
    │
    ├── Services
    │   ├── my-pg-cluster-rw → Primary
    │   ├── my-pg-cluster-ro → Replicas
    │   └── my-pg-cluster-r  → Any instance
    │
    ├── Secrets
    │   ├── my-pg-cluster-superuser
    │   └── my-pg-cluster-app
    │
    └── PVCs (EBS gp3)
        ├── my-pg-cluster-1 (PGDATA)
        ├── my-pg-cluster-2 (PGDATA)
        └── my-pg-cluster-3 (PGDATA)
Key Takeaways
✅ CloudNativePG uses PostgreSQL’s own WAL streaming replication — not storage-level replication. This is intentional and better for production.
✅ Three services are automatically created per cluster: -rw (primary), -ro (replicas only), -r (any). The -rw service automatically updates on failover — your app doesn’t need to change connection strings.
✅ For production on AWS EKS, use 3 worker nodes across 3 AZs. Label and taint them as “postgres nodes” to prevent workload interference.
✅ CloudNativePG manages PostgreSQL within a single Kubernetes cluster only. Cross-cluster failover (multi-region DR) requires manual promotion or a higher-level tool.
✅ The Cluster custom resource is the single source of truth for your entire PostgreSQL HA setup — instances, replication, services, storage, and failover are all declared in one place.
Test Your Knowledge
Ready to test what you’ve learned? Take the free quiz:
👉 PostgreSQL Replication Quiz → gradeupnow.in/postgres-replication-quiz/
20 questions · Instant feedback · Detailed explanations · Free
What’s Next
This post is part of the PostgreSQL on Kubernetes (CloudNativePG) series:
| # | Post | Status |
|---|---|---|
| 1 | CloudNativePG Architecture on AWS EKS | 📍 You are here |
| 2 | Installing CloudNativePG on AWS EKS — Step by Step | ⬜ Coming next week |
| 3 | PostgreSQL Configuration & Pod Tuning on Kubernetes | ⬜ Coming soon |
| 4 | Bootstrap Methods — initdb, Recovery, pg_basebackup | ⬜ Coming soon |
In Post 2, we move from concepts to commands. We’ll install the CloudNativePG operator on a real AWS EKS cluster using Helm, deploy a 3-instance PostgreSQL cluster, and verify all three services are created and routing correctly.
👉 [Next Post: Installing CloudNativePG on AWS EKS → coming next week]