CloudNativePG Architecture on AWS EKS — Complete Guide for DBAs


Introduction

If you’ve spent years managing PostgreSQL on bare metal or VMs, running it on Kubernetes can feel like learning a new language. The concepts are familiar — primary, replica, failover — but the mechanics are completely different. Before you install a single operator or write a single YAML file, you need to understand how CloudNativePG thinks about PostgreSQL.

This is Post 1 of our 12-part series on running PostgreSQL on Kubernetes using CloudNativePG, the open-source operator originally created by EDB and the basis of EDB’s commercial “EDB Postgres AI for CloudNativePG Cluster” product. We’ll test everything on a real AWS EKS cluster and attach screenshots after each step.

This post covers the architecture and core concepts — no installation yet. Think of it as the blueprint before the build.

What you’ll learn:

The key Kubernetes and PostgreSQL terms you need to know before you start
The two main deployment use cases CloudNativePG supports
How CloudNativePG handles HA, replication, and service routing
How to design your node topology for production on AWS EKS
The difference between single-AZ and multi-AZ deployments

Prerequisites

Before you start, make sure you have:

[ ] Basic familiarity with Kubernetes (Pods, Services, Namespaces, PVCs)
[ ] Basic PostgreSQL knowledge (primary, replica, WAL)
[ ] An AWS account with EKS access (we’ll install in Post 2)
[ ] kubectl installed on your local machine

Lab Environment

We’ll use this environment throughout the entire series. Post 1 is conceptual — no commands to run yet — but here’s what we’re building toward:

Component	Version / Details
Kubernetes	AWS EKS 1.31
CloudNativePG Operator	v1.28.1
PostgreSQL	16.x
Worker Nodes	3 × m5.xlarge (one per AZ)
Storage	AWS EBS gp3 (via EBS CSI Driver)
Region	us-east-1 (3 Availability Zones)
kubectl	v1.31+

What is CloudNativePG?

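CloudNativePG is an open-source Kubernetes operator, originally created by EDB, that manages the full lifecycle of PostgreSQL clusters; EDB’s commercial distribution of it is sold as “EDB Postgres AI for CloudNativePG Cluster”. It is Kubernetes-native: it relies on the Kubernetes API server for cluster state and failover decisions, so there are no management sidecars, no external HA manager such as Patroni, and no separate coordination store such as etcd to run alongside the database.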

The operator introduces a custom resource called Cluster that defines your entire PostgreSQL setup declaratively. You describe what you want; the operator makes it happen and keeps it that way.
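
To make “declarative” concrete, here is a minimal sketch of a Cluster manifest. The API group `postgresql.cnpg.io/v1` and the `instances` and `storage` fields are CloudNativePG’s own; the name `my-pg-cluster`, the `postgres` namespace, the `gp3` StorageClass name, and the 20Gi size are assumptions borrowed from this series’ lab, not required values.

```yaml
# Minimal Cluster sketch (illustrative values, not a production config).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster        # assumed name, reused throughout this post
  namespace: postgres        # assumed namespace
spec:
  instances: 3               # 1 primary + 2 replicas
  # imageName is omitted here, so the operator uses its default PostgreSQL
  # image; pin an image explicitly if you need a specific major version.
  storage:
    storageClass: gp3        # assumed StorageClass name
    size: 20Gi               # assumed size
```

Applying a manifest like this asks for three instances; the operator, not you, picks the primary, builds the replicas, and creates the services and secrets that go with them.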

Part 1: Terminology You Must Know

Before going further, let’s align on terms. CloudNativePG documentation uses both Kubernetes and PostgreSQL terminology, and they sometimes overlap in confusing ways.

Kubernetes Terms

Term	What it means in CloudNativePG context
Node	A worker machine (EC2 instance on EKS) where PostgreSQL pods run
Postgres Node	A worker node dedicated to PostgreSQL workloads, labelled and tainted to prevent other workloads from landing on it
Pod	The smallest deployable unit in Kubernetes; each CloudNativePG pod runs exactly one PostgreSQL instance
Service	A stable network endpoint that routes traffic to the correct pod (primary or replica)
Secret	Stores passwords, certificates, connection strings — used heavily by CloudNativePG
StorageClass	Defines the type of storage (e.g., AWS EBS gp3) including provisioner, reclaim policy, and expansion settings
PersistentVolume (PV)	The actual storage resource — on EKS, this is an EBS volume
PersistentVolumeClaim (PVC)	The request for storage made by a PostgreSQL pod
Namespace	Logical isolation boundary — we’ll deploy CloudNativePG in its own namespace
RBAC	Role-Based Access Control — CloudNativePG uses this to control what the operator can and cannot touch
CRD	Custom Resource Definition — CloudNativePG extends Kubernetes with a `Cluster` CRD
Operator	The controller that watches `Cluster` resources and ensures the actual state matches the desired state
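
As a concrete example of the StorageClass row above, this is roughly what a gp3 class backed by the EBS CSI driver looks like. The provisioner name `ebs.csi.aws.com` and the `type: gp3` parameter come from the driver itself; the class name `gp3` and the `Delete` reclaim policy are assumptions you may want to change.

```yaml
# Sketch of an EBS gp3 StorageClass for PostgreSQL volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3                              # assumed name
provisioner: ebs.csi.aws.com             # AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Delete                    # assumption; Retain keeps the volume after the PVC is deleted
allowVolumeExpansion: true               # lets you grow PVCs later
volumeBindingMode: WaitForFirstConsumer  # create the volume in the AZ where the pod is scheduled
```

`WaitForFirstConsumer` matters on EKS because an EBS volume lives in one AZ: delaying provisioning until the pod is scheduled avoids a volume being created in a zone the pod cannot run in.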

PostgreSQL Terms

Term	What it means
Instance	A single running PostgreSQL server process
Primary	The instance accepting both reads and writes
Replica / Standby	An instance replicating from the primary via WAL streaming
Hot Standby	A replica that can serve read-only queries while it continues to replay WAL from the primary
Cluster	In CloudNativePG terms: one primary + N replicas, all managed as a single `Cluster` resource
Replica Cluster	A separate `Cluster` resource in another Kubernetes cluster, used for cross-region DR
Designated Primary	The instance in a replica cluster that leads replication there and becomes the primary when the replica cluster is promoted
WAL	Write-Ahead Log — the transaction log PostgreSQL uses for replication and crash recovery
PVC Group	The set of PVCs belonging to one PostgreSQL instance: one for PGDATA (`storage`) and, if configured, a separate one for WAL files (`walStorage`)
RTO	Recovery Time Objective — how fast you recover
RPO	Recovery Point Objective — how much data you can afford to lose
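
The PVC Group row maps directly to two stanzas in the Cluster spec. The fragment below is a sketch: `storage` and `walStorage` are the field names from the table, while the `gp3` class name and both sizes are assumed values.

```yaml
# Fragment of a Cluster spec: one PVC group per instance.
spec:
  storage:               # PGDATA volume
    storageClass: gp3    # assumed StorageClass name
    size: 50Gi           # assumed size
  walStorage:            # optional, separate volume for WAL files
    storageClass: gp3
    size: 10Gi           # assumed size
```

With three instances, this yields three PVC groups, each made of one PGDATA volume and one WAL volume.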

Cloud Terms (AWS Context)

Term	AWS equivalent
Region	AWS Region (e.g., us-east-1)
Availability Zone	AWS AZ (e.g., us-east-1a, us-east-1b, us-east-1c)

Part 2: The Two CloudNativePG Use Cases

CloudNativePG supports two primary deployment patterns depending on where your application lives.

Use Case 1: Application Inside Kubernetes (Recommended)

This is the cloud-native pattern. Both your application and PostgreSQL live in the same Kubernetes cluster, typically in the same namespace.

┌─────────────────────────────────────────────────────────┐
│                   AWS EKS Cluster                       │
│                                                         │
│  ┌──────────────┐        ┌──────────────────────────┐  │
│  │  Application │        │   CloudNativePG Cluster  │  │
│  │  (Deployment)│──rw──▶ │  ┌────────┐ ┌────────┐  │  │
│  │  Stateless   │        │  │Primary │ │Replica │  │  │
│  │  Multi-Pod   │        │  └────────┘ └────────┘  │  │
│  └──────────────┘        └──────────────────────────┘  │
│         │                                               │
│    LoadBalancer / Ingress (HTTPS)                       │
│         │                                               │
│    End Users                                            │
└─────────────────────────────────────────────────────────┘

The application connects to PostgreSQL through the -rw service (explained below). TLS is enabled by default, so connections can be encrypted without extra setup. The application never needs to know which pod is the primary: the service routes to it automatically, even after a failover.
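
A minimal sketch of this pattern: an application Deployment that reads its connection string from the application secret the operator generates. It assumes the cluster is named `my-pg-cluster`, that the app runs in the same `postgres` namespace, and that the generated `my-pg-cluster-app` secret exposes a `uri` key; the application name and image are hypothetical.

```yaml
# Sketch: stateless app consuming the operator-generated app secret.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical application
  namespace: postgres          # same namespace as the Cluster (assumption)
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0   # hypothetical image
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: my-pg-cluster-app   # secret created by the operator
              key: uri                  # connection URI pointing at the -rw service
```

Because the URI targets the -rw service rather than a pod, the Deployment does not change when the primary moves.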

Use Case 2: Application Outside Kubernetes

Sometimes the database moves to Kubernetes before the application does. In this case, PostgreSQL is exposed via a LoadBalancer service type, and the application connects using a standard hostname and port — just like connecting to any PostgreSQL server. The application outside doesn’t need to know about Kubernetes at all.

┌─────────────────┐          ┌─────────────────────────┐
│  Application    │          │    AWS EKS Cluster      │
│  (VM / EC2)     │──TCP──▶  │  LoadBalancer Service   │
│                 │  5432    │         │               │
│                 │          │   ┌─────▼────┐          │
│                 │          │   │ Primary  │          │
└─────────────────┘          │   └──────────┘          │
                             └─────────────────────────┘
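
One way to get that LoadBalancer is to let the operator manage an extra service through the Cluster’s `managed.services` stanza. This is a sketch under assumptions: the service name is made up, and the annotations assume the AWS Load Balancer Controller is installed and that you want an internal NLB rather than an internet-facing one.

```yaml
# Fragment of a Cluster spec: extra LoadBalancer service for the primary.
spec:
  managed:
    services:
      additional:
      - selectorType: rw                 # route to the current primary
        serviceTemplate:
          metadata:
            name: my-pg-cluster-rw-lb    # assumed service name
            annotations:
              # Assumes the AWS Load Balancer Controller is installed.
              service.beta.kubernetes.io/aws-load-balancer-type: "external"
              service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
              service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
          spec:
            type: LoadBalancer
```

Keeping the scheme internal is a deliberate choice here: a PostgreSQL port reachable from the public internet is rarely what you want, so external clients would normally reach it over VPC peering, a transit gateway, or a VPN.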

Part 3: CloudNativePG Architecture Deep Dive

How State is Managed

Unlike stateless apps where you just add more pods, PostgreSQL state must be carefully replicated. CloudNativePG uses application-level replication — specifically PostgreSQL’s own built-in WAL streaming replication — rather than storage-level replication (like DRBD or replicated EBS).

Why not storage-level replication? Because PostgreSQL already handles this better than any storage layer can. Storage replication adds latency, doesn’t understand the database, and complicates crash recovery. PostgreSQL WAL streaming is battle-tested, fast, and works across availability zones.

CloudNativePG supports:

Synchronous streaming replication — zero data loss, small write latency impact
Asynchronous streaming replication — no write latency impact, small chance of data loss on failure
File-based WAL shipping (to S3/object store) — used as backup and fallback for replica clusters

The Three Service Types

This is one of the most important concepts to understand before you write any YAML. CloudNativePG automatically creates three Kubernetes services for every cluster:

Service	Suffix	What it connects to	When to use
Read-Write	`-rw`	Always the current primary	Your application’s main connection
Read-Only	`-ro`	Any hot standby replica	Offload read queries (analytics, reports)
Read (any)	`-r`	Any instance including primary	When you just need any readable PostgreSQL

Key behaviour: If a failover happens and a replica is promoted to primary, CloudNativePG automatically updates the -rw service to point to the new primary. Your application keeps using the same hostname. No manual DNS changes, no connection string updates.

cluster-name-rw  ──▶  Primary Pod
cluster-name-ro  ──▶  Replica Pod(s) only
cluster-name-r   ──▶  Any Pod (primary or replica)
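
In practice, the split between the services shows up as two hostnames in your application configuration. The ConfigMap below is a sketch, assuming a cluster named `my-pg-cluster` in the `postgres` namespace; the ConfigMap name and keys are hypothetical.

```yaml
# Sketch: separate write and read endpoints for an application.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-db-endpoints      # hypothetical name
  namespace: postgres
data:
  DB_WRITE_HOST: my-pg-cluster-rw.postgres.svc   # always the current primary
  DB_READ_HOST: my-pg-cluster-ro.postgres.svc    # hot standby replicas only
  DB_PORT: "5432"
```

Reads sent to the -ro host may lag slightly behind the primary under asynchronous replication, so keep read-your-own-writes queries on the -rw host.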

Part 4: Node Topology for Production on AWS EKS

This is where many teams get it wrong. Running all PostgreSQL pods on the same node (or same AZ) defeats the purpose of HA.

Multi-AZ: The Recommended Setup

AWS EKS supports multi-AZ node groups. For CloudNativePG in production, you want at least three availability zones, with at least one PostgreSQL worker node in each:

us-east-1a          us-east-1b          us-east-1c
┌────────────┐      ┌────────────┐      ┌────────────┐
│ pg-node-1  │      │ pg-node-2  │      │ pg-node-3  │
│            │      │            │      │            │
│ ┌────────┐ │      │ ┌────────┐ │      │ ┌────────┐ │
│ │Primary │ │      │ │Replica1│ │      │ │Replica2│ │
│ └────────┘ │      │ └────────┘ │      │ └────────┘ │
│  EBS gp3   │      │  EBS gp3   │      │  EBS gp3   │
└────────────┘      └────────────┘      └────────────┘

If us-east-1a goes down entirely, one of the replicas is automatically promoted to primary. No data loss (with synchronous replication). No manual intervention.
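
Having nodes in three AZs is not enough on its own; the instances also have to be told to spread across them. The fragment below is a sketch of how that can be expressed in the Cluster’s `affinity` stanza, using the standard Kubernetes zone label as the topology key. Choosing `required` over the softer `preferred` is an assumption about how strict you want to be.

```yaml
# Fragment of a Cluster spec: one instance per availability zone.
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    topologyKey: topology.kubernetes.io/zone   # spread by AZ instead of by node
    podAntiAffinityType: required              # assumption: refuse to co-locate two instances in one AZ
```

The trade-off with `required` is that an instance stays Pending if no node in a free zone is available; `preferred` would schedule it anyway, at the cost of two instances possibly sharing an AZ.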

Reserving Nodes for PostgreSQL

In a shared EKS cluster, you don’t want application pods competing with PostgreSQL for CPU and memory. CloudNativePG recommends using node labels and taints to dedicate nodes to PostgreSQL:

Step 1: Label the node — tells the scheduler this node can run PostgreSQL

kubectl label node <NODE-NAME> node-role.kubernetes.io/postgres=

Step 2: Taint the node — prevents non-PostgreSQL pods from landing here

kubectl taint node <NODE-NAME> node-role.kubernetes.io/postgres=:NoSchedule

Step 3: Configure your Cluster resource to use these nodes:

spec:
  affinity:
    nodeSelector:
      node-role.kubernetes.io/postgres: ""
    tolerations:
    - key: node-role.kubernetes.io/postgres
      operator: Exists
      effect: NoSchedule

Rule of thumb: Deploy Postgres nodes in multiples of three — one per AZ. A 3-node setup gives you a 3-instance cluster (1 primary + 2 replicas) spread across all three AZs. This is the production-ready baseline.

Single-AZ: What It Means for HA

If your EKS cluster only has one AZ (common in dev/test or early migration projects), CloudNativePG still works, but HA only protects you against node-level failures. If the entire AZ goes down, the database is unavailable until the AZ recovers or you promote a replica cluster running in a separate Kubernetes cluster. This is still much better than a single VM, but it’s important to understand the boundary.

Part 5: Cross-Cluster Disaster Recovery

For enterprise deployments, CloudNativePG supports a distributed PostgreSQL topology spanning multiple Kubernetes clusters. Think of this as your multi-region DR strategy:

AWS Region us-east-1 (Primary)        AWS Region us-west-2 (DR)
┌─────────────────────────┐           ┌─────────────────────────┐
│   EKS Cluster A         │           │   EKS Cluster B         │
│                         │           │                         │
│  Primary Cluster        │──WAL──▶   │  Replica Cluster        │
│  (read + write)         │  stream   │  (read only, standby)   │
│                         │           │                         │
│  primary + 2 replicas   │           │  designated primary     │
│                         │           │  + optional replicas    │
└─────────────────────────┘           └─────────────────────────┘

The replica cluster stays in continuous recovery mode, consuming WAL from the primary cluster via streaming replication or S3-based WAL shipping. When you need to fail over:

Planned switchover: Demote the primary cluster first, then promote the replica cluster. The former primary rejoins as a replica — no re-cloning needed.
Unplanned failover: Promote the replica cluster manually. Some data loss is possible (determined by your RPO).

Important: CloudNativePG cannot perform automated cross-cluster failover. The operator’s scope is a single Kubernetes cluster. Cross-cluster promotion must be done manually or via a higher-level GitOps/orchestration tool.
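
For orientation, a replica cluster in the DR region might be declared roughly as follows. Treat it as a sketch: the cluster names, the hostname, and the sizes are assumptions, and it presumes the primary cluster’s `-replication` and `-ca` secrets have already been copied into the DR cluster’s namespace.

```yaml
# Sketch: replica cluster in EKS Cluster B, streaming from the primary cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster-dr           # assumed name
  namespace: postgres
spec:
  instances: 3
  storage:
    size: 20Gi                     # assumed size
  bootstrap:
    pg_basebackup:
      source: my-pg-cluster        # initial copy taken from the primary cluster
  replica:
    enabled: true                  # stay in continuous recovery
    source: my-pg-cluster
  externalClusters:
  - name: my-pg-cluster
    connectionParameters:
      host: pg-primary.example.internal   # hypothetical address of the primary's exposed -rw service
      user: streaming_replica
      sslmode: verify-full                # hostname must be covered by the server certificate
    sslKey:
      name: my-pg-cluster-replication     # secret copied from the primary cluster
      key: tls.key
    sslCert:
      name: my-pg-cluster-replication
      key: tls.crt
    sslRootCert:
      name: my-pg-cluster-ca              # secret copied from the primary cluster
      key: ca.crt
```

Promotion is then a deliberate edit to the `replica` stanza applied by a human or a GitOps pipeline, which is exactly why cross-cluster failover is not automatic.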

Part 6: How Replication Works Inside a Cluster

Within a single Cluster resource, replication is fully managed:

Primary Pod ──WAL stream──▶ Replica Pod 1
           ├──WAL stream──▶ Replica Pod 2
           └──WAL archive──▶ S3 (barman-cloud plugin)

Default: asynchronous streaming replication (no write penalty, small RPO)
Option: synchronous replication for zero RPO (at cost of write latency)
WAL archive to S3: always recommended for point-in-time recovery (PITR)
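
WAL archiving through the barman-cloud plugin involves two objects: an ObjectStore that describes the bucket, and a `plugins` entry on the Cluster that points at it. This is a sketch under assumptions: the plugin must already be installed, the bucket name is hypothetical, and credentials are assumed to come from an IAM role attached to the pods rather than from static keys.

```yaml
# Sketch: S3 WAL archive via the barman-cloud plugin (assumes the plugin is installed).
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: my-pg-backups                               # assumed name
  namespace: postgres
spec:
  configuration:
    destinationPath: s3://my-pg-backups-bucket/     # hypothetical bucket
    s3Credentials:
      inheritFromIAMRole: true                      # assumption: IAM role instead of access keys
    wal:
      compression: gzip
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster
  namespace: postgres
spec:
  instances: 3
  storage:
    size: 20Gi                                      # assumed size
  plugins:
  - name: barman-cloud.cloudnative-pg.io
    isWALArchiver: true                             # use this plugin for WAL archiving
    parameters:
      barmanObjectName: my-pg-backups               # must match the ObjectStore name above
```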

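In synchronous mode, the primary waits for at least one replica to confirm that it has received and flushed the commit’s WAL before acknowledging the commit to the client. This gives you zero data loss on failover to that replica, at the cost of roughly one cross-AZ network round trip per commit (typically on the order of 1-2 ms within a region).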
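
Switching from the asynchronous default to synchronous replication is a small change to the Cluster spec. The fragment below is a sketch using the `postgresql.synchronous` stanza available in recent CloudNativePG releases; requiring exactly one synchronous replica is an assumption that suits a 3-instance cluster.

```yaml
# Fragment of a Cluster spec: quorum-based synchronous replication.
spec:
  instances: 3
  postgresql:
    synchronous:
      method: any      # quorum: any replica may acknowledge
      number: 1        # assumption: wait for one replica per commit
```

With `any` and `number: 1`, a commit completes as soon as either replica confirms, so one slow or unavailable replica does not stall writes as long as the other is healthy.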

Architecture Summary

Here’s the complete picture of what we’ll build over this series:

AWS EKS us-east-1
│
├── Namespace: postgresql-operator-system
│   └── CloudNativePG Operator Pod
│
└── Namespace: postgres
    ├── Cluster: my-pg-cluster
    │   ├── Pod: my-pg-cluster-1 (Primary)    → us-east-1a
    │   ├── Pod: my-pg-cluster-2 (Replica)    → us-east-1b
    │   └── Pod: my-pg-cluster-3 (Replica)    → us-east-1c
    │
    ├── Services
    │   ├── my-pg-cluster-rw  → Primary
    │   ├── my-pg-cluster-ro  → Replicas
    │   └── my-pg-cluster-r   → Any instance
    │
    ├── Secrets
    │   ├── my-pg-cluster-superuser  (only when enableSuperuserAccess: true)
    │   └── my-pg-cluster-app
    │
    └── PVCs (EBS gp3)
        ├── my-pg-cluster-1  (PGDATA)
        ├── my-pg-cluster-2  (PGDATA)
        └── my-pg-cluster-3  (PGDATA)

Key Takeaways

✅ CloudNativePG uses PostgreSQL’s own WAL streaming replication — not storage-level replication. This is intentional and better for production.

✅ Three services are automatically created per cluster: -rw (primary), -ro (replicas only), -r (any). The -rw service automatically updates on failover — your app doesn’t need to change connection strings.

✅ For production on AWS EKS, use 3 worker nodes across 3 AZs. Label and taint them as “postgres nodes” to prevent workload interference.

✅ CloudNativePG manages PostgreSQL within a single Kubernetes cluster only. Cross-cluster failover (multi-region DR) requires manual promotion or a higher-level tool.

✅ The Cluster custom resource is the single source of truth for your entire PostgreSQL HA setup — instances, replication, services, storage, and failover are all declared in one place.


What’s Next

This post is part of the PostgreSQL on Kubernetes (CloudNativePG) series:

#	Post	Status
1	CloudNativePG Architecture on AWS EKS	📍 You are here
2	Installing CloudNativePG on AWS EKS — Step by Step	⬜ Coming next week
3	PostgreSQL Configuration & Pod Tuning on Kubernetes	⬜ Coming soon
4	Bootstrap Methods — initdb, Recovery, pg_basebackup	⬜ Coming soon

In Post 2, we move from concepts to commands. We’ll install the CloudNativePG operator on a real AWS EKS cluster using Helm, deploy a 3-instance PostgreSQL cluster, and verify all three services are created and routing correctly.

