CloudNativePG: Running PostgreSQL in Kubernetes Without the Pain

A CloudNativePG cluster that sits in Setting up primary forever, with zero error events on the Cluster resource and a perfectly healthy operator, is one of the more frustrating ways to spend an afternoon. The operator says it’s working. The pods never appear. And the actual cause has nothing to do with the database at all.

Running stateful databases on Kubernetes used to be the thing everyone told you not to do. CloudNativePG (CNPG) changed that calculus for a lot of people, including me. It’s a proper operator: it handles failover, backups, connection routing, and rolling upgrades through native Kubernetes primitives instead of bolting Postgres onto a StatefulSet and praying. If you run a hardened cluster with admission controllers, network policies, and least-privilege RBAC, this post is about the friction you’ll hit that the quickstart never mentions.

Who should care

If your cluster is vanilla, kubectl apply the operator and a Cluster manifest, and you’re done in ten minutes. The CNPG docs are genuinely good for that path. This is for the rest of us: people running Kyverno or OPA Gatekeeper, self-signed cert chains, and the kind of policy-as-code setup where every workload has to justify its existence. That’s where CNPG stops being a ten-minute install and starts being an integration project.

What I tried first

The first instinct, when a CNPG cluster hangs, is to assume you got the database config wrong. So you go read your Cluster manifest line by line. You check the storage class. You check that the PVC bound. You bump the operator log level and watch it cheerfully report that it’s reconciling, over and over, with no complaints.

Here’s the trap: the CNPG operator doesn’t run initdb itself. It creates a Kubernetes Job to bootstrap the primary. That Job spawns a Pod. And in a hardened cluster, the Pod is where everything dies, because your admission controller is judging it against policies the operator’s own Pods were exempted from but the bootstrap Job was not.

The mistake I see constantly is reading the wrong resource. People run kubectl describe cluster and kubectl describe pod on the operator, find nothing, and conclude CNPG is broken. The events you need are on the bootstrap Job. When admission rejects the Pod, the Pod never exists, so there is nothing to describe there either; the Job controller records the rejection as a FailedCreate event on the Job itself, not on the Cluster:

# The Cluster looks stuck here, but says nothing useful
kubectl get cluster -n databases
# NAME       AGE   INSTANCES   READY   STATUS                    PRIMARY
# pg-main    8m    3           0       Setting up primary

# The real story is on the bootstrap Job's events
kubectl describe job -n databases pg-main-1-initdb

If a policy is the culprit, that describe output is where you’ll finally see something like admission webhook "validate.kyverno.svc" denied the request: validation error: every container must define resource limits. The bootstrap Job’s Pod template didn’t set CPU/memory limits, your require-resource-limits policy rejected it, and the operator quietly retries forever because, from its perspective, it asked Kubernetes nicely and Kubernetes said no.

I spent longer than I’d like to admit assuming the storage layer was at fault before I went and looked at the Job. The lesson stuck: when an operator hangs, find the resource the operator creates, not the resource it manages.

The actual solution

1. Exempt CNPG lifecycle resources from blocking policies

CNPG generates Jobs and Pods on your behalf, and you can’t directly edit their pod templates the way you would a Deployment you wrote. So the fix isn’t to add resource limits to the Job. It’s to teach your policy engine that CNPG-owned resources are allowed to skip the rule that’s blocking them.

Every resource CNPG creates carries the cnpg.io/cluster label. That label is your exclusion key. For Kyverno, add an exclude block to the rule that’s firing:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds: ["Pod"]
      exclude:
        any:
          - resources:
              # CNPG-managed Pods (instances + bootstrap Jobs) carry this label
              selector:
                matchLabels:
                  cnpg.io/cluster: "*"
      validate:
        message: "Every container must define CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

This is a deliberately narrow exclusion. You’re not disabling the policy. You’re carving out resources that match a specific operator-owned label, so an unlabeled Pod still has to pass the rule. A label is not an identity check, though: anyone who can create Pods can also set cnpg.io/cluster on them. Scope the exclusion to the databases namespace as well, so the label only grants an exemption where CNPG is actually allowed to run.
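A sketch of that namespace-scoped variant, assuming CNPG clusters live only in the databases namespace. Listing the namespace and the selector inside the same resources entry makes Kyverno require both before it skips the rule. Only the exclude block changes; the rest of the policy stays as shown above.

```yaml
# Replacement for the exclude block in require-resource-limits
# (shown at top-level indentation; nest it under the rule as before).
# Filters inside one resources entry are ANDed: namespace AND label must match.
exclude:
  any:
    - resources:
        kinds: ["Pod"]
        namespaces: ["databases"]   # assumption: the only namespace CNPG runs in
        selector:
          matchLabels:
            cnpg.io/cluster: "*"
```

With this in place, a Pod carrying the label in any other namespace is still validated against the limits rule.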

The same idea applies to OPA Gatekeeper, just expressed differently. A constraint’s match.excludedNamespaces takes namespace names, not labels, so use it only if you want to exempt the whole databases namespace. To exempt just CNPG’s resources, give the constraint a match.labelSelector that skips anything carrying cnpg.io/cluster. The principle doesn’t change. Match the operator’s label, exempt the lifecycle resources, leave everything else under enforcement. I wrote about the general shape of this in Kyverno Admission Controllers: Policy-as-Code That Actually Works, and CNPG’s initdb Job is the cleanest real-world example I’ve found of policy breaking infrastructure in a way that’s invisible until you know where to look.
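For Gatekeeper, the label-based exclusion could look roughly like this. It is a sketch that assumes limits are enforced with the K8sContainerLimits template from the Gatekeeper policy library; the constraint name and the parameter values are placeholders, and your own template may use a different kind and parameters.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits            # assumption: template from the Gatekeeper library
metadata:
  name: require-resource-limits     # hypothetical name
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    # Apply the constraint only to Pods that do NOT carry the CNPG label
    labelSelector:
      matchExpressions:
        - key: cnpg.io/cluster
          operator: DoesNotExist
  parameters:
    cpu: "4"        # placeholder maximums
    memory: "8Gi"
```

The selector is inverted compared to Kyverno: instead of excluding labeled Pods, the constraint only matches Pods where the label is absent.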

2. Give the operator the RBAC it actually needs

If you provision service accounts by hand instead of trusting the operator’s defaults, remember that CNPG needs to manage Jobs, Pods, PVCs, Secrets, and Services on your behalf. A read-only or too narrowly scoped account will fail in the same silent way a policy block does: the reconcile loop runs, the create call gets a 403, and nothing visible happens.

The operator’s ClusterRole covers this out of the box. If you’re tightening it, the non-obvious permissions are the ability to create and delete Jobs (for initdb and restores) and to manage PVCs (for volume expansion and replica provisioning). Strip the wrong verb and, depending on which one it was, the cluster either stalls at bootstrap or runs fine until the first time it needs to scale or recover, then breaks. I go deeper on scoping accounts like this in Kubernetes RBAC: Building Least-Privilege Service Accounts.
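As a rough illustration, the rules discussed above might look like the fragment below. This is not the operator’s full ClusterRole: the role name is hypothetical and the verb lists are my assumption, so treat the ClusterRole shipped with your operator version as the source of truth and diff any trimmed version against it.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cnpg-lifecycle-sketch   # hypothetical name, for illustration only
rules:
  # Bootstrap (initdb) and restores run as Jobs
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
  # Replica provisioning and volume expansion touch PVCs
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Instances, credentials, and the -rw/-ro/-r Services
  - apiGroups: [""]
    resources: ["pods", "secrets", "services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```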

3. Pin your PostgreSQL minor version away from 16.4

There’s a known regression in PostgreSQL 16.4 where the server can hit a segmentation fault under certain memory conditions on nodes with large amounts of RAM available. If you’re running CNPG on beefy worker nodes (16GB+ of available memory is the trigger zone), this is exactly the kind of thing that looks like a CNPG bug, a storage bug, or a kernel OOM, when it’s actually upstream Postgres.

The fix is boring and effective: pin the image to a known-good minor and don’t float the tag.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
  namespace: databases
spec:
  instances: 3
  # Pin explicitly. Do not use a floating major-version tag in production.
  imageName: ghcr.io/cloudnative-pg/postgresql:16.6
  storage:
    size: 20Gi
    storageClass: longhorn
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "2Gi"
      cpu: "1"

Note that requests and limits are set to the same value, for CPU as well as memory. For a database, you almost never want Postgres getting throttled or evicted because a noisy neighbor ballooned and the kubelet decided your requests were a polite suggestion. Kubernetes only puts a Pod in the Guaranteed QoS class when every container’s CPU and memory requests equal its limits, and Guaranteed is what you want for a stateful workload you can’t afford to lose to memory pressure.

4. Understand the three Services CNPG hands you

This is the part that pays off long after install. For a cluster named pg-main, CNPG creates a set of Services automatically, and each one routes to a different role:

Service	Routes to	Use it for
`pg-main-rw`	Current primary	Writes, migrations, anything that mutates
`pg-main-ro`	Replicas only	Read-only queries, reporting, analytics
`pg-main-r`	Any instance (primary or replica)	Reads where you don’t care which node

The -rw Service is the important one: when CNPG fails over, it repoints -rw at the new primary. Your application doesn’t need to know a failover happened. It keeps connecting to pg-main-rw.databases.svc.cluster.local and the operator handles the rest. That’s the entire value proposition of running Postgres under an operator instead of as a hand-rolled StatefulSet.

For read/write splitting, point your app at two connection strings instead of one. Most ORMs and connection libraries support a primary/replica config:

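# In the app container's spec. $(VAR) only expands for variables defined earlier in env.
env:
  - name: PGPASSWORD
    valueFrom:
      secretKeyRef:
        name: pg-main-app   # app-user Secret generated by CNPG; must exist in the app's namespace
        key: password
  - name: DATABASE_URL_PRIMARY
    value: "postgresql://app:$(PGPASSWORD)@pg-main-rw.databases.svc.cluster.local:5432/appdb"
  - name: DATABASE_URL_REPLICA
    value: "postgresql://app:$(PGPASSWORD)@pg-main-ro.databases.svc.cluster.local:5432/appdb"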

Send SELECTs that tolerate slight replication lag to -ro, and send everything else to -rw. The catch worth stating plainly: replicas are asynchronous by default, so a read immediately after a write can return stale data. If you need read-your-writes consistency for a given query, send it to -rw. Don’t blanket-route all reads to replicas and then act surprised when a user doesn’t see the row they just created.

5. Connection SSL: the untrusted-certificate wall

CNPG enables TLS by default and issues its own certificates through an internal CA. That’s good for in-cluster security and annoying the first time a client refuses to connect because it doesn’t trust the CA.

The error you’ll see from a client is some flavor of SSL error: certificate verify failed or self-signed certificate in certificate chain. The wrong reaction is to globally disable TLS on the cluster. The right reaction depends on who’s connecting:

# In-cluster clients: trust CNPG's CA. The operator publishes it as a Secret.
kubectl get secret pg-main-ca -n databases -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
# Then point the client at it:
# postgresql://...?sslmode=verify-full&sslrootcert=/etc/pg/ca.crt
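If the client runs as a Pod, one way to get that CA file into place is to mount the Secret directly instead of copying it by hand. The sketch below assumes the client Pod runs in the same namespace as the pg-main-ca Secret (otherwise you need to copy the certificate over), that the client uses libpq or a driver that honors the PGSSLMODE and PGSSLROOTCERT environment variables, and that the server certificate covers the Service name you connect to. The Pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                     # hypothetical client
  namespace: databases
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # hypothetical image
      env:
        - name: PGSSLMODE
          value: verify-full
        - name: PGSSLROOTCERT
          value: /etc/pg/ca.crt
      volumeMounts:
        - name: pg-ca
          mountPath: /etc/pg
          readOnly: true
  volumes:
    - name: pg-ca
      secret:
        secretName: pg-main-ca
        # Project only the certificate; the Secret also holds the CA private key
        items:
          - key: ca.crt
            path: ca.crt
```

Projecting only ca.crt matters here: mounting the whole Secret would hand the client the CA’s private key along with the certificate.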

For clients that genuinely can’t do certificate verification (some managed platforms and serverless backends only support a binary “SSL on/off” toggle and can’t be handed a custom CA), you have two honest options. Either set sslmode=require on the client, which encrypts the connection but skips CA verification, or terminate trust at a proxy you control. sslmode=require is the pragmatic middle ground: you keep encryption in transit and drop only the identity check, which stops passive eavesdropping but not an attacker who can impersonate the server. It’s not as strong as verify-full, but it’s a deliberate, documented tradeoff rather than turning TLS off entirely.

Here’s the quick reference I keep around for the sslmode ladder:

`sslmode`	Encrypted?	Verifies CA?	Verifies hostname?
`disable`	No	No	No
`require`	Yes	No	No
`verify-ca`	Yes	Yes	No
`verify-full`	Yes	Yes	Yes

Aim for verify-full for anything in-cluster, where you control the CA distribution. Drop to require only for external clients that can’t be handed the CA, and never to disable. If you’re already running cluster-wide TLS automation, the CA-distribution problem is the same one cert-manager solves for ingress; I covered that workflow in cert-manager + Cloudflare DNS-01: Automated TLS for Everything.

6. Exposing pgAdmin without poking a hole in the cluster

You’ll eventually want a GUI to poke at the database. The pattern I’d reach for is pgAdmin4 in its own namespace, reachable through your existing ingress controller, never exposed directly. Keep it in a separate namespace from the database so your network policies can treat it as an external-ish client that’s explicitly allowed to reach the -rw/-ro Services, rather than something that lives inside the data tier.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pgadmin
  namespace: pgadmin
  annotations:
    # Force HTTPS and lean on cert-manager for the cert
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # pgAdmin needs a bigger body size for imports/exports
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["pgadmin.example.com"]
      secretName: pgadmin-tls
  rules:
    - host: pgadmin.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pgadmin
                port:
                  number: 80

Put authentication in front of it. pgAdmin’s own login is fine, but I’d add an ingress-level auth layer (OAuth proxy or basic auth) so a leaked pgAdmin password isn’t a direct line to your database. And lock down the NetworkPolicy so only the pgAdmin namespace can reach the database Services. A database admin GUI on the public internet with default credentials is how clusters become someone else’s crypto miner.
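A sketch of that lockdown follows. NetworkPolicies select Pods rather than Services, so the policy targets the instance Pods through the cnpg.io/cluster label. It assumes your CNI enforces NetworkPolicy, that the operator runs in the cnpg-system namespace and reaches instances on port 8000, and that namespaces carry the automatic kubernetes.io/metadata.name label. The policy name is hypothetical, and application namespaces would need their own rule shaped like the pgAdmin one.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pg-main-ingress          # hypothetical name
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: pg-main
  policyTypes: ["Ingress"]
  ingress:
    # pgAdmin namespace may speak PostgreSQL to the instances
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: pgadmin
      ports:
        - protocol: TCP
          port: 5432
    # Instances of the same cluster must reach each other for replication
    - from:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: pg-main
      ports:
        - protocol: TCP
          port: 5432
    # Assumption: the operator in cnpg-system checks instance status on port 8000
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system
      ports:
        - protocol: TCP
          port: 8000
```

Leaving out the last two rules is an easy way to break replication or the operator’s health checks, so test a policy like this on a throwaway cluster first.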

Why it works

The thing that finally made CNPG click for me is that it’s not pretending Postgres is stateless. It embraces the fact that a database has a primary and replicas, that failover is a real event, and that bootstrapping is a one-time Job rather than a steady-state process. Every piece of the design maps a Postgres concept onto a native Kubernetes object you can inspect with kubectl.

That’s also why the failure modes are sneaky. The operator delegates the actual work to Jobs and Pods, so when an admission controller or RBAC rule blocks one of those, the operator has no good way to surface it beyond a stalled status. There’s no exception thrown into your terminal. The reconcile loop is doing exactly what it’s designed to do, which is keep trying, and “keep trying against a wall” looks identical to “working” until you go read the Job’s events.

The Service abstraction works because CNPG owns both the failover decision and the routing update. When it promotes a replica, it moves the primary role label onto the new primary Pod in the same control loop, and the -rw Service, whose selector matches that label, follows. There’s no DNS TTL to wait out, no client-side failover logic to get wrong, no floating VIP to manage. Kubernetes Service routing was already solving “send traffic to whichever Pod currently has this role,” and CNPG just plugs the primary/replica roles into that existing machinery. Running databases reliably on Kubernetes is the kind of platform-engineering work that separates a homelab toy from production infrastructure, and it’s a chunk of what I do in consulting engagements.

Lessons learned

The biggest shift was learning to debug the resources the operator creates, not the ones it manages. kubectl describe cluster will lie to you by omission. The Job and its Pod tell the truth. If a CNPG cluster hangs in Setting up primary, my first move now is straight to the bootstrap Job’s events, and nine times out of ten it’s a policy or RBAC denial, not a database problem.

What surprised me was how much the hardened-cluster setup matters. Every CNPG tutorial assumes a permissive cluster, so the exact features that make a cluster production-grade (enforced resource limits, least-privilege RBAC, default-deny network policies) are the features that break the install. None of them are CNPG’s fault. They’re the cost of doing security right, and the fix is always a narrow, labeled exclusion rather than a blanket exception. If you run CNPG via GitOps, put those policy exclusions in the same ArgoCD app as the operator so they’re never out of sync; the App-of-Apps pattern handles this cleanly.

If I were starting over, I’d pin the PostgreSQL minor version from day one and treat floating tags as a production smell, set Guaranteed QoS on the database Pods before the first incident rather than after, and write the read/write split into the application from the start instead of routing everything at the primary and refactoring later. None of those are hard. They’re just the kind of decision that’s cheap to make early and expensive to retrofit once you have data and uptime to protect.

CNPG genuinely delivers on running Postgres in Kubernetes without the pain, but only if you account for the cluster you actually have, not the empty one the docs assume. The operator is excellent. The integration with your security posture is the part you own.