Six Kubernetes cost leaks that survive every FinOps audit — and how we plug them

Most Kubernetes cost stories you'll read in 2026 are about rightsizing — pick smaller instances, scale down at night, use spot instances. That advice is fine, but it's table stakes. By the time you're reading those articles, you've already done it.

The real cost leaks in mature Kubernetes clusters are subtler. They sit in your manifests, not your cloud console. They survive a FinOps team's quarterly review because the bill looks normal — but per unit of work delivered, you're paying 2-3x what you should.

These are the six we keep finding. We've patched all of them across the Kubernetes clusters we manage on EKS, GKE, AKS, and DOKS.

Leak 1: Requests set to the 99th percentile

The single biggest cost leak. Engineering teams, scarred by an OOM-kill in 2023, set memory requests to "what we saw at peak, plus 50% headroom." Then they multiply that by replica count, and they're reserving 4x the memory the workload actually needs.

The fix is mechanical:

resources:
  requests:
    cpu: 100m       # the floor, not the ceiling
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

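Set requests to the 50th-percentile actual usage, and limits to the 99th-percentile. The scheduler uses requests to place pods on nodes; the kubelet and container runtime enforce limits at runtime. They are not the same number, and for ordinary burstable workloads they should not be set to the same number. Setting them equal puts the pod in the Guaranteed QoS class, which is a deliberate choice for a few critical workloads, not a default.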
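As a rough illustration of the percentile rule, here is a minimal sketch that derives a memory request and limit from a list of usage samples. The function names and the sample values are hypothetical, and it uses the simple nearest-rank percentile definition; your metrics backend may interpolate differently.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def suggest_memory(samples_mib):
    """Return (request, limit) in MiB: P50 for the request, P99 for the limit."""
    return percentile(samples_mib, 50), percentile(samples_mib, 99)

# Hypothetical memory usage samples in MiB, for example one per scrape interval.
samples = [210, 220, 230, 240, 250, 256, 260, 270, 300, 505]
request_mib, limit_mib = suggest_memory(samples)
print(f"requests.memory: {request_mib}Mi, limits.memory: {limit_mib}Mi")
```

With only a handful of samples the 99th percentile is just the maximum, so feed it at least a week of data before trusting the limit.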

How do you know what the percentiles actually are? Use Vertical Pod Autoscaler in recommendation-only mode for a week:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommendations only

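A week later, look at kubectl describe vpa my-app-recommender. For each container it reports a lowerBound, a target, and an upperBound rather than raw percentiles; compare the target against your configured requests. The numbers it suggests are usually 30-60% below what teams have configured. That delta is your cost leak.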
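To put a number on that delta, a small sketch like the following can compare the VPA target with the configured request. It assumes you have loaded the output of kubectl get vpa my-app-recommender -o json into a dict; the helper names are hypothetical and the unit parser only covers the common memory suffixes.

```python
# Suffix -> bytes. Covers common memory suffixes only (an assumption of this sketch).
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "k": 1000, "M": 1000**2, "G": 1000**3}

def to_bytes(quantity):
    """Convert a Kubernetes memory quantity string such as '512Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: the value is already in bytes

def memory_overprovision(vpa, configured_requests):
    """Return {container: fraction of the configured memory request above the VPA target}."""
    recommendations = vpa["status"]["recommendation"]["containerRecommendations"]
    leak = {}
    for rec in recommendations:
        name = rec["containerName"]
        if name not in configured_requests:
            continue
        target = to_bytes(rec["target"]["memory"])
        current = to_bytes(configured_requests[name])
        leak[name] = (current - target) / current
    return leak

# Hypothetical VPA object, trimmed to the fields this sketch reads.
vpa = {"status": {"recommendation": {"containerRecommendations": [
    {"containerName": "my-app", "target": {"cpu": "100m", "memory": "256Mi"}},
]}}}
for name, fraction in memory_overprovision(vpa, {"my-app": "512Mi"}).items():
    print(f"{name}: {fraction:.0%} of the memory request is above the VPA target")
```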

Leak 2: Replicas that scale up but never scale down

HPA configured with minReplicas: 3, maxReplicas: 50. The morning traffic spike scales it to 30 replicas. Traffic falls off at 6pm. At midnight, it's still running 30 replicas.

The cause is almost always an overly conservative scale-down policy layered on top of the default 5-minute scaleDown.stabilizationWindowSeconds:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10        # only scale down 10% per minute
      periodSeconds: 60

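On paper, that policy needs roughly 15 to 25 one-minute steps to get from 30 back to 3, on top of the 5-minute stabilization window. In practice, every metric blip holds the replica count for another full window, and with a longer window or period the descent stretches into hours. By the time it finishes, the morning rush is starting again. You never actually run at minReplicas.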

For most production workloads, much more aggressive scale-down is correct:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60
    - type: Pods
      value: 5
      periodSeconds: 60
    selectPolicy: Max

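Pair this with graceful shutdown (a preStop hook or SIGTERM handler, plus a terminationGracePeriodSeconds long enough to drain in-flight requests) so a scale-down doesn't terminate a pod mid-request, and you'll find your average replica count drops by 40-60% on bursty workloads.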
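To compare the two scale-down policies, here is a simplified simulation. It is a sketch under stated assumptions, not the HPA controller's actual algorithm: every policy uses a 60-second period, the metric stays below target the whole time, Percent results are rounded down, and the stabilization window is ignored. The function name is hypothetical.

```python
def periods_to_scale_down(start, min_replicas, policies, select_max=True):
    """Count 60-second policy periods needed to go from start to min_replicas.

    policies is a list of (type, value) pairs, e.g. ("Percent", 10) or ("Pods", 5).
    select_max=True mirrors selectPolicy: Max (pick the policy allowing the largest change).
    """
    replicas, periods = start, 0
    while replicas > min_replicas:
        candidates = []
        for kind, value in policies:
            if kind == "Percent":
                candidates.append(replicas * (100 - value) // 100)
            elif kind == "Pods":
                candidates.append(replicas - value)
            else:
                raise ValueError(f"unknown policy type: {kind}")
        # The largest allowed change is the lowest resulting replica count.
        proposed = min(candidates) if select_max else max(candidates)
        proposed = max(proposed, min_replicas)
        if proposed == replicas:
            raise RuntimeError("policy cannot make further progress")
        replicas = proposed
        periods += 1
    return periods

conservative = [("Percent", 10)]
aggressive = [("Percent", 50), ("Pods", 5)]
print("conservative:", periods_to_scale_down(30, 3, conservative), "periods")
print("aggressive:", periods_to_scale_down(30, 3, aggressive), "periods")
```

The policy arithmetic is only part of the delay. The stabilization window and any noise in the metric add to whatever this model reports.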

Leak 3: Idle namespaces still running their dev stacks

Every Kubernetes cluster we audit has 5-15 namespaces named things like feature-xyz-poc, migration-2024, tom-test. They're running 6-month-old deployments, taking up node capacity, occasionally being reached by a stray traffic test that resurrects them in the engineer's mind just long enough to forget about them again.

The fix is a namespace TTL policy. Label every non-production namespace with a TTL, an owner, and a creation date:

apiVersion: v1
kind: Namespace
metadata:
  name: feature-xyz-poc
  labels:
    edge.io/ttl: "30d"
    edge.io/owner: "team-frontend"
    edge.io/created: "2026-04-10"

A controller scans nightly. Anything past TTL with no recent activity (no successful pod start in 7 days) gets:

A Slack ping to the owner: "we're deleting this in 7 days unless you renew the TTL"
After 7 days, automatic namespace deletion
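The decision the nightly scan makes can be sketched in a few lines. This is a minimal illustration, not our controller: the function names are hypothetical, it only understands TTLs expressed in days, and it assumes something else supplies the date of the last successful pod start.

```python
from datetime import date, timedelta

def parse_ttl(ttl):
    """Parse a TTL label such as '30d' into a timedelta. Only the day suffix is supported."""
    if not ttl.endswith("d") or not ttl[:-1].isdigit():
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(days=int(ttl[:-1]))

def namespace_action(labels, last_pod_start, today, idle_days=7):
    """Return 'keep' or 'notify-owner' for one namespace.

    last_pod_start is the date of the last successful pod start, or None if there is none.
    """
    created = date.fromisoformat(labels["edge.io/created"])
    expires = created + parse_ttl(labels["edge.io/ttl"])
    if today <= expires:
        return "keep"  # still inside its TTL
    idle = last_pod_start is None or (today - last_pod_start) > timedelta(days=idle_days)
    return "notify-owner" if idle else "keep"

labels = {"edge.io/ttl": "30d", "edge.io/owner": "team-frontend", "edge.io/created": "2026-04-10"}
print(namespace_action(labels, date(2026, 5, 1), today=date(2026, 5, 20)))
```

Deletion itself would be a second pass that only acts on namespaces whose owner was notified at least 7 days earlier and whose TTL label has not been renewed.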

We've seen this single policy reclaim 25% of a customer's node capacity inside a month. It's mostly an organizational change, not a technical one — but it pays for itself many times over.

Leak 4: PodDisruptionBudgets blocking node drains

PDB-driven cost leaks are sneaky. A workload sets minAvailable: 1 with 1 replica. Cluster Autoscaler decides a node could be removed for cost savings, but that replica can't be evicted without violating the PDB, so the drain can never complete. The autoscaler gives up and leaves the node running.

You see no error. You see no alert. You just have a node that won't go away.

The fix is twofold:

PDBs should reference maxUnavailable, not minAvailable, for any deployment that has variable replica count (HPA-driven). With minAvailable: 1, a deployment scaled to 1 by HPA at night will block all evictions; with maxUnavailable: 1, it allows one to be evicted at a time, which works at any replica count.
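Check the cluster-autoscaler-status ConfigMap that Cluster Autoscaler writes to kube-system (kubectl -n kube-system describe configmap cluster-autoscaler-status) for long-lived scale-down candidates, and add tooling that surfaces "node X has been unschedulable for Y hours" so blocked drains become visible.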
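The difference between the two PDB fields comes down to simple arithmetic, sketched below. This is a simplified model with integer values only (percentages are not handled), and the function name is hypothetical.

```python
def allowed_disruptions(replicas, healthy, min_available=None, max_unavailable=None):
    """Number of pods that may be evicted right now under a PDB (integer values only)."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = max(0, replicas - max_unavailable)
    return max(0, healthy - desired_healthy)

# Compare both styles at the replica counts an HPA-driven deployment might pass through.
for replicas in (1, 3, 30):
    by_min = allowed_disruptions(replicas, replicas, min_available=1)
    by_max = allowed_disruptions(replicas, replicas, max_unavailable=1)
    print(f"replicas={replicas}: minAvailable=1 allows {by_min}, maxUnavailable=1 allows {by_max}")
```

At one replica, minAvailable: 1 allows zero evictions, which is exactly the blocked drain described above. The trade-off with maxUnavailable: 1 is that the single replica can be evicted, so expect a brief gap while it reschedules.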

We instrument this for every managed Kubernetes cluster we run on AWS, GCP, and Azure — silent failures in autoscaler logic are the worst kind because they don't trigger alerts; they just inflate the bill.

Leak 5: Persistent volumes left behind after pod deletion

The default reclaim policy for dynamically provisioned PersistentVolumes on most cloud providers is Delete, so deleting the PVC deletes the underlying disk. But many production setups override this to Retain for safety, and then nobody ever goes back to clean up.

You end up with hundreds of unattached EBS volumes / GCE persistent disks / Azure managed disks, each costing $5-50 a month. We've seen customer accounts where the unattached-disk cost was a full 10% of the EBS bill.

The audit query is simple:

# AWS
aws ec2 describe-volumes --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime,Tags]' --output table
 
# GCP
gcloud compute disks list --filter='-users:*'
 
# Azure
az disk list --query "[?managedBy==null].{name:name, sizeGb:diskSizeGb}"

Anything that's been "available" (= unattached) for more than 30 days, with no keep=true tag, is fair game for deletion after snapshotting.
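For AWS, that rule can be applied to the JSON form of the same query. The sketch below assumes the full output of aws ec2 describe-volumes --output json (the Volumes list) has been loaded into Python. It uses CreateTime as a stand-in for "time spent unattached", which overstates the age of volumes that were detached recently, so treat the result as a shortlist to review. The function name and volume IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def deletion_candidates(volumes, now, min_age_days=30):
    """Return IDs of unattached volumes older than min_age_days that lack a keep=true tag."""
    cutoff = now - timedelta(days=min_age_days)
    candidates = []
    for vol in volumes:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags") or []}
        # Accept both "+00:00" and "Z" timestamp endings.
        created = datetime.fromisoformat(vol["CreateTime"].replace("Z", "+00:00"))
        if vol.get("State") == "available" and tags.get("keep") != "true" and created < cutoff:
            candidates.append(vol["VolumeId"])
    return candidates

# Hypothetical sample data in the shape of the describe-volumes response.
volumes = [
    {"VolumeId": "vol-old", "State": "available", "CreateTime": "2026-01-10T08:00:00+00:00"},
    {"VolumeId": "vol-keep", "State": "available", "CreateTime": "2026-01-10T08:00:00+00:00",
     "Tags": [{"Key": "keep", "Value": "true"}]},
    {"VolumeId": "vol-new", "State": "available", "CreateTime": "2026-05-10T08:00:00.000Z"},
    {"VolumeId": "vol-used", "State": "in-use", "CreateTime": "2026-01-10T08:00:00+00:00"},
]
now = datetime(2026, 5, 20, tzinfo=timezone.utc)
print(deletion_candidates(volumes, now))
```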

Leak 6: Cluster control planes for clusters that aren't doing anything

The most embarrassing one. EKS control planes are $73/month ($0.10/hour). AKS charges the same rate on its Standard tier; only the Free tier, which comes without an uptime SLA, waives it. GKE is $0.10/hour for the control plane on standard clusters. If you have a dev cluster, a staging cluster, a UAT cluster, and a "we'll get back to this" cluster, that's roughly $300/month before a single workload runs.
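The arithmetic behind those figures is short enough to check by hand. The sketch below assumes a flat $0.10/hour rate and a 730-hour average month; the function name is hypothetical.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12

def control_plane_cost(clusters, hourly_rate=0.10):
    """Monthly control-plane fee in dollars for a number of clusters at a flat hourly rate."""
    return round(clusters * hourly_rate * HOURS_PER_MONTH, 2)

print("one cluster:", control_plane_cost(1))
print("dev + staging + UAT + forgotten:", control_plane_cost(4))
```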

For non-production environments, consider:

Consolidating dev/staging/UAT into one cluster with strong namespace isolation
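GKE Autopilot for low-utilisation clusters: you pay for pod requests rather than idle nodes, and the GKE free-tier credit currently offsets the management fee for one Autopilot or zonal cluster per billing account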
k3s on a single VM for genuinely small dev clusters where Kubernetes API is needed but production-grade HA isn't

We routinely consolidate 4-5 non-prod clusters into one when we onboard a new managed customer, and it pays for the first three months of our management retainer on its own.

The instrumented version

The pattern across all six leaks: you can't fix what you can't see. Standard Kubernetes monitoring (Prometheus + Grafana) gives you CPU and memory at the pod and node level. It does not give you:

Request:limit ratio histograms
PDB-blocked eviction counters
Namespace age and last-pod-event timestamps
Unattached PV/PVC inventory
Per-namespace allocated-vs-used CPU and memory

That instrumentation has to be added explicitly. Tools like Kubecost, OpenCost, and CAST AI all do versions of this. For our customers on GKE, we usually ship OpenCost as a default add-on and feed its metrics into our shared Grafana, so the FinOps view is the same view the SREs use.

That's the whole game. The cost leaks aren't in the bill. They're in the gap between what the manifests say the workload needs and what the workload actually needs. Closing that gap is mostly tooling and discipline, and it pays back faster than any other infrastructure investment you can make.

Sudhanshu K. is Principal Engineer at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). He runs managed Kubernetes for customers on every major cloud and has yet to find a cluster that didn't have at least four of these six leaks.