
Module 3 — Cluster Upgrade with kubeadm

Overview

Upgrading a Kubernetes cluster is a high-frequency CKA exam task. You must know the exact sequence: upgrade kubeadm, upgrade the control plane, drain workers, upgrade workers, uncordon. This module covers the full procedure, the version skew policy that governs which versions can coexist, and the drain/uncordon mechanics.


1. Version Skew Policy

Before upgrading anything, you must understand which component versions can coexist.

1.1 Core Rules

The API server is the reference point — all other components are measured against it.

Component                 Allowed versions relative to kube-apiserver
─────────                 ───────────────────────────────────────────
kube-apiserver            In HA, instances must be within 1 minor version of each other
kube-controller-manager   Up to 1 minor version older (never newer)
kube-scheduler            Up to 1 minor version older (never newer)
kubelet                   Up to 3 minor versions older since v1.28 (2 before); never newer
kube-proxy                Up to 3 minor versions older since v1.28 (2 before); never newer
kubectl                   ±1 minor version (1 newer or 1 older)
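
A quick way to sanity-check the skew on a live cluster is to compare the API server's minor version with each node's kubelet version. The loop below is a sketch against sample data; on a real cluster the node list would come from `kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion` and the server version from `kubectl version`.

```shell
#!/usr/bin/env bash
# Sketch: verify each kubelet is within the allowed skew of the API server.
# The node list is illustrative sample data, not read from a live cluster.
apiserver_minor=30            # e.g. kube-apiserver v1.30.x
max_lag=3                     # kubelet may lag up to 3 minors (since v1.28)

nodes="controlplane:30 worker-1:29 worker-2:28"
for entry in $nodes; do
  name=${entry%%:*}           # node name before the colon
  minor=${entry##*:}          # kubelet minor version after the colon
  lag=$(( apiserver_minor - minor ))
  if [ "$lag" -ge 0 ] && [ "$lag" -le "$max_lag" ]; then
    echo "$name: kubelet v1.$minor OK (lags by $lag)"
  else
    echo "$name: kubelet v1.$minor VIOLATES skew policy"
  fi
done
```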

1.2 Visual Example

Upgrading from v1.29 to v1.30:

                    ALLOWED DURING UPGRADE
                    ──────────────────────
kube-apiserver      v1.29 → v1.30         (upgrade first)
controller-manager  v1.29 or v1.30        (can lag by 1)
scheduler           v1.29 or v1.30        (can lag by 1)
kubelet             v1.27, v1.28, v1.29, v1.30   (can lag by 3)
kube-proxy          v1.27, v1.28, v1.29, v1.30   (can lag by 3)
kubectl             v1.29, v1.30, v1.31   (±1)

1.3 Upgrade Constraints

  • You can only upgrade one minor version at a time (v1.29 → v1.30 ✅, v1.29 → v1.31 ❌)
  • Patch version upgrades within the same minor are always allowed (v1.30.0 → v1.30.3 ✅)
  • Always upgrade the control plane first, then the workers

CKA Tip: The exam will never ask you to skip a minor version. But knowing the skew policy helps you understand why the order matters.
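
The one-minor-version-at-a-time rule is simple enough to encode. The sketch below gates an upgrade on it; the `current` and `target` values are illustrative placeholders, not read from a live cluster.

```shell
#!/usr/bin/env bash
# Sketch: refuse an upgrade that skips a minor version.
# current/target are illustrative values, not read from a cluster.
current="v1.29.3"
target="v1.30.0"

# Extract the minor version (the middle field of vX.Y.Z)
minor() { echo "$1" | cut -d. -f2; }

delta=$(( $(minor "$target") - $(minor "$current") ))
if [ "$delta" -gt 1 ]; then
  echo "refuse: $current -> $target skips a minor version"
elif [ "$delta" -lt 0 ]; then
  echo "refuse: downgrading across minor versions is not supported"
else
  echo "ok: $current -> $target is a valid upgrade step"
fi
```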


2. Upgrade Strategy Overview

┌─────────────────────────────────────────────────────────────┐
│                    UPGRADE SEQUENCE                         │
│                                                             │
│  1. Upgrade kubeadm on the first control plane node         │
│  2. kubeadm upgrade plan  (check what's available)          │
│  3. kubeadm upgrade apply (upgrade control plane components)│
│  4. Upgrade kubelet & kubectl on control plane node(s)      │
│  5. For each additional CP node: kubeadm upgrade node       │
│                                                             │
│  6. For each worker node:                                   │
│     a. kubectl drain <node>                                 │
│     b. Upgrade kubeadm, kubelet, kubectl on the node        │
│     c. kubeadm upgrade node                                 │
│     d. Restart kubelet                                      │
│     e. kubectl uncordon <node>                              │
└─────────────────────────────────────────────────────────────┘

3. Upgrading the Control Plane

3.1 Step 1 — Determine the Target Version

# Check current cluster version
kubectl get nodes

# Check available kubeadm versions
apt-cache madison kubeadm
# or
apt list -a kubeadm

3.2 Step 2 — Upgrade kubeadm

# Unhold kubeadm
apt-mark unhold kubeadm

# Upgrade kubeadm to the target version
apt-get update && apt-get install -y kubeadm=1.30.0-1.1

# Re-hold kubeadm
apt-mark hold kubeadm

# Verify
kubeadm version

3.3 Step 3 — Plan the Upgrade

kubeadm upgrade plan

This command:
  • Shows the current cluster version
  • Shows available upgrade targets
  • Checks component health
  • Lists which components will be upgraded
  • Warns about any manual actions needed

Example output:

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     v1.29.0       v1.30.0

Upgrade to the latest version in the v1.30 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.29.0   v1.30.0
kube-controller-manager   v1.29.0   v1.30.0
kube-scheduler            v1.29.0   v1.30.0
kube-proxy                v1.29.0   v1.30.0
CoreDNS                   v1.11.1   v1.11.3
etcd                      3.5.10    3.5.12

3.4 Step 4 — Apply the Upgrade (First Control Plane Node Only)

kubeadm upgrade apply v1.30.0

This upgrades:
  • kube-apiserver, kube-controller-manager, kube-scheduler (static pod manifests)
  • kube-proxy (DaemonSet)
  • CoreDNS (Deployment)
  • etcd (static pod manifest, if using stacked etcd)

Important: kubeadm upgrade apply is only run on the first control plane node. Additional CP nodes use kubeadm upgrade node.

3.5 Step 5 — Upgrade kubelet and kubectl on the Control Plane Node

# Drain the control plane node (optional but recommended)
kubectl drain <cp-node> --ignore-daemonsets

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# Uncordon the node
kubectl uncordon <cp-node>

3.6 Step 6 — Upgrade Additional Control Plane Nodes (HA Only)

On each additional control plane node:

# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# Upgrade node (NOT "upgrade apply" — that's only for the first CP node)
kubeadm upgrade node

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

CKA Tip: kubeadm upgrade apply = first control plane node. kubeadm upgrade node = all other nodes (additional CP nodes + workers).
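
The tip above can be stated as a tiny decision sketch. The `ROLE` variable is an illustrative placeholder; in practice you simply know which node you are working on.

```shell
#!/usr/bin/env bash
# Sketch: pick the right kubeadm subcommand for a node's role.
# ROLE is an illustrative placeholder, not detected from the node.
ROLE="worker"   # one of: first-control-plane | control-plane | worker

case "$ROLE" in
  first-control-plane)  cmd="kubeadm upgrade apply v1.30.0" ;;
  control-plane|worker) cmd="kubeadm upgrade node" ;;
esac
echo "on a $ROLE node, run: $cmd"
```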


4. Upgrading Worker Nodes

Worker nodes are upgraded one at a time (or in batches) to maintain workload availability.

4.1 Step-by-Step for Each Worker

From the control plane — drain the worker:

kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

SSH into the worker node:

# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# Upgrade node configuration
kubeadm upgrade node

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

Back on the control plane — uncordon the worker:

kubectl uncordon worker-1
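
The per-worker sequence above can be rolled across nodes as a loop. This is a dry-run sketch with placeholder node names and an assumed SSH setup; it only prints the commands unless you set `DRY_RUN=0` in an environment where the targets and package versions actually match.

```shell
#!/usr/bin/env bash
# Sketch: upgrade workers one at a time. Node names are placeholders;
# DRY_RUN=1 prints each command instead of executing it.
set -euo pipefail
WORKERS="worker-1 worker-2"
DRY_RUN=1

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

for node in $WORKERS; do
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  run ssh "$node" "apt-get update && apt-get install -y kubeadm=1.30.0-1.1 && kubeadm upgrade node"
  run ssh "$node" "apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1 && systemctl daemon-reload && systemctl restart kubelet"
  run kubectl uncordon "$node"
done
```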

4.2 Verify the Upgrade

kubectl get nodes
# NAME           STATUS   ROLES           AGE   VERSION
# controlplane   Ready    control-plane   30d   v1.30.0
# worker-1       Ready    <none>          30d   v1.30.0
# worker-2       Ready    <none>          30d   v1.30.0

5. Draining and Uncordoning Nodes

5.1 kubectl drain

Drain safely evicts all pods from a node before maintenance.

kubectl drain <node-name> [flags]

What drain does:
  1. Cordons the node (marks it as unschedulable)
  2. Evicts all pods, respecting PodDisruptionBudgets
  3. Pods managed by controllers (Deployments, ReplicaSets, etc.) are rescheduled on other nodes
  4. Standalone pods (not managed by a controller) are deleted and lost

Before drain:
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   worker-1   │  │   worker-2   │  │   worker-3   │
│              │  │              │  │              │
│  pod-a       │  │  pod-c       │  │  pod-e       │
│  pod-b       │  │  pod-d       │  │  pod-f       │
│  (daemonset) │  │  (daemonset) │  │  (daemonset) │
└──────────────┘  └──────────────┘  └──────────────┘

After: kubectl drain worker-1 --ignore-daemonsets
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   worker-1   │  │   worker-2   │  │   worker-3   │
│  CORDONED    │  │  pod-c       │  │  pod-e       │
│              │  │  pod-d       │  │  pod-f       │
│              │  │  pod-a ←new  │  │  pod-b ←new  │
│  (daemonset) │  │  (daemonset) │  │  (daemonset) │
└──────────────┘  └──────────────┘  └──────────────┘
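
The per-pod decision drain makes can be sketched with sample data. The pod names and owner kinds below are illustrative, standing in for what drain reads from each pod's ownerReferences.

```shell
#!/usr/bin/env bash
# Sketch of drain's per-pod classification. Each entry is "pod:ownerKind";
# the sample data is illustrative, not read from a cluster.
pods="pod-a:ReplicaSet pod-b:ReplicaSet fluentd-abc:DaemonSet debug-pod:None"

for entry in $pods; do
  name=${entry%%:*}
  owner=${entry##*:}
  case "$owner" in
    DaemonSet) echo "$name: skipped (needs --ignore-daemonsets)" ;;
    None)      echo "$name: standalone, only deleted with --force (and lost)" ;;
    *)         echo "$name: evicted; its controller reschedules it elsewhere" ;;
  esac
done
```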

5.2 Common drain Flags

Flag                        Purpose
────                        ───────
--ignore-daemonsets         Skip DaemonSet-managed pods (they can't be rescheduled elsewhere)
--delete-emptydir-data      Allow eviction of pods using emptyDir volumes (data will be lost)
--force                     Force eviction of standalone pods (not managed by a controller)
--grace-period=<seconds>    Override the pod's terminationGracePeriodSeconds
--timeout=<duration>        Abort if the drain doesn't complete within this time
--pod-selector=<selector>   Only evict pods matching this label selector
--disable-eviction          Delete pods instead of using the eviction API (bypasses PDBs)

5.3 When drain Fails

# Error: cannot delete pods with local storage
kubectl drain worker-1 --ignore-daemonsets
# error: cannot delete Pods with local storage (use --delete-emptydir-data to override)

# Fix:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# Error: cannot delete pods not managed by a controller
kubectl drain worker-1 --ignore-daemonsets
# error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet

# Fix:
kubectl drain worker-1 --ignore-daemonsets --force

# Error: PodDisruptionBudget violation
# The drain will wait (or timeout) until the PDB allows eviction
kubectl drain worker-1 --ignore-daemonsets --timeout=120s

5.4 kubectl cordon / uncordon

Cordon and uncordon control whether a node accepts new pods:

# Cordon — mark node as unschedulable (existing pods keep running)
kubectl cordon worker-1

# Check node status
kubectl get nodes
# NAME       STATUS                     ROLES    AGE   VERSION
# worker-1   Ready,SchedulingDisabled   <none>   30d   v1.30.0

# Uncordon — mark node as schedulable again
kubectl uncordon worker-1

# Check node status
kubectl get nodes
# NAME       STATUS   ROLES    AGE   VERSION
# worker-1   Ready    <none>   30d   v1.30.0

Action     Effect on existing pods                         Effect on new pods
──────     ───────────────────────                         ──────────────────
cordon     None (pods keep running)                        No new pods scheduled
drain      Evicts all (DaemonSet pods skipped with flag)   No new pods scheduled
uncordon   None                                            Node accepts new pods again

CKA Tip: drain automatically cordons the node first. You don't need to cordon before draining. But you must uncordon after the upgrade is complete.

5.5 PodDisruptionBudgets (PDBs)

PDBs protect workload availability during voluntary disruptions (like drain):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # At least 2 pods must remain running
  # OR
  # maxUnavailable: 1    # At most 1 pod can be down at a time
  selector:
    matchLabels:
      app: my-app
# Check PDBs
kubectl get pdb

# Describe a PDB to see current status
kubectl describe pdb my-app-pdb
# Allowed disruptions: 1
# Current healthy: 3
# Desired healthy: 2

When draining, if evicting a pod would violate a PDB, the drain blocks until the PDB allows it (or the timeout is reached).
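
The arithmetic behind the "Allowed disruptions" figure is simple for an integer minAvailable. This is a simplified sketch using the same illustrative numbers as the describe output above; the real controller also handles percentages and pods still expected to become healthy.

```shell
#!/usr/bin/env bash
# Sketch: how "Allowed disruptions" is derived for an integer minAvailable.
# Values mirror the illustrative describe output above.
current_healthy=3
min_available=2

allowed=$(( current_healthy - min_available ))
if [ "$allowed" -lt 0 ]; then allowed=0; fi   # never negative
echo "Allowed disruptions: $allowed"
```

When `allowed` is 0, drain's eviction requests are rejected and the drain blocks.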


6. Complete Upgrade Walkthrough — v1.29 → v1.30

Here's the full procedure condensed into a reference:

Control Plane Node

# 1. Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# 2. Verify upgrade plan
kubeadm upgrade plan

# 3. Apply upgrade
kubeadm upgrade apply v1.30.0

# 4. Drain the node
kubectl drain <cp-node> --ignore-daemonsets

# 5. Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# 6. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 7. Uncordon
kubectl uncordon <cp-node>

Each Worker Node

# 1. Drain from control plane
kubectl drain <worker-node> --ignore-daemonsets --delete-emptydir-data

# 2. SSH to worker — upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# 3. Upgrade node config
kubeadm upgrade node

# 4. Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# 5. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 6. Uncordon from control plane
kubectl uncordon <worker-node>

7. Troubleshooting Upgrades

Problem: kubeadm upgrade plan shows no available upgrades
Cause:   kubeadm not upgraded yet, or package repo not updated
Fix:     Upgrade kubeadm first, then run apt-get update

Problem: Static pod not restarting after upgrade
Cause:   kubelet not watching the manifests directory
Fix:     Check systemctl status kubelet and staticPodPath in the kubelet config

Problem: Node stays NotReady after upgrade
Cause:   kubelet not restarted
Fix:     systemctl daemon-reload && systemctl restart kubelet

Problem: Pods stuck in Terminating during drain
Cause:   Finalizers or stuck containers
Fix:     kubectl delete pod <pod> --grace-period=0 --force

Problem: Drain blocked by a PDB
Cause:   Not enough healthy replicas
Fix:     Scale up the deployment first, or use --disable-eviction (last resort)

Problem: Version mismatch errors
Cause:   Skipped a minor version
Fix:     Upgrade one minor version at a time

Problem: etcd health check fails after upgrade
Cause:   etcd not upgraded, or certificate mismatch
Fix:     Check the etcd pod logs: crictl logs <etcd-container-id>
# Useful debugging commands during upgrade
kubectl get nodes -o wide
kubectl get pods -n kube-system
systemctl status kubelet
journalctl -u kubelet --no-pager | tail -n 50
crictl ps -a
kubeadm upgrade plan

8. Practice Exercises

Exercise 1 — Upgrade a Cluster

# Starting state: cluster running v1.29.0
# Target: upgrade to v1.30.0

# 1. Check current versions
kubectl get nodes

# 2. Upgrade the control plane (follow Section 6)
# 3. Upgrade worker-1
# 4. Upgrade worker-2
# 5. Verify all nodes show v1.30.0
kubectl get nodes

Exercise 2 — Drain and Uncordon

# 1. Create a deployment with 3 replicas
kubectl create deployment drain-test --image=nginx --replicas=3

# 2. Check which nodes the pods are running on
kubectl get pods -o wide

# 3. Drain one of the worker nodes
kubectl drain <worker-with-pods> --ignore-daemonsets

# 4. Verify pods were rescheduled to other nodes
kubectl get pods -o wide

# 5. Uncordon the node
kubectl uncordon <worker>

# 6. Clean up
kubectl delete deployment drain-test

Exercise 3 — PDB Interaction with Drain

# 1. Create a deployment with 3 replicas
kubectl create deployment pdb-test --image=nginx --replicas=3

# 2. Create a PDB requiring at least 2 pods
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-test
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: pdb-test
EOF

# 3. Try draining a node that has 2 of the 3 pods
#    Observe: drain blocks because evicting both would violate the PDB
kubectl drain <node> --ignore-daemonsets --timeout=30s

# 4. Scale up to 4 replicas and retry
kubectl scale deployment pdb-test --replicas=4
kubectl drain <node> --ignore-daemonsets

# 5. Clean up
kubectl uncordon <node>
kubectl delete deployment pdb-test
kubectl delete pdb pdb-test

Exercise 4 — Token Expiry During Upgrade

# Scenario: You upgraded the control plane yesterday.
# Today you want to join a new worker, but the token has expired.

# 1. Try joining with the old token — observe the error
# 2. Generate a new join command
kubeadm token create --print-join-command

# 3. Join the worker with the new token

9. Key Takeaways for the CKA Exam

Point                                        Detail
─────                                        ──────
Upgrade one minor version at a time          v1.29 → v1.30 ✅, v1.29 → v1.31 ❌
Control plane first, then workers            Never upgrade workers before the control plane
upgrade apply vs upgrade node                apply = first CP node only; node = all other nodes
Always upgrade kubeadm first                 Before running any upgrade command
daemon-reload, then restart kubelet          Required after upgrading the kubelet binary
apt-mark hold                                Pin versions to prevent accidental upgrades
Drain before upgrading workers               Ensures workloads are safely moved
--ignore-daemonsets is almost always needed  DaemonSet pods can't be rescheduled
Uncordon after upgrade                       Nodes stay cordoned until you explicitly uncordon
kubelet can lag by up to 3 minor versions    Don't rely on this; upgrade promptly
PDBs can block drain                         Know how to check and work around them

Previous: 02-cluster-installation-kubeadm.md — Cluster Installation with kubeadm

Next: 04-etcd-backup-restore.md — etcd Backup & Restore