Module 3 — Cluster Upgrade with kubeadm
Overview
Upgrading a Kubernetes cluster is a high-frequency CKA exam task. You must know the exact sequence: upgrade kubeadm, upgrade the control plane, drain workers, upgrade workers, uncordon. This module covers the full procedure, the version skew policy that governs which versions can coexist, and the drain/uncordon mechanics.
1. Version Skew Policy
Before upgrading anything, you must understand which component versions can coexist.
1.1 Core Rules
The API server is the reference point — all other components are measured against it.
| Component | Allowed versions relative to kube-apiserver |
|---|---|
| kube-apiserver | Reference point; in HA clusters, instances must stay within 1 minor version of each other during a rolling upgrade |
| kube-controller-manager | -1 minor version (can lag by 1) |
| kube-scheduler | -1 minor version (can lag by 1) |
| kubelet | -2 minor versions (can lag by 2) |
| kube-proxy | -2 minor versions (can lag by 2) |
| kubectl | ±1 minor version (can be 1 ahead or 1 behind) |
1.2 Visual Example
Upgrading from v1.29 to v1.30:
```
ALLOWED DURING UPGRADE
──────────────────────
kube-apiserver       v1.29 → v1.30         (upgrade first)
controller-manager   v1.29 or v1.30        (can lag by 1)
scheduler            v1.29 or v1.30        (can lag by 1)
kubelet              v1.28, v1.29, v1.30   (can lag by 2)
kube-proxy           v1.28, v1.29, v1.30   (can lag by 2)
kubectl              v1.29, v1.30, v1.31   (±1)
```
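The kubelet rule above can be checked mechanically. A minimal shell sketch (assumes v1.MINOR.PATCH version strings and compares minor versions only; `skew_ok` is a hypothetical helper, not part of kubeadm):

```shell
# skew_ok: check kubelet version skew against the API server.
# $1 = kube-apiserver version, $2 = kubelet version (vMAJOR.MINOR.PATCH)
skew_ok() {
  local api_minor kubelet_minor lag
  api_minor=$(echo "${1#v}" | cut -d. -f2)       # e.g. v1.30.0 -> 30
  kubelet_minor=$(echo "${2#v}" | cut -d. -f2)
  lag=$((api_minor - kubelet_minor))
  # kubelet may lag by up to 2 minors, but must never be newer
  if [ "$lag" -ge 0 ] && [ "$lag" -le 2 ]; then echo "OK"; else echo "VIOLATION"; fi
}

skew_ok v1.30.0 v1.28.3   # OK        (kubelet lags by 2)
skew_ok v1.30.0 v1.31.0   # VIOLATION (kubelet newer than the API server)
```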
1.3 Upgrade Constraints
- You can only upgrade one minor version at a time (v1.29 → v1.30 ✅, v1.29 → v1.31 ❌)
- Patch version upgrades within the same minor are always allowed (v1.30.0 → v1.30.3 ✅)
- Always upgrade the control plane first, then the workers
CKA Tip: The exam will never ask you to skip a minor version. But knowing the skew policy helps you understand why the order matters.
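The one-minor-at-a-time constraint is easy to self-check before touching the cluster. A small sketch (`can_upgrade` is a hypothetical helper; it assumes v1.MINOR.PATCH strings and treats anything other than a same-minor or +1 minor jump as blocked):

```shell
# can_upgrade: allow same-minor (patch) or +1 minor jumps only.
can_upgrade() {
  local cur_minor tgt_minor
  cur_minor=$(echo "${1#v}" | cut -d. -f2)
  tgt_minor=$(echo "${2#v}" | cut -d. -f2)
  case $((tgt_minor - cur_minor)) in
    0|1) echo "OK: $1 -> $2" ;;
    *)   echo "BLOCKED: $1 -> $2 (one minor version at a time)" ;;
  esac
}

can_upgrade v1.29.0 v1.30.0   # OK
can_upgrade v1.30.0 v1.30.3   # OK (patch upgrade within the same minor)
can_upgrade v1.29.0 v1.31.0   # BLOCKED
```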
2. Upgrade Strategy Overview
```
┌─────────────────────────────────────────────────────────────┐
│                      UPGRADE SEQUENCE                       │
│                                                             │
│  1. Upgrade kubeadm on the first control plane node         │
│  2. kubeadm upgrade plan  (check what's available)          │
│  3. kubeadm upgrade apply (upgrade control plane components)│
│  4. Upgrade kubelet & kubectl on control plane node(s)      │
│  5. For each additional CP node: kubeadm upgrade node       │
│                                                             │
│  6. For each worker node:                                   │
│     a. kubectl drain <node>                                 │
│     b. Upgrade kubeadm, kubelet, kubectl on the node        │
│     c. kubeadm upgrade node                                 │
│     d. Restart kubelet                                      │
│     e. kubectl uncordon <node>                              │
└─────────────────────────────────────────────────────────────┘
```
3. Upgrading the Control Plane
3.1 Step 1 — Determine the Target Version
```bash
# Check current cluster version
kubectl get nodes

# Check available kubeadm versions
apt-cache madison kubeadm
# or
apt list -a kubeadm
```
3.2 Step 2 — Upgrade kubeadm
```bash
# Unhold kubeadm
apt-mark unhold kubeadm

# Upgrade kubeadm to the target version
apt-get update && apt-get install -y kubeadm=1.30.0-1.1

# Re-hold kubeadm
apt-mark hold kubeadm

# Verify
kubeadm version
```
3.3 Step 3 — Plan the Upgrade
Run `kubeadm upgrade plan`. This command:
- Shows the current cluster version
- Shows available upgrade targets
- Checks component health
- Lists which components will be upgraded
- Warns about any manual actions needed
Example output:
```
Components that must be upgraded manually after you have upgraded
the control plane with 'kubeadm upgrade apply':

COMPONENT   CURRENT   TARGET
kubelet     v1.29.0   v1.30.0

Upgrade to the latest version in the v1.30 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.29.0   v1.30.0
kube-controller-manager   v1.29.0   v1.30.0
kube-scheduler            v1.29.0   v1.30.0
kube-proxy                v1.29.0   v1.30.0
CoreDNS                   v1.11.1   v1.11.3
etcd                      3.5.10    3.5.12
```
3.4 Step 4 — Apply the Upgrade (First Control Plane Node Only)
```bash
kubeadm upgrade apply v1.30.0
```
This upgrades:
- kube-apiserver, kube-controller-manager, kube-scheduler (static pod manifests)
- kube-proxy (DaemonSet)
- CoreDNS (Deployment)
- etcd (static pod manifest, if using stacked etcd)
Important: kubeadm upgrade apply is only run on the first control plane node. Additional CP nodes use kubeadm upgrade node.
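The apply-vs-node rule can be encoded as a tiny helper to drill it in (a sketch; `upgrade_cmd` and its role labels are made up for illustration, kubeadm has no such concept):

```shell
# upgrade_cmd: print the correct kubeadm subcommand for a node's role.
# "first-cp" / "other" are hypothetical labels for this sketch only;
# in practice you simply know which node you ran "apply" on.
upgrade_cmd() {
  if [ "$1" = "first-cp" ]; then
    echo "kubeadm upgrade apply v1.30.0"
  else
    echo "kubeadm upgrade node"
  fi
}

upgrade_cmd first-cp   # kubeadm upgrade apply v1.30.0
upgrade_cmd other      # kubeadm upgrade node
```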
3.5 Step 5 — Upgrade kubelet and kubectl on the Control Plane Node
```bash
# Drain the control plane node (optional but recommended)
kubectl drain <cp-node> --ignore-daemonsets

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# Uncordon the node
kubectl uncordon <cp-node>
```
3.6 Step 6 — Upgrade Additional Control Plane Nodes (HA Only)
On each additional control plane node:
```bash
# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# Upgrade node (NOT "upgrade apply" — that's only for the first CP node)
kubeadm upgrade node

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
```
CKA Tip: kubeadm upgrade apply = first control plane node. kubeadm upgrade node = all other nodes (additional CP nodes + workers).
4. Upgrading Worker Nodes
Worker nodes are upgraded one at a time (or in batches) to maintain workload availability.
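The one-node-at-a-time pattern can be sketched as a dry run that only echoes the sequence (the `upgrade_workers_dry_run` helper is hypothetical; the middle line compresses the on-node steps detailed below):

```shell
# upgrade_workers_dry_run: echo the per-node sequence instead of running it.
# Each worker is fully drained, upgraded, and uncordoned before the next
# one starts, so the cluster never loses more than one worker's capacity.
upgrade_workers_dry_run() {
  local node
  for node in "$@"; do
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
    echo "(on $node) upgrade kubeadm; kubeadm upgrade node; upgrade kubelet; restart kubelet"
    echo "kubectl uncordon $node"
  done
}

upgrade_workers_dry_run worker-1 worker-2
```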
4.1 Step-by-Step for Each Worker
From the control plane — drain the worker:
```bash
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
```
SSH into the worker node:
```bash
# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# Upgrade node configuration
kubeadm upgrade node

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
```
Back on the control plane — uncordon the worker:
```bash
kubectl uncordon worker-1
```
4.2 Verify the Upgrade
```bash
kubectl get nodes
# NAME           STATUS   ROLES           AGE   VERSION
# controlplane   Ready    control-plane   30d   v1.30.0
# worker-1       Ready    <none>          30d   v1.30.0
# worker-2       Ready    <none>          30d   v1.30.0
```
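The same check can be scripted. The sketch below feeds sample `kubectl get nodes --no-headers` output into a small awk filter; on a real cluster you would pipe in the live command instead (`check_versions` is a hypothetical helper):

```shell
TARGET="v1.30.0"

# check_versions: flag any node whose last column is not the target version.
check_versions() {
  awk -v t="$TARGET" '$NF != t { print $1 " still at " $NF; bad = 1 } END { exit bad }'
}

printf '%s\n' \
  'controlplane Ready control-plane 30d v1.30.0' \
  'worker-1     Ready <none>        30d v1.30.0' \
  'worker-2     Ready <none>        30d v1.29.0' \
  | check_versions || echo "upgrade incomplete"
# -> worker-2 still at v1.29.0
# -> upgrade incomplete
```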
5. Draining and Uncordoning Nodes
5.1 kubectl drain
Drain safely evicts all pods from a node before maintenance.
```bash
kubectl drain <node-name> [flags]
```
What drain does:
1. Cordons the node (marks it as unschedulable)
2. Evicts all pods (respecting PodDisruptionBudgets)
3. Pods managed by controllers (Deployments, ReplicaSets, etc.) are recreated on other nodes by their controllers
4. Standalone pods (not managed by a controller) are deleted and lost
```
Before drain:
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   worker-1   │   │   worker-2   │   │   worker-3   │
│              │   │              │   │              │
│  pod-a       │   │  pod-c       │   │  pod-e       │
│  pod-b       │   │  pod-d       │   │  pod-f       │
│  (daemonset) │   │  (daemonset) │   │  (daemonset) │
└──────────────┘   └──────────────┘   └──────────────┘

After: kubectl drain worker-1 --ignore-daemonsets
┌────────────────────┐   ┌──────────────┐   ┌──────────────┐
│      worker-1      │   │   worker-2   │   │   worker-3   │
│      CORDONED      │   │              │   │              │
│ SchedulingDisabled │   │  pod-c       │   │  pod-e       │
│                    │   │  pod-d       │   │  pod-f       │
│  (daemonset)       │   │  pod-a ←new  │   │  pod-b ←new  │
│                    │   │  (daemonset) │   │  (daemonset) │
└────────────────────┘   └──────────────┘   └──────────────┘
```
5.2 Common drain Flags
| Flag | Purpose |
|---|---|
| `--ignore-daemonsets` | Skip DaemonSet-managed pods (they can't be rescheduled elsewhere) |
| `--delete-emptydir-data` | Allow eviction of pods using emptyDir volumes (data will be lost) |
| `--force` | Force eviction of standalone pods (not managed by a controller) |
| `--grace-period=<seconds>` | Override the pod's terminationGracePeriodSeconds |
| `--timeout=<duration>` | Abort if drain doesn't complete within this time |
| `--pod-selector=<selector>` | Only evict pods matching this label selector |
| `--disable-eviction` | Use delete instead of the eviction API (bypasses PDBs) |
5.3 When drain Fails
```bash
# Error: cannot delete pods with local storage
kubectl drain worker-1 --ignore-daemonsets
# error: cannot delete Pods with local storage (use --delete-emptydir-data to override)
# Fix:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# Error: cannot delete pods not managed by a controller
kubectl drain worker-1 --ignore-daemonsets
# error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet
# Fix:
kubectl drain worker-1 --ignore-daemonsets --force

# Error: PodDisruptionBudget violation
# The drain will wait (or timeout) until the PDB allows eviction
kubectl drain worker-1 --ignore-daemonsets --timeout=120s
```
5.4 kubectl cordon / uncordon
Cordon and uncordon control whether a node accepts new pods:
```bash
# Cordon — mark node as unschedulable (existing pods keep running)
kubectl cordon worker-1

# Check node status
kubectl get nodes
# NAME       STATUS                     ROLES    AGE   VERSION
# worker-1   Ready,SchedulingDisabled   <none>   30d   v1.30.0

# Uncordon — mark node as schedulable again
kubectl uncordon worker-1

# Check node status
kubectl get nodes
# NAME       STATUS   ROLES    AGE   VERSION
# worker-1   Ready    <none>   30d   v1.30.0
```
| Action | Effect on existing pods | Effect on new pods |
|---|---|---|
| `cordon` | No effect — pods keep running | No new pods scheduled |
| `drain` | Evicts all pods (except DaemonSets with `--ignore-daemonsets`) | No new pods scheduled |
| `uncordon` | No effect | Node accepts new pods again |
CKA Tip: drain automatically cordons the node first. You don't need to cordon before draining. But you must uncordon after the upgrade is complete.
5.5 PodDisruptionBudgets (PDBs)
PDBs protect workload availability during voluntary disruptions (like drain):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # At least 2 pods must remain running
  # OR
  # maxUnavailable: 1    # At most 1 pod can be down at a time
  selector:
    matchLabels:
      app: my-app
```
```bash
# Check PDBs
kubectl get pdb

# Describe a PDB to see current status
kubectl describe pdb my-app-pdb
# Allowed disruptions: 1
# Current healthy:     3
# Desired healthy:     2
```
When draining, if evicting a pod would violate a PDB, the drain blocks until the PDB allows it (or the timeout is reached).
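For a `minAvailable` PDB, the "Allowed disruptions" figure in the describe output is simply healthy pods minus `minAvailable`. A quick arithmetic sketch (`allowed_disruptions` is a hypothetical helper; `maxUnavailable` PDBs are computed differently and not covered here):

```shell
# allowed_disruptions: currently healthy pods minus minAvailable.
# A result of 0 means drain will block on this PDB.
allowed_disruptions() {
  echo $(( $1 - $2 ))
}

allowed_disruptions 3 2   # 1 -> drain may evict one pod at a time
allowed_disruptions 2 2   # 0 -> drain blocks until another pod is healthy
```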
6. Complete Upgrade Walkthrough — v1.29 → v1.30
Here's the full procedure condensed into a reference:
Control Plane Node
```bash
# 1. Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# 2. Verify upgrade plan
kubeadm upgrade plan

# 3. Apply upgrade
kubeadm upgrade apply v1.30.0

# 4. Drain the node
kubectl drain <cp-node> --ignore-daemonsets

# 5. Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# 6. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 7. Uncordon
kubectl uncordon <cp-node>
```
Each Worker Node
```bash
# 1. Drain from control plane
kubectl drain <worker-node> --ignore-daemonsets --delete-emptydir-data

# 2. SSH to worker — upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
apt-mark hold kubeadm

# 3. Upgrade node config
kubeadm upgrade node

# 4. Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
apt-mark hold kubelet kubectl

# 5. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 6. Uncordon from control plane
kubectl uncordon <worker-node>
```
7. Troubleshooting Upgrades
| Problem | Cause | Fix |
|---|---|---|
| `kubeadm upgrade plan` shows no available upgrades | kubeadm not upgraded yet, or repo not updated | Upgrade kubeadm first, run `apt-get update` |
| Static pod not restarting after upgrade | kubelet not watching manifests directory | Check `systemctl status kubelet`, check `staticPodPath` in kubelet config |
| Node stays NotReady after upgrade | kubelet not restarted | `systemctl daemon-reload && systemctl restart kubelet` |
| Pods stuck in Terminating during drain | Finalizers or stuck containers | `kubectl delete pod <pod> --grace-period=0 --force` |
| Drain blocked by PDB | Not enough healthy replicas | Scale up the deployment first, or use `--disable-eviction` (last resort) |
| Version mismatch errors | Skipped a minor version | You must upgrade one minor version at a time |
| etcd health check fails after upgrade | etcd not upgraded or cert mismatch | Check etcd pod logs: `crictl logs <etcd-container-id>` |
```bash
# Useful debugging commands during upgrade
kubectl get nodes -o wide
kubectl get pods -n kube-system
systemctl status kubelet
journalctl -u kubelet -n 50 --no-pager   # last 50 kubelet log lines
crictl ps -a
kubeadm upgrade plan
```
8. Practice Exercises
Exercise 1 — Upgrade a Cluster
```bash
# Starting state: cluster running v1.29.0
# Target: upgrade to v1.30.0

# 1. Check current versions
kubectl get nodes

# 2. Upgrade the control plane (follow Section 6)
# 3. Upgrade worker-1
# 4. Upgrade worker-2

# 5. Verify all nodes show v1.30.0
kubectl get nodes
```
Exercise 2 — Drain and Uncordon
```bash
# 1. Create a deployment with 3 replicas
kubectl create deployment drain-test --image=nginx --replicas=3

# 2. Check which nodes the pods are running on
kubectl get pods -o wide

# 3. Drain one of the worker nodes
kubectl drain <worker-with-pods> --ignore-daemonsets

# 4. Verify pods were rescheduled to other nodes
kubectl get pods -o wide

# 5. Uncordon the node
kubectl uncordon <worker>

# 6. Clean up
kubectl delete deployment drain-test
```
Exercise 3 — PDB Interaction with Drain
```bash
# 1. Create a deployment with 3 replicas
kubectl create deployment pdb-test --image=nginx --replicas=3

# 2. Create a PDB requiring at least 2 pods
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-test
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: pdb-test
EOF

# 3. Try draining a node that has 2 of the 3 pods
# Observe: drain blocks because evicting both would violate the PDB
kubectl drain <node> --ignore-daemonsets --timeout=30s

# 4. Scale up to 4 replicas and retry
kubectl scale deployment pdb-test --replicas=4
kubectl drain <node> --ignore-daemonsets

# 5. Clean up
kubectl uncordon <node>
kubectl delete deployment pdb-test
kubectl delete pdb pdb-test
```
Exercise 4 — Token Expiry During Upgrade
```bash
# Scenario: You upgraded the control plane yesterday.
# Today you want to join a new worker, but the token has expired.

# 1. Try joining with the old token — observe the error
# 2. Generate a new join command
kubeadm token create --print-join-command
# 3. Join the worker with the new token
```
9. Key Takeaways for the CKA Exam
| Point | Detail |
|---|---|
| Upgrade one minor version at a time | v1.29 → v1.30 ✅, v1.29 → v1.31 ❌ |
| Control plane first, then workers | Never upgrade workers before the control plane |
| `upgrade apply` vs `upgrade node` | `apply` = first CP node only, `node` = all other nodes |
| Always upgrade kubeadm first | Before running any upgrade command |
| `systemctl daemon-reload && systemctl restart kubelet` | Required after upgrading the kubelet binary |
| `apt-mark hold` | Pin versions to prevent accidental upgrades |
| Drain before upgrading workers | Ensures workloads are safely moved |
| `--ignore-daemonsets` is almost always needed | DaemonSet pods can't be rescheduled |
| Uncordon after upgrade | Nodes stay cordoned until you explicitly uncordon |
| kubelet can lag by 2 minor versions | But don't rely on this — upgrade promptly |
| PDBs can block drain | Know how to check and work around them |
Previous: 02-cluster-installation-kubeadm.md — Cluster Installation with kubeadm
Next: 04-etcd-backup-restore.md — etcd Backup & Restore