Module 4 — etcd Backup & Restore
Overview
etcd backup and restore is one of the most commonly tested tasks on the CKA exam. You must be able to take a snapshot, restore it, and understand the etcd data model. This module covers etcd internals, the exact backup/restore commands, and the certificate flags you'll need.
1. etcd Architecture and Data Model
1.1 What etcd Stores
etcd is the single source of truth for the entire cluster. Every object you create with kubectl ends up here.
```text
etcd key-value store
│
├── /registry/pods/default/nginx-pod
├── /registry/pods/kube-system/coredns-xxxxx
├── /registry/services/specs/default/kubernetes
├── /registry/deployments/default/my-app
├── /registry/secrets/default/my-secret
├── /registry/configmaps/default/my-config
├── /registry/nodes/worker-1
├── /registry/namespaces/default
├── /registry/clusterroles/cluster-admin
└── ... every Kubernetes object
```
All keys are prefixed with /registry/ followed by the resource type and namespace/name.
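As a quick sanity check of that layout, you can build a key from its parts before querying it. A small sketch (the etcdctl call is commented out because it needs a control plane node with the kubeadm default certificate paths; the variable names are illustrative):

```shell
# Namespaced resources follow /registry/<resource>/<namespace>/<name>;
# cluster-scoped resources (nodes, namespaces, clusterroles) omit the
# namespace segment.
resource=configmaps
ns=default
name=my-config
key="/registry/${resource}/${ns}/${name}"
echo "$key"   # prints /registry/configmaps/default/my-config

# On a control plane node you could then fetch that key directly:
# ETCDCTL_API=3 etcdctl get "$key" --keys-only \
#   --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#   --cert=/etc/kubernetes/pki/etcd/server.crt \
#   --key=/etc/kubernetes/pki/etcd/server.key
```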
1.2 How Data Flows
```text
kubectl create deployment nginx --image=nginx
        │
        ▼
┌────────────────┐      write       ┌─────────┐
│ kube-apiserver │ ───────────────▶ │  etcd   │
└───────┬────────┘                  └────┬────┘
        │                                │
        │ ◀── watch ─────────────────────┘
        │
        ▼
API server notifies watchers
(scheduler, controller-manager, kubelet)
```
Only the API server reads from and writes to etcd. No other component has direct access.
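You can see this relationship from the API server's side: its static pod manifest carries the etcd client flags. A sketch, assuming the kubeadm default manifest path (the grep is guarded so it is a no-op on machines without that file):

```shell
# List the etcd-related flags the API server was started with.
show_etcd_flags() {   # usage: show_etcd_flags <kube-apiserver manifest>
  grep -o -- '--etcd-[a-z-]*=[^ ]*' "$1"
}

m=/etc/kubernetes/manifests/kube-apiserver.yaml
[ -f "$m" ] && show_etcd_flags "$m" || true
# Typically prints --etcd-servers=..., --etcd-cafile=...,
# --etcd-certfile=..., --etcd-keyfile=... : the API server's own
# client credentials for talking to etcd.
```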
1.3 Querying etcd Directly
You can inspect etcd data using etcdctl (useful for debugging, not for production use):
```bash
# Set the API version (always v3 for modern Kubernetes)
export ETCDCTL_API=3

# List all keys
etcdctl get / --prefix --keys-only \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Get a specific key (e.g., the default namespace)
etcdctl get /registry/namespaces/default \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Count all keys
etcdctl get / --prefix --keys-only \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key | wc -l
```
Note: etcd stores data in protobuf format, so values are not human-readable. The keys themselves are readable.
1.4 etcd Cluster Health
```bash
# Check endpoint health
etcdctl endpoint health \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Check endpoint status (shows leader, DB size, raft index)
etcdctl endpoint status --write-out=table \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# List cluster members
etcdctl member list --write-out=table \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```
1.5 Finding etcd Configuration
On the exam, you need to find the certificate paths and endpoint. They're in the etcd static pod manifest:
```bash
# View the etcd manifest
cat /etc/kubernetes/manifests/etcd.yaml
```
Key fields to extract:
```yaml
spec:
  containers:
  - command:
    - etcd
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.10:2379  # ← endpoint
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt    # ← --cert
    - --key-file=/etc/kubernetes/pki/etcd/server.key     # ← --key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt  # ← --cacert
    - --data-dir=/var/lib/etcd                           # ← data directory
```
CKA Tip: On the exam, always check the etcd manifest first to get the correct certificate paths and endpoint. Don't assume defaults.
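One way to avoid retyping the long flag values: pull them straight out of the manifest into shell variables. A sketch, assuming the kubeadm default manifest path and its `- --flag=value` layout (`extract` is a hypothetical helper, not part of etcdctl):

```shell
MANIFEST=/etc/kubernetes/manifests/etcd.yaml

# Print the value of one etcd flag from the static pod manifest.
extract() {   # usage: extract <flag-name-without-dashes>
  sed -n "s/^ *- --$1=\(.*\)/\1/p" "$MANIFEST" | head -1
}

if [ -f "$MANIFEST" ]; then
  ETCD_CACERT=$(extract trusted-ca-file)
  ETCD_CERT=$(extract cert-file)
  ETCD_KEY=$(extract key-file)
  echo "--cacert=$ETCD_CACERT --cert=$ETCD_CERT --key=$ETCD_KEY"
fi
```

The anchored `- --` pattern keeps `cert-file` from matching `peer-cert-file`, which also appears in the manifest. The snapshot command then becomes `etcdctl snapshot save /opt/etcd-backup.db --cacert=$ETCD_CACERT --cert=$ETCD_CERT --key=$ETCD_KEY`.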
2. Backing Up etcd
2.1 The Snapshot Command
```bash
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```
That's it. This is the command you must memorize.
2.2 Required Flags Breakdown
| Flag | Value | Where to find it |
|------|-------|------------------|
| --cacert | /etc/kubernetes/pki/etcd/ca.crt | etcd manifest: --trusted-ca-file |
| --cert | /etc/kubernetes/pki/etcd/server.crt | etcd manifest: --cert-file |
| --key | /etc/kubernetes/pki/etcd/server.key | etcd manifest: --key-file |
| --endpoints | https://127.0.0.1:2379 | etcd manifest: --listen-client-urls (defaults to localhost if omitted) |
CKA Tip: If you're running etcdctl on the control plane node where etcd is running, you can omit --endpoints — it defaults to https://127.0.0.1:2379. Include it explicitly if the exam question specifies a different endpoint.
2.3 Verify the Backup
```bash
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-backup.db --write-out=table
```
Output:
```text
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3e9d3c18 |   412039 |       1250 |     5.1 MB |
+----------+----------+------------+------------+
```
2.4 Using etcdctl from Inside the etcd Pod
If etcdctl is not installed on the host, you can exec into the etcd pod:
```bash
kubectl exec -it etcd-controlplane -n kube-system -- sh -c \
  "ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd/backup.db \
   --cacert=/etc/kubernetes/pki/etcd/ca.crt \
   --cert=/etc/kubernetes/pki/etcd/server.crt \
   --key=/etc/kubernetes/pki/etcd/server.key"

# Copy the backup out of the pod (the etcd data dir is host-mounted)
cp /var/lib/etcd/backup.db /opt/etcd-backup.db
```
2.5 Quick Reference — Certificate Mapping
```text
etcd manifest flag      →  etcdctl flag
──────────────────────     ────────────
--trusted-ca-file       →  --cacert
--cert-file             →  --cert
--key-file              →  --key
--listen-client-urls    →  --endpoints
```
3. Restoring etcd from a Snapshot
Restoring is more involved than backing up. The key concept: restore creates a new data directory, then you point etcd to it.
3.1 The Restore Command
```bash
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-restored
```
This creates a new data directory at /var/lib/etcd-restored with the restored data.
Important: The restore command does NOT need the --cacert, --cert, --key flags. It operates on the local snapshot file, not on a running etcd instance. On etcd 3.5+, etcdctl prints a deprecation warning pointing you at etcdutl snapshot restore; either tool behaves the same way here.
3.2 Point etcd to the Restored Data
After restoring, update the etcd static pod manifest to use the new data directory:
```bash
# Edit the etcd manifest
vi /etc/kubernetes/manifests/etcd.yaml
```
Change two things:
```yaml
# 1. Update the --data-dir flag
spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd-restored   # ← changed from /var/lib/etcd

# 2. Update the hostPath volume to match
  volumes:
  - hostPath:
      path: /var/lib/etcd-restored        # ← changed from /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
```
kubelet detects the manifest change and recreates the etcd pod with the restored data.
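The restart is not instant, so a small polling helper saves you from hammering kubectl while the API server is still down. A sketch (wait_for_etcd is a hypothetical helper; it assumes crictl is available on the node):

```shell
# Poll until a running etcd container shows up again (up to ~60s).
wait_for_etcd() {
  for _ in $(seq 1 30); do
    if crictl ps 2>/dev/null | grep -q etcd; then
      echo "etcd is running"
      return 0
    fi
    sleep 2
  done
  echo "etcd did not come back within 60s" >&2
  return 1
}

# wait_for_etcd   # run on the control plane node after editing the manifest
```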
3.3 Full Restore Procedure — Step by Step
```text
┌─────────────────────────────────────────────────────────────┐
│                      RESTORE SEQUENCE                       │
│                                                             │
│ 1. Restore snapshot to a new data directory                 │
│      etcdctl snapshot restore /opt/backup.db                │
│        --data-dir=/var/lib/etcd-restored                    │
│                                                             │
│ 2. Edit /etc/kubernetes/manifests/etcd.yaml                 │
│    - Change --data-dir to /var/lib/etcd-restored            │
│    - Change volumes.hostPath.path to /var/lib/etcd-restored │
│                                                             │
│ 3. Wait for the etcd pod to restart automatically           │
│    (kubelet watches the manifests directory)                │
│                                                             │
│ 4. Verify: kubectl get pods -n kube-system                  │
│    Verify: kubectl get nodes                                │
└─────────────────────────────────────────────────────────────┘
```
3.4 Verifying the Restore
```bash
# Wait for etcd to come back (may take 30-60 seconds).
# If kubectl doesn't work yet, check with crictl:
crictl ps | grep etcd

# Once etcd is running, verify the API server reconnects
kubectl get nodes
kubectl get pods -A

# Verify etcd health
ETCDCTL_API=3 etcdctl endpoint health \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```
3.5 Alternative: Restore to the Original Path
Instead of changing the manifest, you can restore directly to the original path:
```bash
# Stop etcd by moving the manifest
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml

# Remove the old data
rm -rf /var/lib/etcd

# Restore to the original path
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd

# Move the manifest back
mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml

# Wait for etcd to start
crictl ps | grep etcd
```
CKA Tip: Both approaches work. The "new directory + edit manifest" approach is safer because you keep the old data as a fallback. The "restore to original path" approach requires fewer manifest edits.
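A side benefit of the first approach: rolling back is trivial, because the original /var/lib/etcd was never modified. A sketch of the revert (revert_data_dir is a hypothetical helper; the sed rewrites both --data-dir and the hostPath in one pass, and assumes GNU sed as found on Linux control plane nodes):

```shell
# Point the manifest back at the original data directory.
revert_data_dir() {   # usage: revert_data_dir <manifest> <restored-dir> <original-dir>
  sed -i "s|$2|$3|g" "$1"
}

# revert_data_dir /etc/kubernetes/manifests/etcd.yaml \
#   /var/lib/etcd-restored /var/lib/etcd
```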
4. etcd in HA Clusters
4.1 Stacked etcd Backup
In a stacked HA setup, you only need to back up from one etcd member — they all have the same data (Raft consensus).
```bash
# Backup from any one control plane node
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --endpoints=https://127.0.0.1:2379
```
4.2 External etcd Backup
For external etcd, SSH to one of the etcd nodes and use the certificates from that node:
```bash
# The cert paths may differ on external etcd nodes
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/etcd/ca.crt \
  --cert=/etc/etcd/server.crt \
  --key=/etc/etcd/server.key \
  --endpoints=https://<etcd-node-ip>:2380
```
4.3 HA Restore Considerations
Restoring in an HA cluster is more complex:
- Stop etcd on all members
- Restore the snapshot on each member with its own --name and --initial-advertise-peer-urls; --initial-cluster lists all members and is identical everywhere
- Start etcd on all members
```bash
# Example for a 3-member cluster (run on each node with that node's values)
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --name=etcd-1 \
  --data-dir=/var/lib/etcd-restored \
  --initial-cluster=etcd-1=https://192.168.1.10:2380,etcd-2=https://192.168.1.11:2380,etcd-3=https://192.168.1.12:2380 \
  --initial-advertise-peer-urls=https://192.168.1.10:2380
```
CKA Tip: The exam typically tests single-node etcd backup/restore. HA restore is good to know but unlikely to appear.
5. Common Pitfalls
| Pitfall | What happens | Fix |
|---------|--------------|-----|
| Forgetting ETCDCTL_API=3 | Older etcdctl builds default to API v2 and commands fail | Always set export ETCDCTL_API=3 or prefix commands (harmless on etcd 3.4+, where v3 is the default) |
| Wrong certificate paths | "connection refused" or "certificate signed by unknown authority" | Check /etc/kubernetes/manifests/etcd.yaml for the correct paths |
| Using --cert and --key on restore | Unnecessary — restore reads a local file | Only --data-dir is needed for restore |
| Not updating the volume hostPath | etcd pod starts but reads the old data | Change both --data-dir AND volumes.hostPath.path |
| Not waiting for etcd to restart | kubectl commands fail | Wait 30-60s; check with crictl ps |
| Restoring to /var/lib/etcd without stopping etcd | Data corruption | Move the manifest out first, or use a new directory |
| File permissions on the restored directory | etcd can't read the data | chown -R etcd:etcd /var/lib/etcd-restored (if etcd runs as non-root) |
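Several of these pitfalls can be caught before you restore. A pre-flight sketch (check_snapshot is a hypothetical helper; the etcdctl call only runs once the file is confirmed to exist and be non-empty):

```shell
# Sanity-check a snapshot file before relying on it.
check_snapshot() {   # usage: check_snapshot <snapshot-file>
  if [ ! -s "$1" ]; then
    echo "snapshot missing or empty: $1" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
}

# check_snapshot /opt/etcd-backup.db
```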
6. Exam-Ready Command Templates
Backup Template
```bash
# 1. Find the cert paths
grep -E "cert-file|key-file|trusted-ca-file|listen-client" /etc/kubernetes/manifests/etcd.yaml

# 2. Take the snapshot
ETCDCTL_API=3 etcdctl snapshot save <backup-path> \
  --cacert=<trusted-ca-file value> \
  --cert=<cert-file value> \
  --key=<key-file value> \
  --endpoints=<listen-client-urls value>

# 3. Verify
ETCDCTL_API=3 etcdctl snapshot status <backup-path> --write-out=table
```
Restore Template
```bash
# 1. Restore to a new directory
ETCDCTL_API=3 etcdctl snapshot restore <backup-path> \
  --data-dir=<new-data-dir>

# 2. Update the etcd manifest
vi /etc/kubernetes/manifests/etcd.yaml
#   --data-dir=<new-data-dir>
#   volumes.hostPath.path=<new-data-dir>

# 3. Wait and verify
crictl ps | grep etcd
kubectl get nodes
```
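The backup template can also be wired together as one function, so the manifest lookup and the snapshot happen in a single step. A sketch (backup_etcd is a hypothetical helper; it assumes the kubeadm-style `- --flag=value` manifest layout shown earlier):

```shell
# Read cert paths from the etcd manifest, snapshot, then verify.
backup_etcd() {   # usage: backup_etcd <etcd-manifest> <backup-path>
  local cacert cert key
  cacert=$(sed -n 's/^ *- --trusted-ca-file=\(.*\)/\1/p' "$1" | head -1)
  cert=$(sed -n 's/^ *- --cert-file=\(.*\)/\1/p' "$1" | head -1)
  key=$(sed -n 's/^ *- --key-file=\(.*\)/\1/p' "$1" | head -1)
  ETCDCTL_API=3 etcdctl snapshot save "$2" \
    --cacert="$cacert" --cert="$cert" --key="$key" &&
  ETCDCTL_API=3 etcdctl snapshot status "$2" --write-out=table
}

# backup_etcd /etc/kubernetes/manifests/etcd.yaml /opt/etcd-backup.db
```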
7. Practice Exercises
Exercise 1 — Backup and Verify
```bash
# 1. Extract certificate paths from the etcd manifest
grep -E "cert-file|key-file|trusted-ca-file" /etc/kubernetes/manifests/etcd.yaml

# 2. Take a snapshot to /opt/etcd-backup.db
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 3. Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-backup.db --write-out=table

# 4. Check the file size
ls -lh /opt/etcd-backup.db
```
Exercise 2 — Backup, Break, Restore
```bash
# 1. Create some test data
kubectl create namespace backup-test
kubectl create configmap test-data -n backup-test --from-literal=key=before-backup

# 2. Take a backup
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 3. Create data AFTER the backup (this should be lost after restore)
kubectl create configmap post-backup -n backup-test --from-literal=key=after-backup

# 4. Verify both configmaps exist
kubectl get configmaps -n backup-test

# 5. Restore from backup
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-restored

# 6. Update the etcd manifest to use /var/lib/etcd-restored
#    (edit --data-dir and volumes.hostPath.path)
vi /etc/kubernetes/manifests/etcd.yaml

# 7. Wait for etcd to restart
sleep 30
crictl ps | grep etcd

# 8. Verify: the "post-backup" configmap should be GONE
kubectl get configmaps -n backup-test
# Only "test-data" should exist — "post-backup" was created after the snapshot

# 9. Clean up
kubectl delete namespace backup-test
```
Exercise 3 — Inspect etcd Data
```bash
# 1. List all keys related to namespaces
ETCDCTL_API=3 etcdctl get /registry/namespaces --prefix --keys-only \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 2. Count total keys in etcd
ETCDCTL_API=3 etcdctl get / --prefix --keys-only \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key | wc -l

# 3. Check etcd cluster health
ETCDCTL_API=3 etcdctl endpoint health \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 4. Check the etcd member list
ETCDCTL_API=3 etcdctl member list --write-out=table \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```
Exercise 4 — Restore to Original Path
```bash
# 1. Take a backup
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 2. Stop etcd by moving the manifest
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml

# 3. Wait for etcd to stop
sleep 10
crictl ps | grep etcd   # should show nothing

# 4. Remove old data and restore
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd

# 5. Bring etcd back
mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml

# 6. Verify
sleep 30
kubectl get nodes
```
8. Key Takeaways for the CKA Exam
| Point | Detail |
|-------|--------|
| Always set ETCDCTL_API=3 | Modern Kubernetes uses the etcd v3 API exclusively |
| Get cert paths from the manifest | cat /etc/kubernetes/manifests/etcd.yaml — don't guess |
| Backup needs certs, restore does not | snapshot save requires --cacert, --cert, --key; snapshot restore only needs --data-dir |
| Restore creates a new data directory | Then update the etcd manifest to point to it |
| Change both --data-dir and hostPath | Forgetting the volume mount is the #1 restore mistake |
| Verify with snapshot status | Confirms the backup file is valid before you need it |
| crictl ps when kubectl is down | etcd restart may take 30-60s; use crictl to check |
| Only the API server talks to etcd | No other component has direct access |
| One backup is enough in HA | All etcd members have the same data via Raft |
| Know the cert mapping | --trusted-ca-file → --cacert, --cert-file → --cert, --key-file → --key |
Previous: 03-cluster-upgrade.md — Cluster Upgrade
Next: 05-rbac.md — RBAC