Kubernetes Pod Lifecycle Guide — From Pending to Running
This tutorial walks through the Kubernetes pod lifecycle: the key concepts, practical examples, and best practices you need to debug and operate pods with confidence.
The Kubernetes pod lifecycle defines the sequence of states a pod passes through from creation to termination, including initialization, readiness checks, and graceful shutdown.
What You'll Learn
You'll master the pod lifecycle — phase transitions (Pending, Running, Succeeded, Failed, Unknown), container state machines, init containers, probe types (liveness, readiness, startup), and the termination grace period.
Why This Problem Matters
Understanding the pod lifecycle is essential for debugging failed deployments, designing graceful shutdowns, and configuring rolling updates. A pod whose phase is "Running" may still have a container stuck in CrashLoopBackOff, and without understanding the difference between pod phases and container states you can waste hours troubleshooting.
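The "Running but crash-looping" case can be detected mechanically. The following is a minimal sketch, assuming the pod status has already been loaded as a dictionary shaped like the `status` field of `kubectl get pod <name> -o json`. The function name `find_crash_looping` and the sample data are illustrative and not part of any Kubernetes client.

```python
def find_crash_looping(status: dict) -> list[str]:
    """Return the names of containers currently waiting in CrashLoopBackOff.

    `status` is assumed to mirror the `status` field of a Pod object
    as printed by `kubectl get pod <name> -o json`.
    """
    looping = []
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            looping.append(cs["name"])
    return looping


# Hypothetical sample: the phase says Running, yet one container is crash-looping.
sample_status = {
    "phase": "Running",
    "containerStatuses": [
        {"name": "app", "restartCount": 7,
         "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
        {"name": "proxy", "restartCount": 0,
         "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}}},
    ],
}

print(sample_status["phase"], find_crash_looping(sample_status))
```

The point of the sketch is that the pod phase alone is not enough: the per-container state has to be inspected as well.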
Real-World Use
Doda Browser's Kubernetes deployment uses init containers to download malware signature databases before the main application starts. Readiness probes ensure traffic is only routed to pods that have fully loaded their signatures.
Pod Lifecycle Phases
flowchart TB
    Create[User creates Pod] --> Pending[Phase: Pending]
    Pending -->|Scheduler assigns node| ContainerCreating[Container Creating]
    ContainerCreating -->|Init containers complete| Running[Phase: Running]
    Running -->|All containers exit 0| Succeeded[Phase: Succeeded]
    Running -->|All containers terminated, at least one non-zero| Failed[Phase: Failed]
    Running -->|Node failure / timeout| Unknown[Phase: Unknown]
    Pending -->|Unschedulable, keeps waiting| Pending
    Pending -->|Init container fails, restartPolicy Never| Failed
    subgraph InitFlow
        Init[Init Container 1] --> Init2[Init Container 2]
        Init2 --> Main[Main Container Starts]
    end
    ContainerCreating --> InitFlow
Pod Phase Demonstration
import time

from kubernetes import client, config, watch

# Load credentials from the local kubeconfig (~/.kube/config)
config.load_kube_config()
v1 = client.CoreV1Api()


def watch_pod_lifecycle(namespace: str, pod_name: str, timeout: int = 120):
    """Print the phase, conditions, and container states for each pod event."""
    start = time.time()
    w = watch.Watch()
    for event in w.stream(
        v1.list_namespaced_pod,
        namespace,
        field_selector=f"metadata.name={pod_name}",
        timeout_seconds=timeout,
    ):
        pod = event["object"]
        phase = pod.status.phase
        conditions = {
            c.type: c.status for c in (pod.status.conditions or [])
        }
        container_statuses = pod.status.container_statuses or []
        container_states = {}
        for cs in container_statuses:
            state = cs.state
            if state.running:
                container_states[cs.name] = "running"
            elif state.waiting:
                container_states[cs.name] = f"waiting({state.waiting.reason})"
            elif state.terminated:
                container_states[cs.name] = f"terminated({state.terminated.exit_code})"
        elapsed = time.time() - start
        print(f"[{elapsed:5.1f}s] Phase: {phase:>10} | "
              f"Conditions: {conditions} | "
              f"Containers: {container_states}")
        # Succeeded and Failed are terminal phases, so stop watching
        if phase in ("Succeeded", "Failed"):
            break
    w.stop()
# Run a simple pod and watch its lifecycle
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "lifecycle-demo"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "main",
            "image": "alpine",
            "command": ["sh", "-c", "echo 'Starting...'; sleep 3; echo 'Done!'"]
        }]
    }
}
v1.create_namespaced_pod("default", pod_manifest)
watch_pod_lifecycle("default", "lifecycle-demo")
v1.delete_namespaced_pod("default", "lifecycle-demo")
Expected output:
[  0.5s] Phase:    Pending | Conditions: {} | Containers: {}
[  1.2s] Phase:    Pending | Conditions: {'PodScheduled': 'True'} | Containers: {}
[  2.8s] Phase:    Running | Conditions: {'PodScheduled': 'True', 'Initialized': 'True', 'Ready': 'True'} | Containers: {'main': 'running'}
[  6.0s] Phase:  Succeeded | Conditions: {'PodScheduled': 'True', 'Initialized': 'True', 'Ready': 'False'} | Containers: {'main': 'terminated(0)'}
Init Containers
Init containers run sequentially before the main application starts, and each one must exit successfully before the next begins:
# init-container-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
    - name: init-download
      image: busybox:1.36
      command:
        - wget
        - "-O"
        - "/data/signatures.db"
        - "https://cdn.dodatech.com/signatures/latest.db"
      volumeMounts:
        - name: data
          mountPath: /data
    - name: init-check
      image: alpine:3.20
      command:
        - sh
        - "-c"
        - "test -s /data/signatures.db && echo 'Signatures verified'"
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: main
      image: alpine:3.20
      command: ["sh", "-c", "cat /data/signatures.db | head -5"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}
kubectl apply -f init-container-pod.yaml
kubectl get pod init-demo -w
Expected output:
NAME        READY   STATUS            RESTARTS   AGE
init-demo   0/1     Init:0/2          0          0s
init-demo   0/1     Init:0/2          0          3s
init-demo   0/1     Init:1/2          0          7s
init-demo   0/1     PodInitializing   0          10s
init-demo   1/1     Running           0          12s
init-demo   0/1     Completed         0          14s
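The sequential behavior shown in that output can be modeled in a few lines. This is a toy sketch, not kubelet code: `run_init_containers` and the stand-in step functions are hypothetical names, and each "init container" is just a Python callable returning an exit code. It assumes `restartPolicy: Never`, where a failed init container fails the pod; with other restart policies the kubelet retries the failed init container instead.

```python
from typing import Callable, Sequence, Tuple


def run_init_containers(steps: Sequence[Tuple[str, Callable[[], int]]]) -> str:
    """Run init steps strictly in order and stop at the first failure.

    Returns a string loosely modeled on the kubectl STATUS column.
    """
    total = len(steps)
    for done, (name, step) in enumerate(steps):
        print(f"Init:{done}/{total} running {name}")
        if step() != 0:
            # A failed init container blocks every container after it.
            return "Init:Error"
    return "PodInitializing"


# Hypothetical stand-ins for init-download and init-check.
def download_ok() -> int:
    return 0


def check_fails() -> int:
    return 1


print(run_init_containers([("init-download", download_ok),
                           ("init-check", download_ok)]))
print(run_init_containers([("init-download", download_ok),
                           ("init-check", check_fails)]))
```

In the second call the main container would never start, which is why a pod stuck in an `Init:` status should be debugged by looking at the init container's logs first.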
Probes: Liveness, Readiness, Startup
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: app
      image: python:3.12-slim
      command:
        - python
        - "-c"
        - |
          import http.server, time
          class H(http.server.BaseHTTPRequestHandler):
              ready = False
              def do_GET(self):
                  if self.path == "/healthz":
                      if H.ready:
                          self.send_response(200)
                      else:
                          self.send_response(503)
                  else:
                      self.send_response(200)
                  self.end_headers()
          time.sleep(5)  # Simulate slow startup
          H.ready = True
          http.server.HTTPServer(("0.0.0.0", 8080), H).serve_forever()
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 2
        failureThreshold: 30
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        initialDelaySeconds: 0
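The numbers in this manifest imply a startup budget that is worth working out explicitly. The sketch below is only arithmetic on the probe fields, under the assumption that the startup window is roughly `initialDelaySeconds + periodSeconds * failureThreshold`; the helper names are made up for illustration. It also encodes the rule that liveness and readiness probes are held back until the startup probe has succeeded.

```python
def startup_budget_seconds(period_seconds: int, failure_threshold: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate upper bound on how long a slow start is tolerated."""
    return initial_delay_seconds + period_seconds * failure_threshold


def probes_active(startup_succeeded: bool) -> dict[str, bool]:
    """Which probes run, given whether the startup probe has passed yet."""
    return {
        "startup": not startup_succeeded,
        "liveness": startup_succeeded,
        "readiness": startup_succeeded,
    }


# Values taken from the probe-demo manifest above.
budget = startup_budget_seconds(period_seconds=2, failure_threshold=30)
print(f"startup budget: {budget}s")

# Liveness: 3 consecutive failures, 10s apart, before the container is restarted.
liveness_window = 10 * 3
print(f"liveness restart after about {liveness_window}s of failures")

print(probes_active(startup_succeeded=False))
```

With a 5 second simulated startup and a 60 second budget, the container has ample headroom. Shrinking `failureThreshold` to 2 would leave only 4 seconds and cause a restart before the server ever listens.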
Container State Machine
import random


class ContainerStateMachine:
    """Toy model of the container states: waiting -> running -> terminated."""

    def __init__(self):
        self.state = "waiting"
        self.reason = "ContainerCreating"
        self.restart_count = 0
        self.exit_code = None

    def transition(self):
        transitions = {
            "waiting": self._waiting_transition,
            "running": self._running_transition,
            "terminated": self._terminated_transition,
        }
        return transitions.get(self.state, lambda: self.state)()

    def _waiting_transition(self):
        if self.reason == "CrashLoopBackOff":
            # The kubelet restarts the container after an exponential
            # backoff: 10s, 20s, 40s, ... capped at 5 minutes.
            self.restart_count += 1
            backoff = min(10 * 2 ** (self.restart_count - 1), 300)
            print(f"Waiting: {self.reason} (backoff {backoff}s)")
        else:
            print(f"Waiting: {self.reason}")
        # In this model every wait ends with the container starting
        self.state = "running"
        self.reason = None

    def _running_transition(self):
        print("Running")
        if random.random() < 0.2:  # 20% chance of crash
            self.state = "terminated"
            self.exit_code = 1
            self.reason = "Error"

    def _terminated_transition(self):
        print(f"Terminated: exit code {self.exit_code}")
        if self.exit_code != 0 and self.restart_count < 3:
            self.state = "waiting"
            self.reason = "CrashLoopBackOff"
        else:
            print("Pod failed permanently")


fsm = ContainerStateMachine()
for i in range(10):
    print(f"Step {i}: ", end="")
    fsm.transition()
    # Stop the simulation once the container has crashed after 3 restarts
    if fsm.state == "terminated" and fsm.restart_count >= 3:
        break
Example output (crashes are random, so your run will differ):
Step 0: Waiting: ContainerCreating
Step 1: Running
Step 2: Running
Step 3: Terminated: exit code 1
Step 4: Waiting: CrashLoopBackOff (backoff 10s)
Step 5: Running
Step 6: Running
Step 7: Terminated: exit code 1
Step 8: Waiting: CrashLoopBackOff (backoff 20s)
Step 9: Running
Pod Termination Flow
flowchart LR
    A[Delete Pod] --> R[Pod marked Terminating, removed from Endpoints]
    A --> B[PreStop Hook]
    B --> C[SIGTERM to PID 1]
    C --> D{Grace Period<br/>Default 30s, includes PreStop}
    D -->|Within period| E[Process exits cleanly]
    D -->|Timeout| F[SIGKILL to PID 1]
    E --> G[Containers stopped]
    F --> G
    G --> H[Pod resource freed]
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: python:3.12-slim
      command:
        - python
        - "-c"
        - |
          import signal, time, sys
          def handler(signum, frame):
              print("Received SIGTERM, draining connections...")
              time.sleep(3)  # Complete in-flight requests
              print("Shutdown complete")
              sys.exit(0)
          signal.signal(signal.SIGTERM, handler)
          print("Server started")
          while True:
              time.sleep(1)
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "echo 'PreStop: removing from LB'; sleep 2"]
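Whether a shutdown ends cleanly or with SIGKILL comes down to a simple budget. The sketch below assumes the grace period has to cover both the PreStop hook and the time the process needs after SIGTERM; `termination_outcome` is a hypothetical helper, and it ignores details such as the small extension the kubelet may grant a hook that is still running when time runs out.

```python
def termination_outcome(grace_period_s: float, pre_stop_s: float,
                        drain_s: float) -> str:
    """Classify a shutdown as clean or force-killed.

    Assumes the grace period countdown covers the PreStop hook plus
    the work the process does after receiving SIGTERM.
    """
    needed = pre_stop_s + drain_s
    if needed <= grace_period_s:
        return f"clean exit after {needed:g}s"
    cut_off = needed - grace_period_s
    return f"SIGKILL at {grace_period_s:g}s ({cut_off:g}s of work cut off)"


# Numbers from the graceful-shutdown manifest: 60s grace, 2s PreStop, 3s drain.
print(termination_outcome(60, pre_stop_s=2, drain_s=3))

# A grace period that is too short: 10s of draining against 5s of grace.
print(termination_outcome(5, pre_stop_s=0, drain_s=10))
```

Running the same calculation with measured drain times from your own service is a quick way to sanity-check `terminationGracePeriodSeconds` before a rollout.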
Common Mistakes
1. No Readiness Probe
Without readiness probes, the Service sends traffic to pods before they're ready. The first few requests fail. Always set a readiness probe that checks application-level readiness (not just port listening).
2. Liveness Probe That Depends on External Services
A liveness probe that checks a database causes a restart loop when the DB is temporarily slow. Use readiness for external dependencies, liveness for application-internal health.
3. Init Container Resource Limits
Init containers without resource limits can consume all node resources before the main app starts. Set CPU/memory limits on init containers too.
4. Setting terminationGracePeriodSeconds Too Short
If the application takes 10 seconds to drain connections but the grace period is 5 seconds, the process gets SIGKILL with connections still open. Set the grace period to at least 2x the expected drain time, and remember that time spent in the PreStop hook counts against it.
5. Ignoring CrashLoopBackOff
CrashLoopBackOff is a symptom, not a cause. Check container logs (kubectl logs <pod> --previous) to see the previous (crashed) container's output.
6. No PreStop Hook for Stateful Workloads
Databases and queue consumers need PreStop hooks to deregister from the cluster before shutting down. Without them, other members of the cluster keep routing traffic to a dying pod.
7. RestartPolicy Always for Batch Jobs
Use restartPolicy: OnFailure or Never for batch workloads. A bare Pod defaults to Always, which restarts a container even after it exits successfully, so the work runs again unnecessarily. Job pod templates only accept OnFailure or Never.
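Several of these mistakes can be caught before a manifest is ever applied. The following is a minimal sketch of such a check, assuming the pod spec has been loaded into a plain dictionary (for example from YAML). `lint_pod_spec` and its `is_batch` flag are hypothetical, and the function covers only mistakes 1, 3, and 7 from the list above.

```python
def lint_pod_spec(spec: dict, is_batch: bool = False) -> list[str]:
    """Return warnings for a few common pod lifecycle mistakes."""
    warnings = []

    # Mistake 1: containers without a readiness probe
    for c in spec.get("containers", []):
        if "readinessProbe" not in c:
            warnings.append(f"container {c['name']}: no readinessProbe")

    # Mistake 3: init containers without resource limits
    for c in spec.get("initContainers", []):
        if "limits" not in c.get("resources", {}):
            warnings.append(f"init container {c['name']}: no resource limits")

    # Mistake 7: a bare Pod defaults to restartPolicy Always when omitted
    if is_batch and spec.get("restartPolicy", "Always") == "Always":
        warnings.append("batch workload with restartPolicy Always")

    return warnings


# Hypothetical spec that trips all three checks.
risky_spec = {
    "initContainers": [{"name": "init-download", "image": "busybox:1.36"}],
    "containers": [{"name": "main", "image": "alpine:3.20"}],
}

for warning in lint_pod_spec(risky_spec, is_batch=True):
    print(warning)
```

A long-running service does not need the batch check, so `is_batch` defaults to `False`; the readiness probe check, on the other hand, matters mostly for pods behind a Service.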
Practice Questions
1. What is the difference between init containers and sidecar containers?
Init containers run to completion before the main container starts. Sidecar containers run alongside the main container for the pod's entire lifetime (e.g., log shippers, proxies). Init containers are sequential; sidecars are parallel.
2. Why does a pod stay in Pending state?
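Pending means the pod has been accepted by the cluster but at least one container is not yet running. This covers both the time spent waiting to be scheduled and the time spent pulling images and running init containers. Common causes of a pod staying Pending: insufficient resources (CPU/memory), node selector or taint mismatch, persistent volume claim not bound, slow or failing image pulls, or scheduler issues.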
3. How does kubectl delete pod behave differently from kubectl delete pod --force?
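Without --force, the containers receive SIGTERM and have terminationGracePeriodSeconds to shut down gracefully before SIGKILL is sent. With --force (typically combined with --grace-period=0), the pod object is removed from the API server immediately, without waiting for the kubelet to confirm that the containers have stopped, so the processes may keep running on the node for a short time.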
4. What happens to pods when a node fails?
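When a node stops reporting, the node controller waits for node-monitor-grace-period (40s by default in most releases) and then marks the node's Ready condition Unknown. Pods on that node are evicted about 5 minutes later by default and, if they are managed by a controller such as a Deployment, recreated on healthy nodes. Older clusters set this delay with pod-eviction-timeout; current versions use taint-based eviction, where each pod tolerates the node.kubernetes.io/unreachable taint for tolerationSeconds (300 by default).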
5. Challenge: Design a zero-downtime deployment using pod lifecycle hooks.
A web service with 10 replicas. During rollout, new pods must be fully ready before old pods are terminated. Design the probe configuration, termination grace period, and PreStop hook needed for zero-downtime updates.
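One way to start on the challenge is to simulate the rollout and check that the number of ready pods never dips below the target. The sketch below is a deliberately simplified model and not the Deployment controller's actual algorithm. It introduces the Deployment rolling update settings `maxSurge` and `maxUnavailable`, which the text above does not mention, and it assumes a new pod passes its readiness probe a fixed number of ticks after creation and that an old pod stops serving the moment it is removed.

```python
def simulate_rollout(replicas: int = 10, max_surge: int = 1,
                     max_unavailable: int = 0, ticks_to_ready: int = 2) -> int:
    """Return the lowest number of ready pods seen during the rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old_ready = replicas   # old pods, all ready at the start
    new_pending = []       # ticks left until each new pod becomes ready
    new_ready = 0
    min_ready = old_ready

    while new_ready < replicas or old_ready > 0:
        # New pods whose readiness probe now passes start taking traffic.
        new_pending = [t - 1 for t in new_pending]
        new_ready += sum(1 for t in new_pending if t <= 0)
        new_pending = [t for t in new_pending if t > 0]

        # Remove old pods only while enough ready pods would remain.
        floor = replicas - max_unavailable
        while old_ready > 0 and old_ready + new_ready - 1 >= floor:
            old_ready -= 1

        # Surge: create new pods while staying under replicas + max_surge.
        total = old_ready + new_ready + len(new_pending)
        created = new_ready + len(new_pending)
        while total < replicas + max_surge and created < replicas:
            new_pending.append(ticks_to_ready)
            total += 1
            created += 1

        min_ready = min(min_ready, old_ready + new_ready)
    return min_ready


print("surge 1, unavailable 0:", simulate_rollout())
print("surge 0, unavailable 1:", simulate_rollout(max_surge=0, max_unavailable=1))
```

In this model, allowing one surge pod and zero unavailable pods keeps all 10 replicas ready throughout, at the cost of briefly running 11 pods. A full answer to the challenge still needs the readiness probe, grace period, and PreStop hook so that the model's assumptions hold in a real cluster.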
Mini Project: Pod Lifecycle Simulator
import random
import time


class Pod:
    """Toy pod whose status changes a little on every tick."""

    def __init__(self, name: str):
        self.name = name
        self.phase = "Pending"
        self.ready = False
        self.init_complete = False
        self.restart_count = 0
        self.deleted = False

    def tick(self):
        # "Terminating" and "CrashLoopBackOff" are kubectl STATUS values,
        # not real pod phases; the simulator stores them in one field.
        if self.deleted:
            self.phase = "Terminating"
            self.ready = False
            return
        if self.phase == "Pending":
            if self.init_complete:
                self.phase = "Running"
                self.ready = True
            elif random.random() < 0.5:
                # Init containers finish at a random time
                self.init_complete = True
            return
        if self.phase == "Running" and random.random() < 0.05:
            self.restart_count += 1
            if self.restart_count > 3:
                self.phase = "CrashLoopBackOff"
                self.ready = False

    def show(self):
        status = f"{self.phase:>15}"
        if self.phase == "Running":
            status += f" (restarts: {self.restart_count})"
        return f"Pod {self.name:>10}: {status} | Ready: {str(self.ready):>5}"


pods = [Pod(f"web-{i}") for i in range(3)]
for _ in range(20):
    for p in pods:
        p.tick()
    for p in pods:
        print(p.show())
    print()
    time.sleep(0.3)
Example output (first three of 20 snapshots; timing is random, so your run will differ):
Pod      web-0:         Pending | Ready: False
Pod      web-1:         Pending | Ready: False
Pod      web-2:         Pending | Ready: False

Pod      web-0:         Running (restarts: 0) | Ready:  True
Pod      web-1:         Running (restarts: 0) | Ready:  True
Pod      web-2:         Pending | Ready: False

Pod      web-0:         Running (restarts: 0) | Ready:  True
Pod      web-1:         Running (restarts: 0) | Ready:  True
Pod      web-2:         Running (restarts: 0) | Ready:  True
What's Next
Congratulations on completing this pod lifecycle guide! Here's where to go from here:
- Practice daily — Inspect pod states in your cluster
- Build a project — Create a deployment with init containers and probes
- Explore related topics — Container lifecycle hooks, pod disruption budgets, taints and tolerations
- Join the community — Share your lifecycle debugging stories and get feedback
Remember: every expert was once a beginner. Keep orchestrating!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro