Kubernetes pod status messages are often treated like the diagnosis, but they are usually only the symptom label. A pod in CrashLoopBackOff, ImagePullBackOff, or Pending tells you where the startup path is failing, not why. This guide turns those labels into a practical troubleshooting workflow: what each status actually means, how to inspect it with kubectl, the root causes that show up most often in real clusters, and the fixes that are worth trying first. Keep it nearby as a repeatable reference whenever a deployment stalls, a rollout fails, or a previously healthy workload starts behaving differently.
Overview
If you only remember one idea, remember this: Kubernetes pod status should be read as a sequence, not a single field. A pod moves through scheduling, image pull, container start, readiness, and ongoing health checks. The visible status reflects the stage where progress stopped.
That matters because the same application can fail in several different places:
- Pending usually means the pod has not been scheduled onto a node yet, or it is waiting for a prerequisite such as a volume.
- ImagePullBackOff means Kubernetes could not pull the container image and is backing off before retrying.
- CrashLoopBackOff means the container started, exited, and Kubernetes is delaying repeated restart attempts.
- ContainerCreating often means scheduling succeeded, but runtime setup such as networking, mounts, or image operations is still in progress.
- Running does not always mean healthy; the container can be running while the readiness probe still fails, keeping it out of service.
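As a rough sketch of that idea, the mapping below routes a value from the STATUS column to the lifecycle stage worth inspecting first. The stage names and the function name are this sketch's own labels, not Kubernetes terminology, and the table covers only the statuses discussed in this guide.

```python
# Illustrative mapping from the STATUS column to the stage to inspect first.
# Stage names are hypothetical labels for this sketch, not Kubernetes API values.
STATUS_TO_STAGE = {
    "Pending": "scheduling or prerequisites",
    "ContainerCreating": "runtime setup",
    "ErrImagePull": "image pull",
    "ImagePullBackOff": "image pull",
    "CrashLoopBackOff": "container start",
    "Running": "readiness and health checks",
}


def stage_for_status(status: str) -> str:
    """Return the stage to inspect first, or a fallback for unlisted statuses."""
    return STATUS_TO_STAGE.get(status, "unknown: read describe output and events")


if __name__ == "__main__":
    for status in ("Pending", "ImagePullBackOff", "CrashLoopBackOff"):
        print(f"{status}: {stage_for_status(status)}")
```

Keeping Running in the table is deliberate: it is a reminder that a running container still has readiness and health checks left to pass.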
The fastest way to understand a stuck pod is to gather four views together instead of staring at the status column alone:
- kubectl get pods -n <namespace> for the broad symptom.
- kubectl describe pod <pod> -n <namespace> for events, conditions, scheduling messages, and container state transitions.
- kubectl logs <pod> -n <namespace> and, if it restarted, kubectl logs <pod> --previous -n <namespace> for application output.
- kubectl get events -n <namespace> --sort-by=.lastTimestamp for recent cluster-level signals.
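If you want those four views in one repeatable step, a small helper can assemble the commands for you. This is a minimal sketch: the function name, the pod name web-7d9c, and the namespace shop are hypothetical, and the helper only builds the argument lists rather than running anything against a cluster.

```python
import shlex
from typing import List


def triage_commands(pod: str, namespace: str, restarted: bool = False) -> List[List[str]]:
    """Build the kubectl views listed above as argv lists, in the same order."""
    ns = ["-n", namespace]
    commands = [
        ["kubectl", "get", "pods", *ns],
        ["kubectl", "describe", "pod", pod, *ns],
        ["kubectl", "logs", pod, *ns],
    ]
    if restarted:
        # --previous shows output from the previous, terminated container instance.
        commands.append(["kubectl", "logs", pod, "--previous", *ns])
    commands.append(["kubectl", "get", "events", *ns, "--sort-by=.lastTimestamp"])
    return commands


if __name__ == "__main__":
    # Hypothetical pod and namespace names, used only for illustration.
    for argv in triage_commands("web-7d9c", "shop", restarted=True):
        print(shlex.join(argv))
```

Returning argument lists instead of one shell string keeps quoting safe if you later pass them to subprocess.run.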
In practice, most pod issues fall into one of three buckets:
- Spec problems: wrong image, bad command, missing secret, invalid mount, unrealistic resource settings.
- Cluster problems: no capacity, storage binding issues, node pressure, registry connectivity, admission or policy blocks.
- Application problems: process exits on startup, bad config, dependency failures, probe misconfiguration.
Once you classify the failure into one of those buckets, troubleshooting becomes much faster and much less guess-heavy.
Core framework
Use this framework whenever you need to interpret pod status with kubectl. It is intentionally simple: identify the stage, inspect the direct evidence, then test the likely causes in the correct order.
1. Start with status, but read the detailed state
kubectl get pods shows a convenient summary, but the useful details live under container state and events. In the kubectl describe pod output, look for:
- Pod conditions: PodScheduled, Initialized, ContainersReady, Ready.
- Container state: Waiting, Running, Terminated.
- Reason and message: these often name the immediate failure, such as failed image pull or failed mount.
- Events: scheduler decisions, kubelet errors, probe failures, volume attachment delays.
Think of the lifecycle in order:
Schedule -> Prepare image and volumes -> Start process -> Pass probes -> Stay healthy
The status tells you which step to inspect first.
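The same ordering can be read straight from the pod conditions. The sketch below assumes you have the pod as a dictionary, for example parsed from kubectl get pod <pod> -o json, and returns the earliest condition that is not yet "True". The function name and the sample data are hypothetical, and newer clusters may report additional condition types that this sketch ignores.

```python
from typing import Optional

# Condition types in the order a pod normally satisfies them.
CONDITION_ORDER = ["PodScheduled", "Initialized", "ContainersReady", "Ready"]


def first_unmet_condition(pod: dict) -> Optional[str]:
    """Return the earliest condition that is not "True", or None if all are met."""
    conditions = pod.get("status", {}).get("conditions", [])
    status_by_type = {c["type"]: c["status"] for c in conditions}
    for condition in CONDITION_ORDER:
        # A condition that is missing entirely is treated as unmet.
        if status_by_type.get(condition) != "True":
            return condition
    return None


if __name__ == "__main__":
    # Hypothetical pod: scheduled and initialized, but containers are not ready.
    sample = {"status": {"conditions": [
        {"type": "PodScheduled", "status": "True"},
        {"type": "Initialized", "status": "True"},
        {"type": "ContainersReady", "status": "False"},
        {"type": "Ready", "status": "False"},
    ]}}
    print(first_unmet_condition(sample))
```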
2. Pending: why the pod is not getting to a node
Pending in Kubernetes usually means the pod has been accepted by the API server, but the cluster has not made it runnable yet.
Common causes:
- Insufficient CPU or memory: requests exceed available node capacity.
- Unschedulable constraints: node selectors, affinity rules, taints and tolerations, topology rules.
- Persistent volume issues: claims not bound, storage class mismatch, delayed provisioning.
- Admission or policy controls: constraints that mutate or block the workload indirectly.
- Init containers or setup dependencies: sometimes the pod appears stalled because prerequisites are not completing.
What to check first:
- Run kubectl describe pod and read the events section carefully.
- Compare resource requests, not just limits, against available cluster capacity.
- Inspect node selectors, affinity, anti-affinity, and tolerations in the manifest.
- Verify PVCs with kubectl get pvc and kubectl describe pvc.
One frequent root cause is oversized requests. A pod that requests more memory than any node can offer stays Pending until the request is lowered or a large enough node joins the cluster. If you need a deeper baseline on sizing workloads, see Kubernetes Resource Requests and Limits Best Practices by Workload Type.
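To check the oversized-request case quickly, you can compare a pod's requests with each node's allocatable resources. The sketch below is a simplification under stated assumptions: it parses only a subset of Kubernetes quantity suffixes, it compares against total allocatable rather than what is left after other pods are placed, and the function names, node names, and figures are hypothetical. A pod that fails even this check cannot be scheduled on the current nodes.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes handled by this sketch.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as "500m" or "16Gi" to a plain number."""
    # Try two-letter suffixes first so "Mi" is not mistaken for "M" or "m".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def nodes_that_fit(requests: dict, nodes: dict) -> list:
    """Return names of nodes whose allocatable covers every requested resource."""
    fitting = []
    for name, allocatable in nodes.items():
        if all(
            parse_quantity(allocatable.get(resource, "0")) >= parse_quantity(quantity)
            for resource, quantity in requests.items()
        ):
            fitting.append(name)
    return fitting


if __name__ == "__main__":
    # Hypothetical node sizes and pod requests.
    nodes = {
        "node-a": {"cpu": "4", "memory": "16Gi"},
        "node-b": {"cpu": "8", "memory": "32Gi"},
    }
    print(nodes_that_fit({"cpu": "500m", "memory": "24Gi"}, nodes))
    print(nodes_that_fit({"cpu": "2", "memory": "64Gi"}, nodes))
```

An empty result for the second request is the situation described above: no node is large enough, so waiting will not help.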
3. ImagePullBackOff: why the image cannot be fetched
ImagePullBackOff and related messages such as ErrImagePull point to the image retrieval step. Kubernetes tried to pull the image, failed, and is waiting before another attempt.
Common causes:
- Wrong image name or tag: typos, deleted tags, or a tag that was never pushed.
- Registry authentication problems: missing or invalid image pull secrets.
- Registry reachability issues: DNS, egress, firewall, or proxy restrictions.
- Private registry policy differences: repository permissions vary across environments.
- Architecture mismatch: image not built for the node architecture.
What to check first:
- Copy the exact image reference from the pod spec and verify it exists in the registry.
- Confirm the namespace has access to the expected image pull secret.
- Read the event message in kubectl describe pod; it often distinguishes authentication failure from not found.
- Check whether the image was recently renamed or the CI pipeline changed its tagging logic.
If your team works across multiple registries or cloud platforms, standardizing image naming and access patterns removes many avoidable errors. For broader registry tradeoffs, see Container Registry Comparison: ECR vs GCR vs ACR vs Docker Hub.
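When you copy the image reference out of the pod spec, it helps to split it into parts before checking the registry, because a missing tag or a registry port is easy to misread. The function below is a simplified sketch, not a full reference parser: the name split_image_reference and the example references are hypothetical, and it assumes the usual convention that a reference with neither tag nor digest resolves to the latest tag.

```python
def split_image_reference(image: str) -> dict:
    """Split an image reference into repository, tag, and digest (simplified)."""
    name, _, digest = image.partition("@")
    last_segment_start = name.rfind("/") + 1
    colon = name.rfind(":")
    # A colon only marks a tag if it appears in the last path segment;
    # an earlier colon belongs to a registry port such as example.com:5000.
    if colon >= last_segment_start:
        repository, tag = name[:colon], name[colon + 1:]
    else:
        repository, tag = name, None
    if tag is None and not digest:
        tag = "latest"  # implicit default when neither tag nor digest is given
    return {"repository": repository, "tag": tag, "digest": digest or None}


if __name__ == "__main__":
    # Hypothetical references for illustration.
    for ref in ("nginx", "registry.example.com:5000/team/api:v1.2"):
        print(split_image_reference(ref))
```

Seeing tag set to latest for a reference you believed was pinned is often the first clue that a pipeline stopped injecting the tag.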
4. CrashLoopBackOff: why the container starts and then dies
CrashLoopBackOff is one of the most misunderstood pod states. Kubernetes is not crashing; your container process is. The platform is trying to restart it and slowing down between attempts.
Common causes:
- Application exits immediately: bad command, missing entrypoint, startup error.
- Configuration problems: missing environment variables, invalid config files, broken secrets.
- Dependency failures: app cannot reach database, queue, API, or service it expects during boot.
- Probe misconfiguration: liveness probe kills a slow-starting process.
- Resource pressure: out-of-memory kill or startup spikes that exceed limits.
What to check first:
- Read current and previous logs. Previous logs are often the key evidence for a crashloop.
- Inspect the termination reason in describe, especially exit codes and OOM kill signals.
- Review the startup command, arguments, and mounted config.
- Temporarily separate application startup failure from probe failure by reviewing liveness and readiness settings.
If the termination reason suggests memory pressure, revisit both application behavior and container sizing. If the crash began after a deployment change, correlate with your build and release workflow using a checklist such as CI/CD Pipeline Troubleshooting Checklist for Failing Builds and Deployments.
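The termination details are also available in structured form under each container's lastState. The sketch below assumes the pod is a dictionary parsed from kubectl get pod <pod> -o json and pulls out the reason, exit code, and restart count per container. The function name and sample data are hypothetical, and the signal field relies on the common convention that an exit code above 128 means the process was killed by signal (code minus 128).

```python
def last_terminations(pod: dict) -> list:
    """Summarize the most recent termination of each container that has one."""
    results = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated")
        if not terminated:
            continue  # this container has not terminated before
        exit_code = terminated.get("exitCode")
        # Convention: exit codes above 128 mean "killed by signal (code - 128)".
        signal = exit_code - 128 if isinstance(exit_code, int) and exit_code > 128 else None
        results.append({
            "container": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "reason": terminated.get("reason"),
            "exit_code": exit_code,
            "signal": signal,
        })
    return results


if __name__ == "__main__":
    # Hypothetical pod status for a container killed for exceeding its memory limit.
    sample = {"status": {"containerStatuses": [{
        "name": "api",
        "restartCount": 7,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
    }]}}
    print(last_terminations(sample))
```

A reason of OOMKilled with exit code 137 (signal 9) points at memory limits, while a small non-zero exit code with reason Error usually points back at the application logs.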
5. Running but not Ready: the quiet failure state
A pod can be Running while still failing readiness checks. That means the container process exists, but Kubernetes is correctly keeping it out of service endpoints.
Common causes:
- Readiness path or port does not match the application.
- Startup time is longer than expected.
- Dependencies are not reachable yet.
- TLS, auth, or sidecar assumptions changed.
This state matters because it often looks less urgent than a crashloop while causing the same user-facing outage: no traffic reaches the application.
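Because this state is easy to miss in the status column, it can be worth checking for it explicitly. The following sketch assumes a pod dictionary parsed from kubectl get pod <pod> -o json and lists containers whose process is running while their ready flag is still false. The function name and sample data are hypothetical.

```python
def running_but_not_ready(pod: dict) -> list:
    """Return names of containers that are running but not passing readiness."""
    return [
        cs["name"]
        for cs in pod.get("status", {}).get("containerStatuses", [])
        if "running" in cs.get("state", {}) and not cs.get("ready", False)
    ]


if __name__ == "__main__":
    # Hypothetical pod: the app container is up but not ready, the proxy is fine.
    sample = {"status": {"containerStatuses": [
        {"name": "app", "ready": False, "state": {"running": {}}},
        {"name": "proxy", "ready": True, "state": {"running": {}}},
    ]}}
    print(running_but_not_ready(sample))
```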
6. Build a diagnosis order that avoids wasted time
A practical order for Kubernetes troubleshooting is:
- Describe the pod to identify the failing stage.
- Read events for the cluster's explanation.
- Check logs if the container started at least once.
- Inspect manifest assumptions: image, command, env, secret refs, probes, resources, volumes.
- Check external dependencies: registry, storage, DNS, service discovery, identity.
This order works because it follows the control flow of the workload instead of jumping to random fixes.
Practical examples
These examples show how the common statuses usually map to root causes and remediation.
Example 1: Pending after a new deployment
You deploy a service and the pod remains Pending. In the kubectl describe pod output, the events mention insufficient memory and no nodes matching the required affinity rule.
Interpretation: the pod requests do not fit the available nodes, and the scheduling rules further narrow the set of eligible nodes.
Likely fix:
- Reduce requests if they were set conservatively high.
- Relax affinity if it is stricter than necessary.
- Add capacity or place the workload in a node pool that matches its needs.
This is a good example of why pending is not always “the cluster is full.” Sometimes the cluster has capacity, just not capacity that matches the pod's constraints.
Example 2: ImagePullBackOff after a pipeline change
A rollout succeeds in staging but fails in production with ImagePullBackOff. The event message says the image tag was not found.
Interpretation: your deployment refers to a tag that does not exist in the production-accessible registry, or the tagging convention changed between environments.
Likely fix:
- Verify the image tag was pushed.
- Check if the deployment still points to the old repository path.
- Confirm registry replication or promotion steps actually completed.
This is especially common when organizations mix CI tagging styles, retention policies, and multiple registries.
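If you triage many of these, a rough classifier over the event message can separate authentication, not-found, and network failures before anyone opens the registry console. Treat this strictly as a sketch: the substrings below are illustrative assumptions, since the exact wording varies by container runtime and registry, and the function name and sample messages are hypothetical.

```python
# Illustrative substrings only; real messages differ across runtimes and registries.
PULL_FAILURE_HINTS = [
    ("auth", ("unauthorized", "authentication required", "access denied")),
    ("not_found", ("not found", "manifest unknown")),
    ("network", ("no such host", "i/o timeout", "connection refused")),
]


def classify_pull_failure(message: str) -> str:
    """Guess the failure category from an image pull event message."""
    lowered = message.lower()
    for category, needles in PULL_FAILURE_HINTS:
        if any(needle in lowered for needle in needles):
            return category
    return "unknown"


if __name__ == "__main__":
    # Hypothetical event messages written for this example.
    print(classify_pull_failure("failed to resolve reference: team/api:v9: not found"))
    print(classify_pull_failure("unexpected status: 401 Unauthorized"))
```

An unknown result is a prompt to read the full event text, not evidence that the pull succeeded.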
Example 3: CrashLoopBackOff caused by probe timing
An application takes a long time to warm caches on startup. The liveness probe begins too early, fails repeatedly, and kubelet restarts the container before it can finish initialization.
Interpretation: the app might be fine, but the health policy is not aligned with its startup profile.
Likely fix:
- Use a startup probe where appropriate.
- Increase initial delay or adjust thresholds carefully.
- Validate that the liveness endpoint reflects true deadlock or failure, not temporary warm-up.
This is one of the easiest ways to create a crashloop without any bug in business logic.
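To make the timing mismatch concrete, here is a back-of-the-envelope calculation. It assumes a continuously failing probe triggers a restart after roughly initialDelaySeconds + periodSeconds * failureThreshold, which ignores timeoutSeconds and scheduling jitter. The function names and the 90-second warm-up are hypothetical; the default arguments mirror the Kubernetes probe defaults (no initial delay, 10-second period, 3 failures).

```python
def restart_deadline_seconds(initial_delay: int = 0, period: int = 10,
                             failure_threshold: int = 3) -> int:
    """Approximate seconds after container start before failing probes cause a restart."""
    return initial_delay + period * failure_threshold


def probe_kills_startup(startup_seconds: int, **probe) -> bool:
    """True if the app needs longer to start than the probe settings allow."""
    return startup_seconds > restart_deadline_seconds(**probe)


if __name__ == "__main__":
    warm_up = 90  # hypothetical cache warm-up time in seconds
    # Default liveness settings give the app about 30 seconds.
    print(restart_deadline_seconds(), probe_kills_startup(warm_up))
    # A startup probe with failureThreshold=30 allows about 300 seconds.
    print(restart_deadline_seconds(failure_threshold=30),
          probe_kills_startup(warm_up, failure_threshold=30))
```

With the defaults, a 90-second warm-up never finishes before the restart, which is the loop described in this example. Raising the budget through a startup probe leaves the liveness probe strict once the app is up.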
Example 4: Running but failing readiness because of dependencies
The pod is running, but no traffic reaches it. Logs show the app is waiting for a database migration to complete or for an upstream API to become reachable.
Interpretation: the pod process is alive, but the service is not yet safe to serve requests.
Likely fix:
- Make readiness reflect true serving ability.
- Reduce hard startup coupling where possible.
- Move one-time initialization into a job or init container if that suits the workload.
Good readiness behavior prevents bad traffic from reaching half-initialized applications.
Example 5: Pod failures tied to identity and access
A pod starts but exits because it cannot access a secret store, cloud API, or registry. The manifest is correct, but the runtime identity differs from what the app expects.
Interpretation: the issue may be service account mapping, workload identity configuration, or missing permissions rather than the container itself.
Likely fix:
- Check which Kubernetes service account the pod actually uses.
- Validate IAM or workload identity bindings.
- Review recent changes to policy or token projection.
For teams adopting stronger identity separation, related concepts are covered in Workload Identity for AI Agents: Separating Who Runs from What They Can Do.
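For the first check, the service account can be read directly from the pod spec. This minimal sketch assumes a pod or manifest dictionary, for example parsed from kubectl get pod <pod> -o json, and falls back to the default service account when the manifest does not name one. The function name and sample data are hypothetical.

```python
def runtime_identity(pod: dict) -> dict:
    """Report which service account a pod spec names and any automount override."""
    spec = pod.get("spec", {})
    return {
        # A manifest without serviceAccountName uses the namespace's "default" account.
        "service_account": spec.get("serviceAccountName") or "default",
        # None means the pod spec does not override the ServiceAccount's own setting.
        "automount_override": spec.get("automountServiceAccountToken"),
    }


if __name__ == "__main__":
    # Hypothetical manifest that never set a service account.
    print(runtime_identity({"spec": {"containers": []}}))
```

Seeing default here when the application expects a dedicated account is a common explanation for access failures that appear only at runtime.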
Common mistakes
Most slow troubleshooting sessions come from a small set of avoidable habits.
Treating the status label as the cause
CrashLoopBackOff is not the root cause. It is the restart behavior after the real failure. Always look for the first meaningful error: exit code, missing config, failed dependency, probe event, or OOM kill.
Ignoring events and reading only logs
Logs help when the process starts. They are much less helpful for scheduling failures, mount errors, and image pull problems. Events often contain the decisive clue.
Checking limits but not requests
Scheduling decisions are based primarily on requests. Teams sometimes focus on limits and miss the fact that oversized requests are what keep pods pending.
Assuming probe failures mean the app is broken
Sometimes the app is broken. Sometimes the probe path, port, timing, or semantics are wrong. Review probes as part of the workload design, not just as a health add-on.
Overlooking previous logs in crash loops
When a container restarts quickly, current logs may show almost nothing. kubectl logs --previous is often the missing step in a CrashLoopBackOff troubleshooting workflow.
Forgetting environmental drift
If something works in one cluster but fails in another, compare registry access, storage classes, admission policies, network egress rules, and Kubernetes version behavior. Version and control plane differences matter more than many teams expect. For planning around compatibility, see Kubernetes Version Skew Policy and Upgrade Planning Guide.
When to revisit
This guide is worth revisiting whenever your troubleshooting inputs change, not just when a pod is already broken. In Kubernetes, many failures are introduced indirectly by platform and workflow evolution.
Review your pod-status playbook when:
- You change CI/CD image tagging or promotion rules. That can create new ImagePullBackOff patterns.
- You adjust requests, limits, autoscaling, or node pools. That changes Pending behavior and placement outcomes.
- You introduce new probes, sidecars, or service mesh behavior. That can change startup order and readiness timing.
- You upgrade Kubernetes or alter admission policies. Scheduling, defaults, and validations may behave differently.
- You migrate registries, identity models, or storage classes. Pod startup dependencies often fail at those boundaries first.
A practical team habit is to maintain a short internal runbook with three sections:
- Command set: the exact kubectl commands everyone should run first.
- Environment-specific checks: registry auth, storage defaults, workload identity, network policy, ingress assumptions.
- Known failure patterns: the recurring pod issues your platform actually sees.
If you want to make this article actionable today, use the checklist below the next time a pod looks unhealthy:
- Run kubectl get pods -n <namespace> and identify whether the failure is scheduling, image pull, startup, or readiness.
- Run kubectl describe pod <pod> -n <namespace> and read conditions, state, and events from top to bottom.
- If the container started, pull both current and previous logs.
- Compare the manifest against recent deployment changes: image, command, env, secrets, probes, requests, volumes.
- Check the platform dependencies most relevant to that stage: node capacity, registry access, storage binding, identity, or network reachability.
- Document the exact symptom and cause for the next incident, not just the fix.
Kubernetes pod status meanings become much less mysterious once you stop reading them as isolated labels and start reading them as lifecycle checkpoints. That shift turns Pending, ImagePullBackOff, and CrashLoopBackOff from frustrating messages into fast routing signals for the right next step.