Kubernetes Interview Questions and Answers

Kubernetes interview questions and answers for 2026, from Docker and Kubernetes fundamentals to advanced topics: pods, services, HPA, RBAC, networking, and production debugging.

Tech reviewed by Deepak Prasad

Kubernetes interview questions show up in DevOps, platform engineering, SRE, cloud developer, and backend loops wherever workloads run in containers, and they range from Docker and Kubernetes basics to advanced scenarios. Interviewers expect more than "Kubernetes orchestrates containers": they probe Pod lifecycle, Service types, probe differences, HPA prerequisites, NetworkPolicy DNS gotchas, and how you debug Pending or CrashLoopBackOff pods under pressure.

Below are 45 questions with detailed answers; technical sections include a strong sample answer you can say aloud. Pair with computer networks interview questions for TCP/IP, DNS, and Service/Ingress fundamentals, operating system interview questions for process and networking fundamentals, Kafka interview questions for event workloads on Kubernetes, Spring Boot interview questions for experienced developers when Java services deploy to clusters, Azure developer interviews and AWS interview questions for managed Kubernetes (AKS/EKS), Git interview questions for GitOps pipelines, and full stack developer interviews when teams own deployment end to end.

NOTE
Prep target: Deploy a sample app with Deployment + Service + Ingress, add liveness/readiness probes, configure HPA with resource requests, and explain one NetworkPolicy that allows DNS egress on port 53.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; kubectl v1.36.1 client; Docker 29.2.1.


Interview context and how to prepare

What do Kubernetes interviews actually test?

Kubernetes interviews test whether you can run containerized workloads reliably at scale—not recite every kubectl alias.

| Layer | What interviewers probe |
|---|---|
| Containers | Images, registries, Docker vs runtime |
| Core objects | Pod, Deployment, ReplicaSet, Service |
| Networking | ClusterIP, DNS, Ingress, NetworkPolicy |
| Configuration | ConfigMap, Secret, env vs volume mounts |
| Operations | Probes, requests/limits, HPA, rollouts |
| Security | RBAC, ServiceAccount, namespaces |
| Production | Debugging, observability, GitOps, upgrades |

| Role | Emphasis |
|---|---|
| Junior DevOps | kubectl basics, Pod/Deployment YAML |
| Platform engineer | Cluster add-ons, CNI, admission control |
| SRE | Incident response, HPA failures, etcd backups |
| Cloud developer | Deploy to AKS/EKS/GKE, managed services |

Expect probe differences, Service types, Pending pod debugging, and HPA with metrics-server in most Kubernetes screens.

Docker and Kubernetes — how are they related in interviews?
| Concept | Docker (typical) | Kubernetes |
|---|---|---|
| Unit of run | Container on one host | Pod (one or more containers) |
| Orchestration | docker compose (single host) | Multi-node scheduling, self-healing |
| Networking | Bridge networks | CNI plugins, Services, kube-proxy/Cilium |
| Scaling | Manual docker compose up --scale (single host) | ReplicaSet, HPA, cluster autoscaler |
| Config | Env files, bind mounts | ConfigMap, Secret, downward API |
| Runtime | containerd via Docker Engine | containerd/CRI-O (Docker optional) |

Kubernetes does not require Docker—it needs a CRI-compatible runtime (containerd, CRI-O). Docker builds images; Kubernetes schedules and operates them.

Docker and Kubernetes interview questions often start with "what problem does K8s solve that Docker alone does not?" The short answer: desired-state reconciliation across many nodes.

What is a typical Kubernetes interview loop?
| Round | Duration | Focus |
|---|---|---|
| Screening | 30 min | Experience, cloud, on-call stories |
| Fundamentals | 45 min | Pods, Deployments, Services, YAML |
| Docker + K8s | 30–45 min | Images, multi-stage builds, compose vs K8s |
| Scenario / troubleshooting | 45–60 min | CrashLoopBackOff, Pending, HPA not scaling |
| Advanced (senior) | 45 min | NetworkPolicy, RBAC, etcd, upgrades |
| Architecture design | 45 min | Multi-tenant cluster, GitOps, DR |

LinkedIn scenario-based guides and RisingStack-style lists emphasize spoken troubleshooting narratives—describe commands before typing them.

What is a realistic 4–6 week Kubernetes prep plan?
| Week | Focus | Output |
|---|---|---|
| 1 | Docker images, Dockerfile, registries | Push image to local/minikube registry |
| 2 | Pods, Deployments, Services, kubectl | Deploy nginx with ClusterIP Service |
| 3 | ConfigMap, Secret, probes, resources | App with health checks and requests |
| 4 | Ingress, HPA, rolling updates | Scale on CPU with metrics-server |
| 5 | RBAC, NetworkPolicy, namespaces | Restrict cross-namespace traffic |
| 6 | Mock interviews + CKA-style drills | Timed troubleshooting scenarios |

Use minikube, kind, or a cloud free tier—hands-on beats flashcards.

How do junior and advanced Kubernetes expectations differ?
| Topic | Junior / mid | Advanced |
|---|---|---|
| Objects | Deployment, Service | StatefulSet, DaemonSet, CRDs |
| Networking | ClusterIP vs NodePort | CNI, NetworkPolicy, service mesh basics |
| Scaling | replicas: 3 | HPA behavior, VPA trade-offs, cluster autoscaler |
| Security | Namespaces | RBAC least privilege, Pod Security Standards |
| Ops | kubectl get/describe/logs | Control plane components, etcd backup |
| Delivery | Manual apply | Helm, Kustomize, Argo CD / Flux GitOps |

Advanced Kubernetes interview questions focus on failure modes: metrics-server down during a traffic spike, a NetworkPolicy blocking DNS, or readiness passing while the app is broken.


Docker and container foundations

Docker image vs container — what is the difference?

A Docker image is the packaged, immutable artifact. A container is a running instance of that image.

| Image | Container |
|---|---|
| Immutable template | Running instance of an image |
| Built from a Dockerfile or build system | Created from an image |
| Made of filesystem layers | Adds a writable layer on top |
| Stored locally or in a registry | Runs as one or more processes |
| Identified by tag or digest | Has container ID, PID, network, mounts |
| Safe to share as release artifact | Usually ephemeral |

Images are built in layers. When a container starts, the runtime creates a unified filesystem view from those layers and adds a thin writable layer for runtime changes.

Example:

bash
docker build -t myapp:v1 .
docker run myapp:v1

In Kubernetes, you do not normally create containers manually. You declare an image in the Pod spec, and the kubelet asks the container runtime to pull and start it.

yaml
spec:
  containers:
  - name: api
    image: registry.example.com/myapp:v1

Important interview points:

  • Image is the artifact you build and push
  • Container is what actually runs
  • Container filesystem changes are not permanent unless stored in a volume
  • Same image can run many containers
  • Kubernetes schedules Pods, and containers run inside Pods
  • Container logs, writable layer, and process lifecycle are separate from the image
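
The layering points above can be modeled in a few lines of Python. This is only a toy sketch, assuming plain dictionaries stand in for filesystem layers; the paths, contents, and the `start_container` helper are invented for illustration and do not reflect how a real runtime implements overlay filesystems.

```python
from collections import ChainMap

# Read-only image layers, lowest first (contents are illustrative).
base_layer = {"/bin/sh": "shell binary", "/etc/os-release": "debian 12"}
app_layer = {"/app": "myapp v1 binary"}


def start_container(*image_layers):
    """Return a filesystem view: a fresh writable layer over shared image layers."""
    writable = {}
    # ChainMap searches left to right, so the writable layer shadows the image
    # and upper image layers shadow lower ones. Writes go to the first mapping.
    return ChainMap(writable, *reversed(image_layers))


c1 = start_container(base_layer, app_layer)
c2 = start_container(base_layer, app_layer)

c1["/tmp/cache"] = "runtime data"     # lands in c1's writable layer only
c1["/etc/os-release"] = "patched"     # shadows the image file; image untouched

print(c1["/etc/os-release"])          # patched
print(c2["/etc/os-release"])          # debian 12
print("/tmp/cache" in c2)             # False
print(base_layer["/etc/os-release"])  # debian 12
```

Both containers share the same read-only layers, while each write lands in that container's own writable layer, which is discarded with the container unless the data lives in a volume.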

Common trap:

“Do not call a container a lightweight VM. It is an isolated process using kernel features like namespaces and cgroups, and it shares the host kernel.”

A strong answer is:

“An image is the immutable package made of layers. A container is a running process created from that image with its own writable layer, runtime isolation, environment, network, and mounts.”

Dockerfile basics interviewers expect you to explain?

A Dockerfile describes how to build a container image.

Common instructions:

| Instruction | Role |
|---|---|
| FROM | Sets the base image |
| WORKDIR | Sets the working directory |
| COPY | Copies files into the image |
| ADD | Copies files, with extra features like archive extraction |
| RUN | Executes build-time commands |
| ENV | Sets environment variables |
| ARG | Build-time variable |
| EXPOSE | Documents intended port; does not publish it by itself |
| USER | Sets runtime user |
| CMD | Default command/arguments |
| ENTRYPOINT | Main executable for the container |

Prefer COPY over ADD unless you specifically need ADD behavior.

A common production pattern is a multi-stage build. Build tools stay in the builder stage, and the final image contains only what is needed to run.

dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]

Good Dockerfile practices:

  • Use multi-stage builds
  • Keep final image small
  • Run as non-root where possible
  • Do not copy secrets into the image
  • Use .dockerignore
  • Pin base image versions or digests for production
  • Put dependency download steps before copying full source to improve cache reuse
  • Prefer exec form for ENTRYPOINT/CMD
  • Avoid installing unnecessary packages
  • Scan images for vulnerabilities in CI

CMD vs ENTRYPOINT interview angle:

| Instruction | Meaning |
|---|---|
| ENTRYPOINT | Main executable |
| CMD | Default arguments or default command |

Example:

dockerfile
ENTRYPOINT ["/app"]
CMD ["--config", "/etc/app/config.yaml"]
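
In Kubernetes, the Pod spec fields `command` and `args` override the image's ENTRYPOINT and CMD respectively. The sketch below models how the final command line is resolved; `effective_argv` is a hypothetical helper for illustration, not a Docker or Kubernetes API.

```python
def effective_argv(entrypoint, cmd, command=None, args=None):
    """Resolve what a container executes.

    entrypoint/cmd come from the image; command/args are the optional
    Pod-spec overrides (command replaces ENTRYPOINT, args replaces CMD).
    """
    if command is not None:
        # Overriding the executable also discards the image CMD.
        return list(command) + list(args or [])
    if args is not None:
        # Only the default arguments are replaced.
        return list(entrypoint) + list(args)
    return list(entrypoint) + list(cmd)


image_entrypoint = ["/app"]
image_cmd = ["--config", "/etc/app/config.yaml"]

defaults = effective_argv(image_entrypoint, image_cmd)
args_only = effective_argv(image_entrypoint, image_cmd, args=["--debug"])
command_only = effective_argv(image_entrypoint, image_cmd,
                              command=["/bin/sh", "-c", "env"])

print(defaults)      # ['/app', '--config', '/etc/app/config.yaml']
print(args_only)     # ['/app', '--debug']
print(command_only)  # ['/bin/sh', '-c', 'env']
```

The case interviewers tend to probe is the last one: setting only `command` drops the image's CMD, so default arguments silently disappear.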

A strong answer is:

“A Dockerfile builds an image layer by layer. For production, I use multi-stage builds, a small runtime image, non-root user, .dockerignore, no embedded secrets, and pinned base images for reproducible Kubernetes deployments.”

Docker Compose vs Kubernetes — when use each?

Docker Compose and Kubernetes both run containerized applications, but they are used at different scales and for different workflows.

| Docker Compose | Kubernetes |
|---|---|
| Best for local development and simple stacks | Best for production orchestration |
| Uses compose.yaml | Uses API objects/manifests |
| Commonly runs on one Docker host | Runs on one or more cluster nodes |
| Simple service/network/volume setup | Pods, Deployments, Services, Ingress, RBAC |
| Easy developer onboarding | Strong rollout, scaling, and policy controls |
| Limited cluster-level scheduling | Scheduler places workloads on nodes |
| Good for local dependencies | Good for highly available services |

Compose example use:

  • App + database + Redis locally
  • Developer laptop setup
  • Integration test environment
  • Small internal demo stack

Kubernetes example use:

  • Production deployment
  • Rolling updates
  • Horizontal scaling
  • Self-healing through controllers
  • Service discovery through cluster DNS
  • Config/Secret management
  • RBAC and admission policies
  • Multi-node scheduling

Compose is not “bad.” It is excellent for local developer experience. But it does not replace Kubernetes for large production environments with scheduling, rollout strategy, cluster policies, and high availability requirements.

A common workflow:

text
Dockerfile
  → Docker Compose for local dev
  → CI builds image
  → Helm/Kustomize deploys to Kubernetes

Interview trap:

“Do not say Compose is only for beginners. Many professional teams use it for local development even when production is Kubernetes.”

A strong answer is:

“I use Compose for local multi-container development and simple test stacks. I use Kubernetes when I need production orchestration, self-healing, rolling updates, scaling, service discovery, and cluster-level policy.”

What container runtime does Kubernetes use?

Kubernetes does not talk directly to Docker Engine in modern clusters. The kubelet talks to a container runtime through the Container Runtime Interface (CRI).

Common CRI runtimes:

| Runtime | Notes |
|---|---|
| containerd | Common default runtime in many Kubernetes distributions |
| CRI-O | Lightweight CRI runtime, common in OpenShift |
| Docker Engine | Still useful for building and running images locally; since dockershim was removed it is no longer a direct Kubernetes runtime (using it requires the external cri-dockerd adapter) |

Important history:

  • Older Kubernetes versions used dockershim to support Docker Engine as a runtime
  • Dockershim was removed in Kubernetes v1.24
  • Kubernetes still runs Docker-built images because images follow OCI/container image standards
  • Docker itself uses containerd internally, but Kubernetes talks to CRI-compatible runtimes

Node-level debugging:

| Tool | Talks to | Use |
|---|---|---|
| kubectl | Kubernetes API server | Normal cluster debugging |
| crictl | CRI runtime | Node-level container/runtime debugging |
| ctr | containerd | Low-level containerd debugging |
| docker | Docker Engine | Local Docker workflows |

Example node debugging:

bash
crictl ps
crictl images
crictl logs <container-id>

Interview trap:

“Kubernetes removed Docker” does not mean Docker images stopped working. It means Kubernetes removed the in-tree dockershim runtime integration.

A strong answer is:

“The kubelet uses CRI to talk to runtimes such as containerd or CRI-O. Docker is still widely used to build images, but modern Kubernetes clusters do not depend on Docker Engine through dockershim.”

OCI images and container registries?

OCI stands for Open Container Initiative. OCI specifications help image builders, registries, and runtimes interoperate.

Important terms:

| Concept | Meaning |
|---|---|
| Image | Packaged filesystem, config, and metadata |
| Registry | Server that stores and distributes images |
| Repository | Named image path, such as team/api |
| Tag | Human-friendly reference like v1.2.3 |
| Digest | Immutable content hash like sha256:... |
| Manifest | Metadata describing image layers/config |
| Image pull secret | Kubernetes Secret used for private registry credentials |

Examples of registries:

  • Docker Hub
  • Amazon ECR
  • Azure Container Registry
  • Google Artifact Registry
  • GitHub Container Registry
  • Harbor
  • Quay

Tag vs digest is important in production.

| Reference | Behavior |
|---|---|
| myapp:latest | Floating tag; can change |
| myapp:v1.2.3 | Better, but tag can still be moved |
| myapp@sha256:... | Immutable content reference |
| myapp:v1.2.3@sha256:... | Human-readable tag plus exact digest |
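
To make the tag and digest distinction concrete, here is a minimal parsing sketch. It is simplified on purpose: it does not add default registries or the implicit `latest` tag, and `parse_image_ref`, `is_immutable`, and the `sha256:abc` placeholder are hypothetical, not part of any Kubernetes or Docker library.

```python
def parse_image_ref(ref):
    """Split an image reference into (repository, tag, digest)."""
    name, _, digest = ref.partition("@")
    # A ':' is a tag separator only if it comes after the last '/';
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        repository, tag = name[:last_colon], name[last_colon + 1:]
    else:
        repository, tag = name, None
    return repository, tag, digest or None


def is_immutable(ref):
    """True when the reference is pinned by digest."""
    return parse_image_ref(ref)[2] is not None


for ref in ("myapp:latest",
            "registry.example.com/team/api:v1.2.3",
            "localhost:5000/api",
            "myapp:v1.2.3@sha256:abc"):  # placeholder digest, not a real hash
    print(ref, parse_image_ref(ref), is_immutable(ref))
```

A check like `is_immutable` is the kind of rule a CI gate or admission policy might enforce for production namespaces.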

Kubernetes private registry example:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: api
    image: registry.example.com/team/api:v1.2.3

Digest-pinned production example:

yaml
containers:
- name: api
  image: registry.example.com/team/api:v1.2.3@sha256:abc123...

Production interview points:

  • Avoid latest in production
  • Use private registry credentials through imagePullSecrets or service account configuration
  • Pin digests for exact rollbacks and reproducible deploys
  • Scan images for vulnerabilities
  • Sign images if supply-chain security is required
  • Keep SBOM/provenance where required
  • Use immutable tags or registry policies if supported

Common pull errors:

| Error | Likely cause |
|---|---|
| ImagePullBackOff | Pull failed repeatedly |
| ErrImagePull | Initial image pull failed |
| unauthorized | Missing/wrong registry credentials |
| manifest unknown | Image tag/digest does not exist |
| x509: certificate signed by unknown authority | Registry CA trust issue |

A strong answer is:

“OCI standards let different tools build, store, and run container images consistently. In production, I push images to a registry, avoid latest, use imagePullSecrets for private registries, and pin digests when I need exact reproducibility and rollback.”


Kubernetes architecture

Explain Kubernetes architecture at a high level.

A Kubernetes cluster has two main parts:

  • Control plane — decides what should happen
  • Worker nodes — run application workloads as Pods

Control plane components:

| Component | Role |
|---|---|
| kube-apiserver | Front door to the cluster API; all clients and controllers talk to it |
| etcd | Consistent key-value store for cluster state |
| kube-scheduler | Assigns unscheduled Pods to suitable nodes |
| kube-controller-manager | Runs built-in controllers such as Deployment, ReplicaSet, Node, Job |
| cloud-controller-manager | Integrates with cloud provider APIs for nodes, routes, and load balancers |

Worker node components:

| Component | Role |
|---|---|
| kubelet | Node agent that ensures Pods and containers run as requested |
| kube-proxy | Implements Service networking rules, unless replaced by an eBPF dataplane |
| Container runtime | Starts containers through CRI, such as containerd or CRI-O |
| CNI plugin | Provides Pod networking |

High-level flow when you create a Deployment:

  1. User runs kubectl apply
  2. Request goes to kube-apiserver
  3. API server authenticates, authorizes, admits, and stores desired state in etcd
  4. Deployment controller creates/updates ReplicaSets
  5. ReplicaSet controller creates Pods
  6. Scheduler assigns Pods to nodes
  7. Kubelet on each node asks the runtime to start containers
  8. Service/CNI networking makes Pods reachable

Kubernetes is declarative. You define desired state, and controllers continuously reconcile actual state toward that desired state.

Example:

bash
kubectl apply -f deployment.yaml

You are not directly starting containers. You are asking the Kubernetes API to store desired state. Controllers, scheduler, kubelet, CNI, and runtime do the rest.

Good troubleshooting mindset:

| Symptom | First layer to check |
|---|---|
| Object not accepted | API server, validation, admission, RBAC |
| Pod stuck Pending | Scheduler events, resources, taints, PVC |
| Pod assigned but not running | Kubelet, runtime, image pull, volume mount |
| Service not reachable | Service, Endpoints/EndpointSlice, DNS, CNI, kube-proxy/eBPF |
| Rollout stuck | Deployment, ReplicaSet, Pod events, probes |

A strong answer is:

“Kubernetes has a control plane that stores and reconciles desired state, and worker nodes that run Pods. The API server and etcd hold the source of truth, controllers create/update objects, the scheduler places Pods, and kubelets start containers through the runtime.”

What is etcd and why does it matter?

etcd is the strongly consistent key-value store used by Kubernetes to store cluster state.

It stores Kubernetes objects such as:

  • Pods
  • Deployments
  • Services
  • ConfigMaps
  • Secrets
  • Nodes
  • RBAC objects
  • Custom resources

Important properties:

| Property | Interview point |
|---|---|
| Source of truth | Kubernetes desired/current API state is persisted in etcd |
| Consistency | Uses Raft consensus; quorum matters |
| Backup | etcd snapshots are critical for disaster recovery |
| Security | Protect with TLS, RBAC, network restrictions, and encryption at rest |
| Performance | Large clusters need healthy etcd latency and disk I/O |
| Availability | Losing quorum blocks writes and many control plane operations |

If etcd loses quorum:

  • Existing Pods may keep running
  • New scheduling may fail
  • Deployments/rollouts may not progress
  • API writes may fail
  • Controllers cannot persist new state
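
Quorum here means a simple majority of etcd members. The arithmetic is worth being able to do on a whiteboard; the helper names below are illustrative only.

```python
def quorum(members):
    """Votes needed for a Raft majority."""
    return members // 2 + 1


def fault_tolerance(members):
    """Members that can fail while the cluster still accepts writes."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

Three members tolerate one failure, four members still tolerate only one, and five tolerate two, which is why etcd clusters are normally sized with an odd member count.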

Important Secrets detail:

Kubernetes Secrets are stored in etcd. By default they are only base64-encoded, which is an encoding and not encryption. For stronger protection, enable encryption at rest and protect etcd backups.
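
A short demonstration of why base64 offers no confidentiality: anyone who can read the stored value can reverse it without a key. The secret value below is a made-up example.

```python
import base64

plaintext = "s3cr3t-password"  # hypothetical value, for illustration only

# This mirrors how a value appears under `data:` in a Secret manifest.
encoded = base64.b64encode(plaintext.encode()).decode()

# Decoding needs no key, only read access to the encoded string.
decoded = base64.b64decode(encoded).decode()

print(encoded)
print(decoded)
```

This is why read access to Secrets through RBAC, and access to etcd and its backups, needs to be treated as access to the plaintext.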

Backup example concept:

bash
ETCDCTL_API=3 etcdctl snapshot save snapshot.db

In managed Kubernetes, the cloud provider usually manages etcd. In self-managed clusters, etcd backup, restore, quorum, TLS, disk performance, and monitoring are platform-critical responsibilities.

Good SRE-level metrics:

  • etcd leader changes
  • fsync latency
  • database size
  • request latency
  • quorum/member health
  • disk space
  • snapshot success

A strong answer is:

“etcd is Kubernetes’ source of truth. If etcd is unhealthy or loses quorum, the cluster may keep existing workloads running but cannot reliably schedule, update, or persist new state. That is why etcd backup, encryption, quorum, and monitoring are critical.”

Role of the Kubernetes API server?

kube-apiserver is the front door of Kubernetes.

Every major Kubernetes interaction goes through the API server:

  • kubectl
  • Controllers
  • Scheduler
  • Admission webhooks
  • Operators
  • CI/CD systems
  • GitOps tools

The API server is responsible for:

| Stage | Meaning |
|---|---|
| Authentication | Who are you? |
| Authorization | Are you allowed to do this? |
| Admission | Should this request be modified or rejected? |
| Validation | Is the object valid? |
| Persistence | Store accepted state in etcd |
| Watch API | Let controllers watch for changes |

Typical request path:

text
kubectl / controller / operator
  → kube-apiserver
  → authentication
  → authorization
  → mutating admission
  → schema validation
  → validating admission
  → etcd
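
The request path can be sketched as a chain of checks where the first failure short-circuits. This is a toy model, not the real API server: the users, the permission set, and both admission policies (a default label and a ban on the `latest` tag) are assumptions made up for the example, and admission is split into a mutating and a validating step.

```python
def handle_request(user, verb, obj, allowed, known_users):
    """Run a write request through simplified API-server stages.

    Returns (status, detail); the first failing stage ends processing.
    """
    if user not in known_users:
        return "Unauthorized", "authentication failed"
    if (user, verb, obj["kind"]) not in allowed:
        return "Forbidden", "authorization (RBAC) denied"
    # Mutating admission: hypothetical policy that injects a default label.
    obj = {**obj, "labels": {"team": "unknown", **obj.get("labels", {})}}
    # Schema validation: a name is required.
    if not obj.get("name"):
        return "Invalid", "validation failed: name is required"
    # Validating admission: hypothetical policy that bans the 'latest' tag.
    if obj.get("image", "").endswith(":latest"):
        return "Denied", "admission policy rejected the 'latest' tag"
    return "Created", obj  # the real server would persist to etcd here


known = {"alice", "bob"}
allowed = {("alice", "create", "Deployment")}
good = {"kind": "Deployment", "name": "api", "image": "api:v1"}

r_unknown = handle_request("mallory", "create", good, allowed, known)
r_forbidden = handle_request("bob", "create", good, allowed, known)
r_denied = handle_request("alice", "create", {**good, "image": "api:latest"},
                          allowed, known)
r_created = handle_request("alice", "create", good, allowed, known)

for result in (r_unknown, r_forbidden, r_denied, r_created):
    print(result[0])
```

Mapping an error back to its stage is the useful habit: Unauthorized points at credentials, Forbidden at RBAC, and a denial at a policy or webhook.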

kubectl is only a client. It does not directly create containers or modify nodes. It sends requests to the API server.

Useful interview commands:

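bash
# Check whether the current user may create Pods in the dev namespace
kubectl auth can-i create pods -n dev

bash
# Generate a Deployment manifest locally without sending it to the cluster
kubectl create deployment demo-nginx \
  --image=nginx:1.27 \
  --dry-run=client \
  -o yaml

bash
# Server-side apply: the API server merges fields and tracks field ownership
kubectl apply --server-side -f deployment.yaml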

Client dry-run is useful for generating manifests. Server-side dry-run is better when you want API server validation and admission behavior without persisting the object.

bash
kubectl apply -f deployment.yaml --dry-run=server

Common API server related failures:

| Error | Likely area |
|---|---|
| Unauthorized | Authentication |
| Forbidden | RBAC/authorization |
| Admission webhook denied | Policy or webhook validation |
| Object validation error | Invalid manifest field/value |
| Timeout calling webhook | Webhook service/cert/network issue |
| API server unavailable | Control plane or network problem |

A strong answer is:

“The API server is the only supported gateway for cluster state changes. It authenticates, authorizes, admits, validates, and persists objects to etcd, while clients and controllers interact through its REST/watch API.”

How does the Kubernetes scheduler work?

The Kubernetes scheduler assigns Pods to nodes.

It watches for Pods where spec.nodeName is not set, then selects a suitable node based on resources, constraints, and scheduling policy.

Common scheduling factors:

| Factor | Examples |
|---|---|
| Resource requests | CPU, memory, hugepages, ephemeral storage |
| Node capacity | Node allocatable resources |
| Node labels | nodeSelector, node affinity |
| Taints | Pod must have matching tolerations |
| Pod affinity | Place near certain Pods |
| Pod anti-affinity | Avoid placing near certain Pods |
| Topology spread | Spread across zones, nodes, racks |
| Volumes | PVC binding, zone-specific storage |
| Runtime needs | GPU, device plugins, local SSD |
| Scheduling plugins | Custom scoring/filtering behavior |

Simple scheduling idea:

text
Filter nodes that cannot run the Pod
→ score remaining nodes
→ bind Pod to selected node

A Pod remains Pending when no suitable node is found or a required dependency is not ready.
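
The filter, score, and bind steps can be illustrated with a toy scheduler. This sketch assumes CPU in millicores and memory in MiB, treats taints and tolerations as plain strings, and scores only by free CPU; the real kube-scheduler runs many filter and score plugins, so treat every name here as hypothetical.

```python
def schedule(pod, nodes):
    """Pick a node name for the pod, or None if it must stay Pending."""
    def fits(node):
        free_cpu = node["allocatable_cpu"] - node["requested_cpu"]
        free_mem = node["allocatable_mem"] - node["requested_mem"]
        # Every taint on the node must be tolerated by the pod.
        tolerated = set(node.get("taints", [])) <= set(pod.get("tolerations", []))
        labels_ok = all(node.get("labels", {}).get(k) == v
                        for k, v in pod.get("node_selector", {}).items())
        return (pod["cpu"] <= free_cpu and pod["mem"] <= free_mem
                and tolerated and labels_ok)

    feasible = [n for n in nodes if fits(n)]      # filter
    if not feasible:
        return None                               # no node fits: Pending
    best = max(feasible,                          # score: most free CPU left
               key=lambda n: n["allocatable_cpu"] - n["requested_cpu"] - pod["cpu"])
    best["requested_cpu"] += pod["cpu"]           # bind: reserve the requests
    best["requested_mem"] += pod["mem"]
    return best["name"]


nodes = [
    {"name": "node-a", "allocatable_cpu": 2000, "requested_cpu": 1800,
     "allocatable_mem": 4096, "requested_mem": 1024},
    {"name": "node-b", "allocatable_cpu": 4000, "requested_cpu": 1000,
     "allocatable_mem": 8192, "requested_mem": 2048},
    {"name": "node-gpu", "allocatable_cpu": 8000, "requested_cpu": 0,
     "allocatable_mem": 16384, "requested_mem": 0, "taints": ["gpu"]},
]

small = schedule({"cpu": 500, "mem": 512}, nodes)
too_big = schedule({"cpu": 3000, "mem": 512}, nodes)
gpu_job = schedule({"cpu": 3000, "mem": 512, "tolerations": ["gpu"]}, nodes)

print(small)    # node-b
print(too_big)  # None
print(gpu_job)  # node-gpu
```

Note that the decision uses requests, not live usage: the second Pod stays Pending even though node-gpu is idle, because it lacks the toleration.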

Common Pending causes:

| Event clue | Meaning |
|---|---|
| Insufficient cpu | Requests do not fit available allocatable CPU |
| Insufficient memory | Requests do not fit available allocatable memory |
| had taint ... that the pod didn't tolerate | Missing toleration |
| didn't match Pod's node affinity/selector | Node label constraint mismatch |
| pod has unbound immediate PersistentVolumeClaims | PVC/storage not bound |
| volume node affinity conflict | Volume is tied to a different zone/node |
| Too many pods | Node pod density limit reached |

Best first command:

bash
kubectl describe pod <pod-name> -n <namespace>

Then read the Events section.

Important distinction:

  • Scheduler chooses a node
  • Kubelet starts containers after the Pod is assigned
  • Image pull, volume mount, and container start errors are usually kubelet/runtime phase issues, not scheduler issues

Example:

bash
kubectl get pod -o wide

If NODE is empty, scheduling has not succeeded. If NODE is set but the Pod is not running, check kubelet/runtime/image/volume/probe events.

A strong answer is:

“The scheduler watches unscheduled Pods, filters nodes based on resources and constraints, scores suitable nodes, and binds the Pod. When a Pod is Pending, I check Events for resource, taint, affinity, topology, or PVC-related failures.”

What are Kubernetes controllers?

Kubernetes controllers are control loops. They watch current cluster state, compare it with desired state, and take action to move the system closer to the desired state.

Basic control loop idea:

text
observe current state
→ compare with desired state
→ create/update/delete objects
→ repeat
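
A minimal sketch of one reconciliation pass, loosely modeled on what a ReplicaSet controller does with a replica count. The `api-N` pod names are an assumption for readability; real ReplicaSet Pods get generated name suffixes, and a real controller works through the API server's watch stream rather than a local list.

```python
def reconcile(desired_replicas, pods):
    """Return the pod list after one pass that converges on the desired count."""
    pods = list(pods)  # work on a copy; the observed state is not mutated
    next_id = 0
    while len(pods) < desired_replicas:      # too few: create replacements
        name = f"api-{next_id}"
        if name not in pods:
            pods.append(name)
        next_id += 1
    while len(pods) > desired_replicas:      # too many: delete extras
        pods.pop()
    return pods


state = reconcile(3, [])
print(state)                  # ['api-0', 'api-1', 'api-2']

state.remove("api-1")         # someone deletes a Pod by hand
state = reconcile(3, state)
print(state)                  # ['api-0', 'api-2', 'api-1']

scaled_down = reconcile(1, state)
print(scaled_down)            # ['api-0']
```

Running the pass again on an already converged state changes nothing, and that idempotence is the property that makes control loops safe to repeat.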

Common controllers:

| Controller | Manages |
|---|---|
| Deployment | Rolling updates and ReplicaSets for stateless apps |
| ReplicaSet | Desired number of matching Pods |
| StatefulSet | Stateful Pods with stable identity and ordered rollout |
| DaemonSet | One Pod per selected node |
| Job | Run-to-completion batch work |
| CronJob | Scheduled Jobs |
| Node controller | Node health and lifecycle |
| EndpointSlice controller | Service endpoint tracking |
| ServiceAccount controller | ServiceAccount-related resources |
| Garbage collector | Removes dependent objects based on ownerReferences |

Deployment example:

  1. You update the Deployment image
  2. Deployment controller creates a new ReplicaSet
  3. New ReplicaSet scales up new Pods
  4. Old ReplicaSet scales down old Pods
  5. Rollout completes if readiness checks pass

Useful rollout commands:

bash
kubectl rollout status deployment/api -n prod

bash
kubectl rollout history deployment/api -n prod

bash
kubectl rollout undo deployment/api -n prod

Important ownership relationship:

text
Deployment
  → ReplicaSet
    → Pods

You normally edit the Deployment, not the ReplicaSet or Pods directly. If you delete a Pod managed by a ReplicaSet, the controller creates a replacement.

Different workload controllers solve different problems:

| Workload | Use when |
|---|---|
| Deployment | Stateless web/API apps |
| StatefulSet | Databases, brokers, ordered identity |
| DaemonSet | Node agents, CNI, log collectors |
| Job | One-time batch task |
| CronJob | Scheduled batch task |

Custom controllers/operators extend the same idea for application-specific resources. For example, an operator can watch a custom resource and reconcile database clusters, certificates, backups, or Helm releases.

A strong answer is:

“Controllers are reconciliation loops. They watch desired and actual state, then create, update, or delete resources to converge. For normal apps, I change the Deployment and let ReplicaSets and Pods be managed by controllers.”


Pods and workload controllers

What is a Pod and why is it the smallest deployable unit?

A Pod is the smallest deployable compute object in Kubernetes.

A Pod wraps one or more containers that are scheduled together on the same node and share some runtime resources.

Containers inside the same Pod share:

  • Network namespace — one Pod IP; containers communicate using localhost
  • Storage volumes — containers can mount the same volume
  • Scheduling — all containers in the Pod run on the same node
  • Lifecycle — the Pod is created, scheduled, and terminated as one unit

Most application Pods run one main container.

Multi-container Pods are used when containers must work closely together:

| Pattern | Example |
|---|---|
| Sidecar | Service mesh proxy, log shipper |
| Adapter | Convert app output into standard format |
| Ambassador | Proxy outbound connections |
| Init container | Run setup before app container starts |

Example Pod:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.27
    ports:
    - containerPort: 80

Important interview point: bare Pods are rarely used for production apps because Pods are ephemeral. If a node fails or a Pod is deleted, you usually want a controller such as a Deployment, StatefulSet, DaemonSet, Job, or CronJob to create replacements.

Common Pod facts:

  • A Pod gets its own IP address
  • Containers in the same Pod share that IP
  • Containers in different Pods communicate through Pod IPs or Services
  • Pod IP can change when the Pod is recreated
  • Persistent data should use volumes, not the container writable layer
  • A Pod is not the same as a container; it is a wrapper around one or more containers

A strong answer is:

“A Pod is Kubernetes’ smallest schedulable unit. It gives one or more containers a shared network namespace, optional shared volumes, and a common lifecycle. For production, I usually manage Pods through controllers, not bare Pod manifests.”

Deployment vs ReplicaSet vs Pod?

A Deployment, ReplicaSet, and Pod are related, but they operate at different levels.

| Object | Purpose |
|---|---|
| Pod | Runs one or more containers |
| ReplicaSet | Maintains a desired number of matching Pods |
| Deployment | Manages ReplicaSets and provides rollout/rollback |

Ownership chain:

text
Deployment
  → ReplicaSet
    → Pods

You normally create a Deployment, not a ReplicaSet directly.

Example Deployment behavior:

  1. You create a Deployment with replicas: 3
  2. Deployment creates a ReplicaSet
  3. ReplicaSet creates 3 Pods
  4. If a Pod dies, ReplicaSet creates a replacement
  5. If you update the image, Deployment creates a new ReplicaSet
  6. Old ReplicaSet scales down while new ReplicaSet scales up

Common rollout strategies:

| Strategy | Behavior |
|---|---|
| RollingUpdate | Gradually replace old Pods with new Pods |
| Recreate | Stop old Pods first, then start new Pods |

Rolling update example:

yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Meaning:

| Field | Meaning |
|---|---|
| maxSurge | Extra Pods allowed above desired replicas during rollout |
| maxUnavailable | Pods allowed to be unavailable during rollout |

maxUnavailable: 0 helps keep capacity during rollout, but the rollout may stall if new Pods cannot become Ready. Always combine rollout strategy with correct readiness probes and capacity planning.
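
The capacity effect of these two fields is simple arithmetic. The sketch below assumes the documented rounding for percentage values (maxSurge rounds up, maxUnavailable rounds down); `resolve` and `rollout_bounds` are illustrative helper names, not Kubernetes APIs.

```python
import math


def resolve(value, replicas, round_up):
    """Turn an int or a percentage string such as '25%' into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Upper bound on total Pods and lower bound on available Pods mid-rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {"max_total_pods": replicas + surge,
            "min_available_pods": replicas - unavailable}


print(rollout_bounds(3, 1, 0))
# {'max_total_pods': 4, 'min_available_pods': 3}
print(rollout_bounds(10, "25%", "25%"))
# {'max_total_pods': 13, 'min_available_pods': 8}
```

With `maxSurge: 1` and `maxUnavailable: 0` on three replicas, the cluster needs room for a fourth Pod, so a rollout can stall on scheduling capacity as well as on readiness.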

Useful commands:

bash
kubectl rollout status deployment/api
kubectl rollout history deployment/api
kubectl rollout undo deployment/api

Common interview mistake: editing or deleting Pods directly. If a Pod is managed by a ReplicaSet, manual Pod changes are temporary. The controller will recreate or replace Pods based on the desired state.

A strong answer is:

“A Pod runs containers, a ReplicaSet keeps the desired number of Pods, and a Deployment manages ReplicaSets for rolling updates and rollback. In normal application deployments, I change the Deployment and let controllers handle Pod replacement.”

When do you use StatefulSet instead of Deployment?

Use a StatefulSet when each Pod needs a stable identity or stable storage.

A StatefulSet provides:

| Requirement | What StatefulSet gives |
|---|---|
| Stable Pod name | app-0, app-1, app-2 |
| Stable network identity | DNS identity through a headless Service |
| Stable storage | PVC per Pod through volumeClaimTemplates |
| Ordered rollout | Predictable create/update/delete order |
| Ordered scaling | Scale up/down in ordinal order by default |

Example use cases:

  • Databases
  • Kafka brokers
  • ZooKeeper
  • Elasticsearch/OpenSearch
  • Redis cluster members
  • Stateful queue/broker systems

Deployment vs StatefulSet:

| Deployment | StatefulSet |
|---|---|
| Pods are interchangeable | Pods have stable identity |
| Good for stateless APIs | Good for stateful clustered systems |
| Any Pod can serve any request | Pod identity may map to data/shard |
| Uses ReplicaSet | Uses stable ordinals |
| Storage usually shared/external | Each Pod can have its own PVC |
StatefulSet Pods have predictable names:

text
mysql-0
mysql-1
mysql-2

With a headless Service, each Pod can get stable DNS such as:

text
mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
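
The naming pattern is `<statefulset>-<ordinal>.<headless-service>.<namespace>.svc.<cluster-domain>`. A small sketch that generates these names, assuming the common default cluster domain `cluster.local`; the function name is made up for the example.

```python
def statefulset_dns(name, service, namespace, replicas,
                    cluster_domain="cluster.local"):
    """Per-Pod DNS names for a StatefulSet behind a headless Service."""
    return [f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


peers = statefulset_dns("mysql", "mysql", "default", 3)
for host in peers:
    print(host)
```

Clustered systems often build their peer list this way, which works because the names survive rescheduling even though the Pod IPs do not.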

Important storage point: if a StatefulSet Pod is rescheduled, Kubernetes can attach the same PVC to the replacement Pod with the same identity.

Do not use StatefulSet just because an app writes files. Many apps should store state in external databases/object storage and run as stateless Deployments.

A strong answer is:

“I use StatefulSet when Pod identity and storage must be stable across rescheduling, such as databases or brokers. For stateless APIs where replicas are interchangeable, I use Deployment.”

DaemonSet, Job, and CronJob — use cases?

DaemonSet, Job, and CronJob solve different workload problems.

| Workload | Use case | Example |
|---|---|---|
| DaemonSet | Run a Pod on every selected node | Log agent, node exporter, CNI plugin |
| Job | Run a task to completion | DB migration, batch export |
| CronJob | Run Jobs on a schedule | Nightly report, cleanup, certificate check |

DaemonSet:

  • Ensures each selected node runs a copy of a Pod
  • Adds Pods when nodes are added
  • Removes Pods when nodes are removed
  • Common for node-level agents

Examples:

text
Fluent Bit
node-exporter
CNI agents
storage plugins
security agents

Job:

  • Creates one or more Pods
  • Retries failed Pods depending on backoffLimit
  • Completes when the required number of successful completions is reached

CronJob:

  • Creates Jobs on a schedule
  • Uses cron syntax
  • Good for repeated batch tasks

Example CronJob idea:

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: registry.example.com/cleanup:v1

Important interview points:

  • Use restartPolicy: OnFailure or Never for Jobs, not Always
  • Use backoffLimit to control retries
  • Use concurrencyPolicy for CronJobs when overlapping runs are dangerous
  • Use ttlSecondsAfterFinished if completed Job cleanup is needed
  • DaemonSets may need tolerations to run on tainted nodes
  • Control-plane nodes usually require explicit tolerations if you want DaemonSet Pods there
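
The Job and CronJob fields from the list above can be seen together in one sketch. The `db-migrate` name, image, command, and every numeric value here are assumptions for illustration; the field names are those of the `batch/v1` API.

```yaml
# Hypothetical one-off migration Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 3                 # retry failed Pods up to 3 times, then mark the Job failed
  activeDeadlineSeconds: 600      # fail the Job if it runs longer than 10 minutes
  ttlSecondsAfterFinished: 3600   # delete the finished Job and its Pods after 1 hour
  template:
    spec:
      restartPolicy: Never        # each retry is a new Pod, so failed attempts stay inspectable
      containers:
      - name: migrate
        image: registry.example.com/migrate:v1
        command: ["/app/migrate", "--up"]   # placeholder command
---
# The nightly-cleanup CronJob with overlap protection added.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid       # skip a run if the previous one is still active
  startingDeadlineSeconds: 300    # give up on a run that cannot start within 5 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: registry.example.com/cleanup:v1
```

Retries only help if the task is idempotent, so the migration itself still has to tolerate being run more than once.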

Common mistake:

Running a database migration as a bare Pod without thinking about retries, restart policy, idempotency, or whether it can run twice.

A strong answer is:

“I use DaemonSet for node-level agents, Job for run-to-completion tasks, and CronJob for scheduled Jobs. For batch work, I think carefully about restart policy, backoff, idempotency, and cleanup.”

Pod lifecycle phases and restart policies?

A Pod has a high-level phase, and each container inside the Pod has a more detailed state.

Pod phases:

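| Phase | Meaning |
| --- | --- |
| Pending | Pod accepted by the cluster, but one or more containers are not running yet (still being scheduled, pulling images, or starting) |
| Running | Pod is bound to a node and at least one container is running, starting, or restarting |
| Succeeded | All containers exited successfully and will not restart |
| Failed | All containers have terminated, and at least one exited in failure |
| Unknown | Pod state cannot be determined, usually because the node is unreachable |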

Container states:

| State | Meaning |
| --- | --- |
| Waiting | Container not running yet |
| Running | Container is running |
| Terminated | Container exited |

Common container waiting/terminated reasons:

| Reason | Likely cause |
| --- | --- |
| ImagePullBackOff | Wrong image, missing tag, registry auth, network, CA |
| ErrImagePull | Initial image pull failure |
| CrashLoopBackOff | Process starts then exits repeatedly |
| CreateContainerConfigError | Bad config, missing Secret/ConfigMap |
| RunContainerError | Runtime failed to start container |
| OOMKilled | Container exceeded memory limit |
| Completed | Process finished successfully |
| Error | Process exited with non-zero status |

Restart policies:

| Policy | Behavior | Common use |
| --- | --- | --- |
| Always | Restart containers when they exit | Deployments |
| OnFailure | Restart only if non-zero exit | Jobs |
| Never | Do not restart | Debug/batch cases |

For Deployments, restartPolicy is normally Always.

Debugging flow:

bash
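kubectl get pod <pod-name>            # phase, READY count, restart count
kubectl describe pod <pod-name>       # container states, reasons, and Events
kubectl logs <pod-name>               # logs from the current container instance
kubectl logs <pod-name> --previous    # logs from the previous instance, before the last restart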

How to read symptoms:

| Symptom | First check |
| --- | --- |
| Pending | Events: scheduler, resources, taints, PVC |
| ImagePullBackOff | Image name/tag, pull secret, registry, CA |
| CrashLoopBackOff | App logs, command, env, config, probes |
| OOMKilled | Memory limit, app memory use, node pressure |
| Completed in Deployment | Wrong command for long-running service |
| Readiness false | App readiness endpoint or dependency |

Important nuance: Pod phase alone is not enough. Always check container state, reason, restart count, events, and logs.
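
To make the nuance concrete, here is an illustrative excerpt of what `kubectl get pod <pod-name> -o yaml` can look like for a crash-looping container. It is a hand-written sketch, not captured output: the container name and numbers are made up, while the field names come from the Pod status API.

```yaml
# Illustrative status excerpt, not real cluster output.
status:
  phase: Running                  # the phase alone looks healthy
  containerStatuses:
  - name: api
    ready: false
    restartCount: 7
    state:
      waiting:
        reason: CrashLoopBackOff  # kubelet is backing off before the next restart
    lastState:
      terminated:
        reason: Error
        exitCode: 1               # non-zero exit from the previous run
```

The phase says Running while the container state, restart count, and last exit code tell the real story.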

A strong answer is:

“I do not stop at the Pod phase. I check container states, reasons, restart count, events, and previous logs. CrashLoopBackOff points to app exit/probe/config issues, while ImagePullBackOff points to registry, image, or credential problems.”


Services, networking, and Ingress

Kubernetes Service types — ClusterIP, NodePort, LoadBalancer?

A Kubernetes Service provides a stable network endpoint for a set of Pods.

Pods are ephemeral and their IPs can change. A Service gives clients a stable DNS name and virtual IP while routing traffic to matching Pods.

Service types:

| Type | Access pattern |
| --- | --- |
| ClusterIP | Internal cluster access only; default type |
| NodePort | Exposes Service on every node at a static port |
| LoadBalancer | Provisions external load balancer when infrastructure supports it |
| ExternalName | Maps Service name to external DNS name |

Example Service:

yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

Important fields:

| Field | Meaning |
| --- | --- |
| selector | Labels used to find backend Pods |
| port | Service port exposed to clients |
| targetPort | Port on the selected Pods |
| nodePort | Static node port for NodePort type |
| clusterIP | Internal virtual IP |
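
As a sketch of how `port`, `targetPort`, and `nodePort` relate, here is a NodePort variant of the same Service. The name `api-nodeport` and the port 30080 are arbitrary choices for illustration; 30080 falls inside the default NodePort range of 30000-32767.

```yaml
# NodePort variant of the api Service (illustrative values).
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort
  selector:
    app: api            # must match the Pod template labels
  ports:
  - name: http
    port: 80            # clients inside the cluster use <clusterIP>:80
    targetPort: 8080    # traffic is forwarded to this container port
    nodePort: 30080     # clients outside the cluster use <nodeIP>:30080
```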

Most common use:

| Need | Usually use |
| --- | --- |
| Service-to-service inside cluster | ClusterIP |
| Temporary simple external access | NodePort |
| Cloud external access | LoadBalancer |
| HTTP/HTTPS host/path routing | Ingress or Gateway API |

A Service selector mismatch creates a Service with no backends.

Debug commands:

bash
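kubectl get svc api                                           # type, ClusterIP, ports
kubectl get endpoints api                                     # backend IPs via the legacy Endpoints API
kubectl get endpointslice -l kubernetes.io/service-name=api   # backends via EndpointSlice
kubectl get pods -l app=api                                   # Pods the selector should match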

If the Service has no endpoints, check:

  • Pod labels
  • Service selector
  • Pod readiness
  • Namespace
  • Target port
  • EndpointSlice objects

Important note: Services route only to ready endpoints by default. If Pods exist but are not Ready, traffic may not be sent to them.

A strong answer is:

“A Service gives stable networking for ephemeral Pods. I use ClusterIP for internal traffic, LoadBalancer or Ingress/Gateway for external access, and I always verify that Service selectors produce ready endpoints.”

How does DNS work inside Kubernetes?

Kubernetes creates DNS records for Services and Pods. CoreDNS commonly serves these records inside the cluster.

Common Service DNS formats:

| Name | Meaning |
| --- | --- |
| api | Service in the same namespace |
| api.dev | Service in namespace dev |
| api.dev.svc | Service under cluster service domain |
| api.dev.svc.cluster.local | Fully qualified service name |

Example:

text
my-svc.my-namespace.svc.cluster.local

From a Pod in the same namespace, the short name usually works:

bash
curl http://api

From another namespace, use the namespace-qualified name:

bash
curl http://api.dev

Headless Service:

yaml
spec:
  clusterIP: None

A headless Service does not get a normal ClusterIP. It can return individual Pod IPs, which is useful for StatefulSets and direct Pod identity.

StatefulSet DNS example:

text
mysql-0.mysql.default.svc.cluster.local

Common DNS troubleshooting:

bash
kubectl get svc -n kube-system kube-dns
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl exec -it <pod> -- nslookup kubernetes.default
kubectl exec -it <pod> -- cat /etc/resolv.conf

Check:

  • Service exists
  • Correct namespace
  • CoreDNS Pods running
  • NetworkPolicy allows DNS egress
  • Pod /etc/resolv.conf
  • Search domains and ndots
  • Service endpoints exist
  • CNI connectivity to CoreDNS

NetworkPolicy gotcha:

If egress is restricted, allow DNS to CoreDNS/kube-dns on UDP and TCP 53. DNS can use TCP for large responses or retries, so allowing only UDP may cause intermittent issues.

A strong answer is:

“CoreDNS gives Services stable names inside the cluster. I test short and fully qualified names, check Service endpoints, inspect /etc/resolv.conf, and make sure NetworkPolicy allows DNS egress to kube-dns/CoreDNS on port 53.”

What is Ingress and how does it differ from LoadBalancer?

An Ingress manages HTTP/HTTPS routing from outside the cluster to Services inside the cluster.

Ingress can provide:

  • Host-based routing
  • Path-based routing
  • TLS termination
  • HTTP routing rules
  • One external entry point for many Services

An Ingress requires an Ingress controller. Without a controller, the Ingress object exists but does not route traffic.

Common controllers:

  • NGINX Ingress Controller
  • Traefik
  • HAProxy
  • AWS Load Balancer Controller
  • GCE/GKE Ingress controller
  • Azure Application Gateway Ingress Controller

LoadBalancer Service vs Ingress:

| LoadBalancer Service | Ingress |
| --- | --- |
| Exposes one Service externally | Routes to many Services |
| Usually L4 TCP/UDP style | L7 HTTP/HTTPS routing |
| May create one cloud LB per Service | Can consolidate many routes |
| Simple for one service | Better for host/path routing |
| No built-in path routing | Supports host/path rules |
| TLS depends on LB setup | Commonly terminates TLS at controller |

Ingress example:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80

Common Ingress troubleshooting:

bash
kubectl get ingress
kubectl describe ingress web
kubectl get ingressclass
kubectl get svc web
kubectl get endpoints web
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller

Check:

  • Ingress controller is installed
  • ingressClassName matches the controller
  • DNS points to the external address
  • TLS Secret exists and matches host
  • Backend Service exists
  • Service has ready endpoints
  • Path type and rewrite annotations are correct
  • Controller-specific annotations are valid
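
Several items on this checklist (TLS Secret, host match, path types, multiple backends) show up together in a slightly fuller Ingress. This is a sketch under assumptions: the `api` Service and the `app-example-tls` Secret are hypothetical and must already exist in the same namespace.

```yaml
# Ingress with TLS termination and two path-based backends (illustrative).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx        # must match an existing IngressClass
  tls:
  - hosts:
    - app.example.com            # must match the rule host below
    secretName: app-example-tls  # Secret of type kubernetes.io/tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api               # more specific prefix for the API
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80
      - path: /                  # everything else goes to the web frontend
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```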

Modern senior note: Gateway API is the newer Kubernetes API family for more expressive traffic management. Ingress is still widely used, but Gateway API is increasingly relevant for advanced routing and platform/team ownership models.

A strong answer is:

“Ingress is L7 HTTP/HTTPS routing to Services and needs an Ingress controller. I use it when I need host/path routing and TLS termination instead of creating one LoadBalancer per microservice.”

kube-proxy and CNI — networking stack basics?

Kubernetes networking has multiple layers.

| Layer | Component | Role |
| --- | --- | --- |
| Pod networking | CNI plugin | Assigns Pod IPs and connects Pods across nodes |
| Service networking | kube-proxy or eBPF dataplane | Implements Service virtual IP/load balancing |
| DNS | CoreDNS | Resolves Service and Pod DNS names |
| Policy | NetworkPolicy + CNI support | Controls allowed ingress/egress traffic |
| Ingress/Gateway | Controller | Handles external L7 traffic |

CNI plugin examples:

  • Calico
  • Cilium
  • Flannel
  • Weave Net
  • Antrea
  • AWS VPC CNI
  • Azure CNI

CNI responsibilities:

  • Allocate Pod IPs
  • Set up network interfaces
  • Configure routes
  • Enable Pod-to-Pod connectivity
  • Optionally enforce NetworkPolicy

kube-proxy responsibilities:

  • Watches Services and EndpointSlices
  • Programs node networking rules
  • Sends Service traffic to backend Pods
  • Common modes include iptables and IPVS, with nftables available on newer clusters

Some CNIs, such as Cilium, can replace kube-proxy functionality with an eBPF dataplane.

Important distinction:

| Problem | Likely layer |
| --- | --- |
| Pod has no IP | CNI setup issue |
| Pod-to-Pod fails | CNI routing/policy issue |
| Service IP fails but Pod IP works | kube-proxy/eBPF/Service endpoints |
| DNS name fails but Service IP works | CoreDNS or DNS policy |
| External HTTP route fails | Ingress/Gateway/controller/LB |
| Policy not enforced | CNI may not support NetworkPolicy |

Service mesh note:

A service mesh such as Istio or Linkerd adds proxies around application traffic. Sidecars affect:

  • Pod CPU/memory requests
  • Startup/shutdown behavior
  • Probes
  • mTLS
  • Traffic routing
  • HPA calculations if resources are not set properly

A strong answer is:

“CNI gives Pods network connectivity and IPs. kube-proxy or an eBPF replacement implements Service load balancing. CoreDNS handles names, and NetworkPolicy works only if the CNI enforces it.”

NetworkPolicy — what do interviewers test?

A NetworkPolicy controls allowed ingress and/or egress traffic for selected Pods.

Important baseline:

  • If no NetworkPolicy selects a Pod, traffic is allowed by default
  • Once a Pod is selected by an ingress policy, only allowed ingress traffic is permitted
  • Once a Pod is selected by an egress policy, only allowed egress traffic is permitted
  • NetworkPolicy requires CNI support; otherwise policies may not be enforced

Common rules:

| Scenario | Behavior |
| --- | --- |
| No policies in namespace | Traffic allowed by default |
| Ingress policy selects Pod | Only allowed ingress reaches that Pod |
| Egress policy selects Pod | Only allowed egress leaves that Pod |
| policyTypes: [Ingress] | Only ingress restricted |
| policyTypes: [Egress] | Only egress restricted |
| Empty ingress list | Deny all ingress to selected Pods |
| Empty egress list | Deny all egress from selected Pods |
| Multiple policies apply | Allowed traffic is the union of all policies |

Example policy allowing frontend to call API and API to resolve DNS:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Important DNS gotcha:

If you restrict egress and forget DNS, the app may fail even though Services and Pods are healthy.

Allow:

  • UDP 53
  • TCP 53
  • Correct namespace/pod labels for CoreDNS/kube-dns

Another common interview gotcha: NetworkPolicy is not a cluster-wide firewall. It applies only to the Pods it selects, it works at L3/L4 (IP blocks, ports, and protocols), and it takes effect only when the CNI implementation enforces it.

Troubleshooting checklist:

bash
kubectl get networkpolicy -n <ns>
kubectl describe networkpolicy <name> -n <ns>
kubectl get pods --show-labels -n <ns>
kubectl get ns --show-labels

Check:

  • Does the policy select the intended Pods?
  • Are namespace labels correct?
  • Are pod labels correct?
  • Is ingress allowed on the destination?
  • Is egress allowed from the source?
  • Is DNS allowed?
  • Does the CNI support NetworkPolicy?
  • Are ports matching container/application ports?

Advanced production patterns:

  • Default deny all ingress
  • Default deny all egress
  • Allow DNS explicitly
  • Allow only required service-to-service flows
  • Force external egress through gateway/proxy
  • Separate policies by app/team/namespace
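
The first three patterns are often applied as a pair of namespace-wide policies. The sketch below assumes CoreDNS runs in `kube-system` with the common `k8s-app: kube-dns` label; check the labels in your own cluster before relying on it.

```yaml
# Default deny for every Pod in the namespace where this is applied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # empty selector = all Pods in this namespace
  policyTypes:
  - Ingress
  - Egress
---
# Re-allow DNS lookups for all Pods in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    # namespaceSelector and podSelector in the same list item are ANDed:
    # only DNS Pods inside kube-system match.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns   # assumed label, verify with kubectl get pods --show-labels
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Because allowed traffic is the union of all policies, each application can then add its own narrow allow rules on top of this baseline.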

A strong answer is:

“NetworkPolicy starts from default allow, then restricts selected Pods. I verify both destination ingress and source egress, allow DNS explicitly, and confirm the CNI actually enforces NetworkPolicy.”


Configuration, secrets, and storage

ConfigMap — how and when to use it?

A ConfigMap stores non-sensitive configuration separately from the container image.

Use ConfigMap for:

  • Feature flags
  • Log levels
  • App settings
  • Config files
  • Non-secret URLs
  • Runtime toggles

Do not store passwords, API keys, tokens, or TLS private keys in ConfigMaps. Use Secrets or an external secret manager for sensitive values.

Example:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  app.properties: |
    cache.ttl=300
    feature.newCheckout=true

Common ways to consume ConfigMap:

| Method | Best use |
| --- | --- |
| Environment variable | Simple key/value settings |
| Volume mount | Config files read from disk |
| Command args | Startup flags |
| Projected volume | Combine ConfigMap with Secret/downward API |

Environment variable example:

yaml
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: LOG_LEVEL

Volume mount example:

yaml
volumes:
- name: config
  configMap:
    name: app-config

containers:
- name: app
  image: example/app:v1
  volumeMounts:
  - name: config
    mountPath: /etc/app

Important update behavior:

| Usage | What happens when ConfigMap changes? |
| --- | --- |
| Env var | Existing Pod does not automatically get new value |
| Volume mount | File content updates eventually |
| subPath mount | Does not receive automatic updates |
| App config reload | App must watch/reload file or Pod must restart |

In production, many teams trigger a Deployment rollout when ConfigMap changes so Pods restart with known config.

Common rollout pattern:

bash
kubectl rollout restart deployment/api
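
An alternative some teams use instead of a manual restart is a versioned, immutable ConfigMap. The sketch below assumes a `-v2` naming convention, which is a team habit rather than anything Kubernetes requires; the second document is a fragment of a Deployment, not a complete manifest.

```yaml
# Versioned, immutable ConfigMap (the -v2 suffix is an assumed convention).
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v2
immutable: true            # data can no longer be edited; create -v3 instead
data:
  LOG_LEVEL: debug
---
# Fragment of the Deployment's Pod template. Changing the referenced name
# from app-config-v1 to app-config-v2 changes the template, which triggers
# a normal rolling update.
spec:
  template:
    spec:
      containers:
      - name: app
        image: example/app:v1
        envFrom:
        - configMapRef:
            name: app-config-v2
```

The trade-off is extra objects to clean up in exchange for config changes that roll out and roll back exactly like image changes.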

Interview mistake: saying “ConfigMap update immediately changes the app.” The mounted file may update eventually, but the application must re-read it. Environment variables require a new Pod.

A strong answer is:

“I use ConfigMap for non-secret runtime configuration. Env vars are simple but require Pod restart to change; mounted files can update eventually, but the app must reload them. For predictable production releases, I often trigger a rollout when config changes.”

Secrets — storage, mounting, and security?

A Kubernetes Secret stores sensitive data such as passwords, tokens, private keys, and certificates.

Example:

yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
stringData:
  username: app
  password: changeme

Important security point:

Base64 is encoding, not encryption.

Secrets are stored as Kubernetes API objects. By default they are written to etcd unencrypted, so anyone with access to etcd or its backups can read them unless encryption at rest is enabled.

Good practices:

| Practice | Why it matters |
| --- | --- |
| Enable encryption at rest | Protect Secret data in etcd |
| Restrict RBAC | Least privilege for get/list/watch secrets |
| Use external secret managers | Centralized rotation/audit |
| Avoid committing Secrets to Git | Prevent credential leakage |
| Prefer short-lived credentials | Reduce blast radius |
| Treat etcd backups as sensitive | Backups may contain Secret data |
| Rotate secrets | Handle leaks and lifecycle |
| Limit mounted keys | Expose only what the container needs |

Secret as environment variable:

yaml
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-creds
      key: password

Secret as volume:

yaml
volumes:
- name: db-creds
  secret:
    secretName: db-creds

containers:
- name: app
  volumeMounts:
  - name: db-creds
    mountPath: /etc/secrets
    readOnly: true

Volume mounts are often preferred for secrets because:

  • File permissions can be controlled
  • Some apps can reload files
  • They avoid putting secrets directly into environment variables
  • Rotation can be easier than env-based secrets

But do not overstate it: mounted Secrets can still be read by any process with filesystem access inside the container.
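
Limiting mounted keys and file permissions might look like the fragment below. The file name `db-password` and the mode are illustrative; if the container runs as a non-root user, a mode this strict may need `fsGroup` or a wider setting to remain readable.

```yaml
# Fragment of a Pod spec; only the `password` key is projected.
volumes:
- name: db-creds
  secret:
    secretName: db-creds
    defaultMode: 0400          # owner read-only (assumption: the app can read as owner)
    items:
    - key: password
      path: db-password        # appears as /etc/secrets/db-password

containers:
- name: app
  image: example/app:v1
  volumeMounts:
  - name: db-creds
    mountPath: /etc/secrets
    readOnly: true
```

The `username` key of the Secret is not exposed to this container at all, which is the point of using `items`.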

External secret options:

  • HashiCorp Vault
  • AWS Secrets Manager
  • Azure Key Vault
  • Google Secret Manager
  • External Secrets Operator
  • Secrets Store CSI Driver
  • Sealed Secrets for GitOps workflows

Common interview mistake: saying Kubernetes Secrets are secure because they are base64-encoded. They need RBAC, encryption at rest, secure backup handling, and secret rotation strategy.

A strong answer is:

“Kubernetes Secrets are sensitive API objects, not magically encrypted values. I restrict RBAC, enable encryption at rest, avoid Git commits, prefer external secret managers for production, and mount only the secret keys a container actually needs.”

PersistentVolume, PersistentVolumeClaim, and StorageClass?

Kubernetes separates storage into three main objects.

| Object | Role |
| --- | --- |
| PersistentVolume | Actual cluster storage resource |
| PersistentVolumeClaim | User/application request for storage |
| StorageClass | Dynamic provisioning template |

A PV is the storage resource. It may be backed by NFS, EBS, Azure Disk, Ceph, vSphere, local storage, or another CSI driver.

A PVC is the application’s request for storage.

A StorageClass tells Kubernetes how to dynamically provision storage.

PVC example:

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi

Access modes:

| Access mode | Meaning |
| --- | --- |
| ReadWriteOnce | Mounted read-write by one node |
| ReadOnlyMany | Mounted read-only by many nodes |
| ReadWriteMany | Mounted read-write by many nodes |
| ReadWriteOncePod | Mounted read-write by only one Pod |

Common mapping:

| Workload | Typical storage |
| --- | --- |
| Single database Pod | RWO block volume |
| Shared uploads | RWX filesystem such as NFS/CephFS/EFS |
| StatefulSet database | PVC per Pod |
| Cache/temp data | emptyDir, not PVC |
| Logs | Usually stdout + log pipeline, not PVC |

Dynamic provisioning flow:

text
Pod references PVC
→ PVC references StorageClass
→ CSI provisioner creates PV
→ PVC binds to PV
→ Pod mounts volume

If a Pod stays Pending, check PVC status:

bash
kubectl get pvc
kubectl describe pvc data
kubectl describe pod <pod-name>

Common storage problems:

| Symptom | Likely cause |
| --- | --- |
| PVC Pending | No matching PV or provisioner issue |
| Pod Pending | PVC not bound |
| Multi-attach error | RWO volume mounted on another node |
| Volume node affinity conflict | Volume tied to different zone |
| Permission denied | Filesystem ownership/security context |
| Data deleted unexpectedly | Reclaim policy or PVC deletion |

Reclaim policy matters:

| Policy | Behavior |
| --- | --- |
| Delete | Delete backing storage when PVC/PV is deleted |
| Retain | Keep backing storage for manual recovery |
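
A StorageClass ties the reclaim policy and provisioning behavior together. The sketch below shows one possible definition of the `fast-ssd` class used in the PVC example; the provisioner and `parameters` assume the AWS EBS CSI driver, so other clusters will need different values.

```yaml
# Possible definition of the fast-ssd class (assumes the AWS EBS CSI driver).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain                    # keep the disk if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # provision in the zone where the Pod is scheduled
allowVolumeExpansion: true               # allow growing the PVC later
```

`WaitForFirstConsumer` is one way to avoid volume node affinity conflicts, because the volume is created only after the scheduler has picked a node.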

StatefulSets commonly use volumeClaimTemplates so each Pod gets its own PVC.

A strong answer is:

“A PVC is the app’s storage request, a PV is the actual storage, and a StorageClass defines dynamic provisioning. I match access mode and reclaim policy to the workload, and I debug Pending Pods by checking PVC binding and storage events.”

emptyDir, projected volumes, and downward API?

Kubernetes has several volume types for non-persistent or metadata-driven use cases.

| Volume type | Use |
| --- | --- |
| emptyDir | Temporary storage shared by containers in one Pod |
| projected | Combine ConfigMap, Secret, downwardAPI, and service account token |
| downward API | Expose Pod/container metadata to the container |
| ConfigMap volume | Mount non-secret config files |
| Secret volume | Mount sensitive files |

emptyDir is created when a Pod is assigned to a node and deleted when the Pod is removed from that node.

Use emptyDir for:

  • Scratch files
  • Shared files between app and sidecar
  • Temporary processing
  • Buffering
  • Logs consumed by sidecar

Example:

yaml
volumes:
- name: shared-logs
  emptyDir: {}

containers:
- name: app
  volumeMounts:
  - name: shared-logs
    mountPath: /var/log/app
- name: log-shipper
  volumeMounts:
  - name: shared-logs
    mountPath: /logs

Downward API exposes Pod metadata without requiring the app to call the Kubernetes API.

Environment variable example:

yaml
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace

Resource field example:

yaml
env:
- name: CPU_REQUEST
  valueFrom:
    resourceFieldRef:
      resource: requests.cpu

Projected volume example:

yaml
volumes:
- name: app-projected
  projected:
    sources:
    - configMap:
        name: app-config
    - secret:
        name: app-secret
    - downwardAPI:
        items:
        - path: pod-name
          fieldRef:
            fieldPath: metadata.name

Common interview distinctions:

| Need | Use |
| --- | --- |
| Temporary shared files | emptyDir |
| Pod name/namespace/labels | Downward API |
| Combine multiple config sources | Projected volume |
| Persistent data | PVC |
| Sensitive config | Secret |

A strong answer is:

“I use emptyDir for temporary files shared inside one Pod, projected volumes to combine config sources, and downward API when the app needs its own Pod metadata without calling the Kubernetes API.”

How does Kubernetes support 12-factor app configuration?

The 12-factor app model says configuration should be separated from code and injected at runtime.

Kubernetes supports this with:

| 12-factor idea | Kubernetes mechanism |
| --- | --- |
| Config separated from code | ConfigMap and Secret |
| Backing services as attached resources | Service DNS, credentials, ExternalName |
| Port binding | Container port + Service |
| Processes | One main process per container |
| Concurrency | Replicas and HPA |
| Disposability | Fast startup and graceful shutdown |
| Logs | stdout/stderr collected by platform |
| Dev/prod parity | Same image, different config |

Config examples:

| Config type | Kubernetes object |
| --- | --- |
| Log level | ConfigMap |
| Feature flag | ConfigMap |
| DB password | Secret |
| API URL | ConfigMap |
| TLS key/cert | Secret |
| Runtime environment | ConfigMap/fieldRef |

Important shutdown behavior:

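  1. Pod deletion or rollout begins and the Pod is marked Terminating
  2. The Pod is removed from Service endpoints; this runs in parallel with the steps below, so a few requests can still arrive
  3. The preStop hook runs if one is defined, then the container receives SIGTERM
  4. Kubernetes waits up to terminationGracePeriodSeconds, counted from the start of termination (preStop time included)
  5. If still running, the container receives SIGKILL

The default grace period is 30 seconds unless changed.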

Example:

yaml
terminationGracePeriodSeconds: 45

App responsibilities:

  • Handle SIGTERM
  • Stop accepting new requests
  • Finish or cancel in-flight work
  • Close DB connections cleanly
  • Flush logs/metrics
  • Exit before grace period ends

Common production additions:

  • Readiness probe for traffic gating
  • PreStop hook only when truly needed
  • Config rollout automation
  • Separate config per environment
  • Secret rotation plan
  • Avoid baking environment-specific values into images
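
The shutdown-related settings can be combined in one Pod template fragment. This is a sketch under assumptions: the 5-second sleep and the `/ready` path are illustrative, and the `preStop` command needs a shell and `sleep` binary in the image.

```yaml
# Fragment of a Deployment Pod template (illustrative values).
spec:
  terminationGracePeriodSeconds: 45   # total budget, including the preStop hook
  containers:
  - name: app
    image: example/app:v1
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    lifecycle:
      preStop:
        exec:
          # Short pause so endpoint removal can propagate before SIGTERM.
          command: ["sh", "-c", "sleep 5"]
```

The hook only buys time for traffic to drain; the application still has to handle SIGTERM and exit within the remaining grace period.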

Good interview distinction:

Build once, deploy many times with different runtime config.

A strong answer is:

“Kubernetes supports 12-factor apps by keeping config outside the image through ConfigMaps and Secrets, scaling with replicas/HPA, logging to stdout, and relying on fast startup plus graceful SIGTERM handling during rollouts.”


Probes, resources, and autoscaling

Liveness, readiness, and startup probes — differences?

Kubernetes probes tell the kubelet how healthy a container is and whether it should receive traffic.

| Probe | Purpose | Failure action |
| --- | --- | --- |
| Startup probe | Has the app finished starting? | Container is killed if startup probe keeps failing |
| Liveness probe | Is the app alive, not deadlocked? | Container is restarted |
| Readiness probe | Can the app serve traffic now? | Pod is removed from Service endpoints |

Example:

yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5

How to design endpoints:

| Endpoint | Should check |
| --- | --- |
| Startup | App initialization complete |
| Liveness | Process/event loop not deadlocked |
| Readiness | App can safely receive traffic |
| Deep health | Dependencies, DB, cache, downstreams |

Important trap: do not put downstream database checks in liveness unless the app truly cannot recover without restart. If the DB blips, liveness may restart every Pod and make the outage worse.

Better approach:

  • Liveness: shallow process health
  • Readiness: dependency readiness
  • Startup: slow initialization protection

Common probe mistakes:

| Mistake | Impact |
| --- | --- |
| Liveness too aggressive | Crash loops during slow startup |
| Readiness missing | Traffic sent before app is ready |
| Liveness checks DB | Restart storm during dependency outage |
| Timeout too low | False failures under load |
| Same endpoint for all probes | Wrong failure behavior |
| No startup probe for slow app | Killed before startup completes |

Troubleshooting:

bash
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl logs <pod-name> --previous

Look for Events such as:

text
Liveness probe failed
Readiness probe failed
Startup probe failed

A strong answer is:

“Startup protects slow boots, readiness controls whether a Pod receives Service traffic, and liveness restarts only truly unhealthy containers. I keep liveness shallow and put dependency checks in readiness.”

Resource requests and limits?

Resource requests and limits control scheduling and runtime resource behavior.

| Field | Meaning |
| --- | --- |
| CPU request | CPU amount used for scheduling and guaranteed share |
| Memory request | Memory amount used for scheduling |
| CPU limit | Maximum CPU allowed; may cause throttling |
| Memory limit | Maximum memory allowed; can cause OOM kill |

Example:

yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

CPU vs memory:

| Resource | Behavior |
| --- | --- |
| CPU | Compressible; container can be throttled |
| Memory | Incompressible; container can be OOMKilled |
| Ephemeral storage | Can trigger eviction if overused |
| HugePages | Must be requested/limited explicitly where used |

Requests matter because the scheduler uses them to decide where a Pod can fit.

text
Pod requests
→ scheduler checks node allocatable
→ Pod assigned only if resources fit

Limits matter because the runtime enforces them.

Common issues:

| Symptom | Possible cause |
| --- | --- |
| Pod Pending | Requests too high for available nodes |
| CPU throttling | CPU limit too low |
| OOMKilled | Memory limit too low or app leak/spike |
| HPA not scaling on CPU | Missing CPU requests |
| Node pressure eviction | Requests too low or node overcommitted |
| Sidecar consumes resources | Sidecar missing requests/limits |

Quality of Service classes:

| QoS | Condition |
| --- | --- |
| Guaranteed | Every container has equal request and limit for CPU/memory |
| Burstable | Some requests/limits set, but not Guaranteed |
| BestEffort | No CPU/memory requests or limits |
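
As a sketch of the Guaranteed condition, the fragment below gives both containers identical requests and limits. The container names, images, and sizes are placeholders.

```yaml
# Fragment of a Pod spec: requests equal limits in every container,
# so the Pod is classified as Guaranteed.
containers:
- name: app
  image: example/app:v1
  resources:
    requests:
      cpu: 500m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 256Mi
- name: sidecar
  image: example/sidecar:v1
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 64Mi
```

Removing the limits from the sidecar alone would drop the whole Pod to Burstable. The assigned class can be read back with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`.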

Production guidance:

  • Set requests for every container, including sidecars
  • Set memory limits carefully
  • Avoid very low CPU limits for latency-sensitive apps
  • Use metrics to tune requests
  • Watch OOMKills, throttling, and node pressure
  • Remember HPA resource utilization depends on requests

A strong answer is:

“Requests are for scheduling and baseline capacity; limits are runtime caps. CPU over limit is throttled, memory over limit is killed. I set requests for every container because scheduling and HPA depend on them.”

Horizontal Pod Autoscaler (HPA) — how does it work?

The Horizontal Pod Autoscaler adjusts replica count based on metrics.

It can scale workloads such as Deployments, ReplicaSets, and StatefulSets through the scale subresource.

Example:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

How CPU utilization HPA works conceptually:

text
current CPU usage / requested CPU
→ compare with target utilization
→ calculate desired replicas
→ scale target up/down
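
The replica calculation in the steps above follows the formula given in the Kubernetes HPA documentation. The numbers in the worked line are hypothetical: 4 replicas averaging 90% CPU utilization against a 70% target.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil
  = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped between minReplicas and maxReplicas, and the controller skips changes when the ratio is within a small tolerance of 1 (10% by default).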

Prerequisites:

| Requirement | Why |
| --- | --- |
| metrics-server | Provides CPU/memory metrics |
| CPU/memory requests | Needed for utilization-based resource metrics |
| Scale target | Deployment/StatefulSet/etc. must support scaling |
| Enough cluster capacity | New Pods must schedule |
| Reasonable readiness/startup | Avoid unstable scaling during startup |

Common HPA metrics:

| Metric type | Example |
| --- | --- |
| Resource | CPU, memory |
| Pods | Requests per second per Pod |
| Object | Queue depth |
| External | Cloud metric, Kafka lag, custom adapter metric |

Common troubleshooting:

bash
kubectl get hpa
kubectl describe hpa api
kubectl top pods
kubectl top nodes
kubectl get apiservice | grep metrics

Failure examples:

| Symptom | Likely cause |
| --- | --- |
| <unknown> metric | metrics-server/custom metrics issue |
| HPA not scaling on CPU | CPU requests missing |
| Scaled up but Pods Pending | Cluster lacks capacity |
| Scaled up but traffic still slow | App startup slow or readiness failing |
| Too much scaling up/down | Metric noisy or behavior not tuned |
| Sidecar skews metrics | Scale based on container metric or custom metric |

If metrics are missing, HPA cannot make normal scaling decisions from those metrics. Existing replicas continue running, but scale decisions may be skipped or degraded.
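
Two of the failure rows above, noisy scaling and sidecar-skewed metrics, have direct counterparts in the `autoscaling/v2` spec. The sketch below assumes the application container is named `app`; the policy numbers are illustrative starting points, not tuned values.

```yaml
# HPA variant with a per-container metric and explicit scaling behavior.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: ContainerResource      # measure only the app container, not sidecars
    containerResource:
      name: cpu
      container: app             # assumed container name
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 4                 # add at most 4 Pods per minute
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation of the last 5 minutes
      policies:
      - type: Percent
        value: 50                # remove at most half the replicas per minute
        periodSeconds: 60
```

A long scale-down window trades some cost for stability, which is usually the right default for user-facing services.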

Senior point: HPA is reactive. It scales after metrics show load. For sudden spikes, use enough baseline replicas, fast startup, queue-based scaling, predictive scaling, or KEDA/custom metrics where appropriate.

A strong answer is:

“HPA watches metrics and changes replica count between min and max. CPU utilization scaling depends on container requests and metrics-server. I debug HPA by checking conditions, metrics availability, requests, pending Pods, and whether the metric actually represents user load.”

HPA vs VPA — interview trade-offs?

HPA and VPA solve different scaling problems.

| Autoscaler | Scales | Best for |
| --- | --- | --- |
| HPA | Number of Pods | Traffic/load changes |
| VPA | CPU/memory requests | Right-sizing workloads |
| Cluster Autoscaler | Number of nodes | Unschedulable Pods due to lack of capacity |

HPA example:

text
More traffic
→ CPU/RPS/queue metric rises
→ HPA adds replicas

VPA example:

text
Pod consistently uses more memory than requested
→ VPA recommends or updates larger request

Cluster Autoscaler example:

text
HPA creates more Pods
→ Pods remain Pending
→ Cluster Autoscaler adds nodes

Trade-offs:

| Topic | HPA | VPA |
| --- | --- | --- |
| Works well for stateless apps | Yes | Helps tune resources |
| Handles traffic spikes | Yes | No, not directly |
| Changes Pod count | Yes | No |
| Changes request size | No | Yes |
| May need custom metrics | Often | Usually not |
| Can restart Pods | No direct restart for scaling | May evict/recreate Pods depending on update mode |

Important conflict:

Do not blindly run HPA and VPA on the same CPU/memory signal for the same workload. HPA uses utilization relative to requests. If VPA changes requests while HPA scales on CPU utilization, the two controllers can fight or create confusing behavior.

Common best practice:

  • HPA for scaling replicas on traffic/load metrics
  • VPA in recommendation mode for right-sizing
  • VPA for workloads not horizontally scalable
  • Cluster Autoscaler for node capacity
  • Custom metrics for queue/RPS/latency-driven scaling
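
As an illustration of "VPA in recommendation mode", here is a minimal sketch. It assumes the Vertical Pod Autoscaler components and CRDs are installed (VPA is an add-on, not part of core Kubernetes) and that the workload is a Deployment named api; both names are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa               # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                 # assumed Deployment name
  updatePolicy:
    updateMode: "Off"         # compute recommendations only; never evict Pods
```

With updateMode set to "Off", the recommendations appear in the object's status (for example via kubectl describe vpa api-vpa), and requests are changed manually or through Git. That keeps VPA from altering the requests an HPA uses for its utilization math.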

Good examples:

Workload | Better scaling
Stateless API | HPA on CPU/RPS/custom metric
Batch worker | HPA/KEDA on queue depth
Database | VPA recommendations/manual tuning
Memory-heavy singleton | VPA or manual sizing
Web app with sidecar | Container metric or custom metric

A strong answer is:

“HPA adds or removes Pods, VPA adjusts resource requests, and Cluster Autoscaler adds nodes. I avoid letting HPA and VPA fight over the same CPU utilization signal and usually use VPA recommendation mode for HPA-managed apps.”

Taints, tolerations, and node affinity?

Taints, tolerations, and affinity control where Pods run.

Simple rule:

Feature | Meaning
Taint | Node repels Pods
Toleration | Pod may be scheduled onto nodes carrying a matching taint
Node affinity | Pod prefers or requires nodes with labels
Pod affinity | Pod prefers/requires running near other Pods
Pod anti-affinity | Pod prefers/requires avoiding other Pods

Taint example:

bash
kubectl taint nodes node1 dedicated=gpu:NoSchedule

Toleration example:

yaml
tolerations:
- key: dedicated
  operator: Equal
  value: gpu
  effect: NoSchedule

Taint effects:

Effect | Behavior
NoSchedule | Do not schedule new Pods unless tolerated
PreferNoSchedule | Scheduler tries to avoid the node but may still use it
NoExecute | Do not schedule new Pods, and evict running Pods that do not tolerate the taint

Important distinction:

  • A toleration does not force a Pod onto a node
  • It only allows the Pod to be scheduled there
  • Use node affinity/nodeSelector to attract the Pod to the desired node

Example dedicated GPU nodes:

yaml
tolerations:
- key: dedicated
  operator: Equal
  value: gpu
  effect: NoSchedule

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-type
          operator: In
          values:
          - gpu

Node affinity examples:

Need | Use
Run only on SSD nodes | Required node affinity
Prefer same zone as dependency | Preferred node affinity
Spread replicas across nodes/zones | Pod anti-affinity or topology spread
Reserve nodes for platform agents | Taints/tolerations
Keep all replicas off one node | Pod anti-affinity

Common Pending event:

text
node(s) had taint {dedicated: gpu}, that the pod didn't tolerate

Troubleshooting:

bash
kubectl describe pod <pod-name>
kubectl describe node <node-name>
kubectl get nodes --show-labels

Check:

  • Node taints
  • Pod tolerations
  • Node labels
  • nodeSelector/affinity
  • Required vs preferred rules
  • Resource requests
  • PVC zone constraints
  • Topology spread constraints

Production pattern:

  • Taint special nodes to reserve them
  • Add tolerations only to approved workloads
  • Add node affinity so workloads actually target those nodes
  • Use pod anti-affinity/topology spread to avoid placing all replicas on one failure domain
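
The last point can be sketched with a topology spread constraint. This fragment belongs in a Pod template spec and assumes the Pods carry the label app: api (a hypothetical label) and that nodes have the well-known topology.kubernetes.io/zone label.

```yaml
# Fragment of spec.template.spec in a Deployment
topologySpreadConstraints:
- maxSkew: 1                                 # zones may differ by at most one Pod
  topologyKey: topology.kubernetes.io/zone   # failure domain to spread across
  whenUnsatisfiable: DoNotSchedule           # use ScheduleAnyway for a soft rule
  labelSelector:
    matchLabels:
      app: api                               # assumed Pod label
```

DoNotSchedule behaves like a required rule and can leave Pods Pending when a zone is full, so a soft ScheduleAnyway constraint is sometimes the safer starting point.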

A strong answer is:

“Taints repel Pods, tolerations allow Pods onto tainted nodes, and affinity attracts Pods to labeled nodes or spreads them relative to other Pods. For dedicated nodes, I usually combine taints, tolerations, and node affinity.”


Security, RBAC, and multi-tenancy

Kubernetes RBAC — Roles, ClusterRoles, Bindings?

Kubernetes RBAC controls who can perform which actions on which API resources.

There are four main RBAC objects:

Object | Scope | Purpose
Role | Namespace | Defines permissions inside one namespace
ClusterRole | Cluster-wide | Defines permissions for cluster-scoped resources or reusable namespaced rules
RoleBinding | Namespace | Grants a Role or ClusterRole to subjects in one namespace
ClusterRoleBinding | Cluster-wide | Grants a ClusterRole across the cluster

A Role defines permissions, but it does not grant them until a binding attaches it to a subject.

Example Role:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

Example RoleBinding:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: ServiceAccount
  name: app-reader
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Important interview details:

Need | Better choice
App reads Pods in one namespace | Role + RoleBinding
CI deploys one app namespace | Role + RoleBinding with limited verbs
Read Nodes or PersistentVolumes | ClusterRole + ClusterRoleBinding (cluster-scoped resources)
Grant cluster admin rights | ClusterRoleBinding, only for trusted admins
Reuse same permission set in many namespaces | ClusterRole + RoleBinding per namespace
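
The last row is the one candidates most often get wrong, so here is a minimal sketch. It defines the pod-reader rules once as a ClusterRole and grants them in a single namespace through a RoleBinding. The staging namespace and the app-reader ServiceAccount in it are assumptions for illustration.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader            # defined once, no namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: staging          # access is limited to this namespace
subjects:
- kind: ServiceAccount
  name: app-reader            # assumed ServiceAccount
  namespace: staging
roleRef:
  kind: ClusterRole           # a RoleBinding may reference a ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Because the binding is a RoleBinding, the subject can read Pods only in staging even though the rules live in a ClusterRole.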

Common verbs:

text
get, list, watch, create, update, patch, delete

Good debugging commands:

bash
kubectl auth can-i get pods -n dev
kubectl auth can-i create deployments -n dev --as=system:serviceaccount:dev:ci-deployer

Least privilege examples:

  • CI deploy ServiceAccount can update Deployments in one namespace
  • App ServiceAccount can read only required ConfigMaps or custom resources
  • Monitoring can list/watch metrics-related resources
  • Avoid granting app workloads list or watch on Secrets unless truly required, because list responses include the Secret contents
  • Avoid cluster-admin for pipelines and applications

Common mistake:

Giving a namespace deploy pipeline cluster-admin because one permission was missing.

Better: identify the exact API group, resource, namespace, and verb needed.
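
A minimal sketch of that narrower grant, assuming the pipeline runs as the ci-deployer ServiceAccount in dev (the identity used in the kubectl auth can-i example) and only needs to manage Deployments. The Role and binding names are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: deployment-updater    # hypothetical name
rules:
- apiGroups: ["apps"]         # Deployments live in the apps API group
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: ci-deployer-binding   # hypothetical name
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: dev
roleRef:
  kind: Role
  name: deployment-updater
  apiGroup: rbac.authorization.k8s.io
```

After applying it, kubectl auth can-i create deployments -n dev --as=system:serviceaccount:dev:ci-deployer should answer yes, while the same check for secrets should answer no.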

A strong answer is:

“RBAC separates permission definition from permission binding. I use Roles for namespace-scoped access, ClusterRoles for cluster-wide or reusable rules, and bind them to users/groups/ServiceAccounts with least privilege.”

ServiceAccount and Pod identity?

A ServiceAccount gives a Pod an identity inside Kubernetes.

Every Pod runs as a ServiceAccount. If you do not specify one, Kubernetes uses the namespace’s default ServiceAccount.

Example:

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: prod

Pod using a ServiceAccount:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  serviceAccountName: api-sa
  containers:
  - name: api
    image: registry.example.com/api:v1

Why ServiceAccounts matter:

Topic | Explanation
Kubernetes API access | Pod can authenticate to the API server
RBAC | Permissions are granted to the ServiceAccount
Token projection | Tokens can be mounted into Pods
Cloud IAM integration | Map Pod identity to cloud identity
Auditability | API actions can be traced to workload identity

Older clusters often used long-lived ServiceAccount tokens. Modern Kubernetes favors projected, bounded, expiring tokens.
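
To show what a projected, expiring token looks like in a manifest, here is a minimal sketch based on the api Pod and api-sa ServiceAccount above. The volume name, mount path, audience value, and one-hour lifetime are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: prod
spec:
  serviceAccountName: api-sa
  containers:
  - name: api
    image: registry.example.com/api:v1
    volumeMounts:
    - name: api-token
      mountPath: /var/run/secrets/tokens   # assumed mount path
      readOnly: true
  volumes:
  - name: api-token
    projected:
      sources:
      - serviceAccountToken:
          path: api-token
          expirationSeconds: 3600          # kubelet rotates the token before expiry
          audience: vault                  # hypothetical audience for an external verifier
```

The application has to re-read the token file periodically, since the kubelet replaces its contents as the token approaches expiry.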

Good production practices:

  • Create a dedicated ServiceAccount per app/workload
  • Avoid using the default ServiceAccount for production apps
  • Bind only required RBAC permissions
  • Disable token mounting if the app does not need Kubernetes API access
  • Use cloud workload identity instead of static cloud keys in Secrets
  • Rotate and audit credentials

Disable automount when not needed:

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-api-access
automountServiceAccountToken: false

Pod-level override:

yaml
spec:
  automountServiceAccountToken: false

Cloud identity examples:

Cloud | Common mechanism
AWS | IRSA / EKS Pod Identity
GCP | Workload Identity
Azure | Workload Identity / managed identity integration

Interview trap:

A Kubernetes ServiceAccount is not the same as a Linux user inside the container.

The ServiceAccount controls Kubernetes API identity. Linux user settings are controlled by container image and securityContext.

A strong answer is:

“A Pod runs as a ServiceAccount for Kubernetes API identity. I use one dedicated ServiceAccount per workload, bind minimal RBAC, avoid default accounts, and prefer cloud workload identity over long-lived cloud keys stored in Secrets.”

Namespaces — isolation and organization?

Namespaces divide a Kubernetes cluster into logical scopes.

Use namespaces for:

  • Teams
  • Applications
  • Environments
  • Tenants
  • System components
  • Blast-radius control

Examples:

text
dev
staging
prod
team-payments
team-platform
ingress-nginx
monitoring

Many Kubernetes objects live inside a namespace, while others are cluster-scoped:

Namespaced objects | Cluster-scoped objects
Pods | Nodes
Services | PersistentVolumes
Deployments | StorageClasses
ConfigMaps | ClusterRoles
Secrets | ClusterRoleBindings
PVCs | CustomResourceDefinitions
Roles | Namespaces

Important point:

Namespaces are not complete security boundaries by themselves.

For real isolation, combine namespaces with:

Control | Why
RBAC | Who can access objects
ResourceQuota | Limit resource consumption
LimitRange | Default/min/max requests and limits
NetworkPolicy | Restrict traffic between namespaces/apps
Pod Security Admission | Enforce Pod hardening
Admission policies | Enforce org standards
Separate node pools | Stronger workload separation
Separate clusters | Strongest isolation for high-risk tenants

ResourceQuota example:

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "50"

Typical LimitRange uses:

  • Default CPU/memory requests
  • Default CPU/memory limits
  • Minimum/maximum container resources
  • Prevent BestEffort Pods in shared namespaces
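
The uses above map onto a single manifest. This is a minimal sketch for the team-payments namespace; the object name and every CPU/memory value are illustrative assumptions.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults    # hypothetical name
  namespace: team-payments
spec:
  limits:
  - type: Container
    defaultRequest:           # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:                  # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
    min:                      # smallest values a container may ask for
      cpu: 50m
      memory: 64Mi
    max:                      # largest values a container may ask for
      cpu: "2"
      memory: 2Gi
```

Because every container ends up with requests, Pods in the namespace no longer fall into the BestEffort QoS class, and a ResourceQuota on requests.cpu or requests.memory can admit them.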

Good namespace practices:

  • Avoid running production workloads in default
  • Use consistent labels
  • Apply quotas for shared clusters
  • Apply NetworkPolicies for sensitive namespaces
  • Use separate namespaces for platform components
  • Keep environment separation clear

Common mistake:

Creating dev, staging, and prod namespaces but giving everyone cluster-admin and allowing all network traffic.

That is organization, not secure multi-tenancy.
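
As one concrete step beyond organization, a namespace can start from default-deny networking. This is a minimal sketch, assuming the CNI plugin enforces NetworkPolicy and that cluster DNS runs in kube-system. The policy names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny          # hypothetical name
  namespace: team-payments
spec:
  podSelector: {}             # selects every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress                    # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress      # hypothetical name
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system   # assumed DNS namespace
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Application traffic then needs explicit allow policies on top of these two; without the DNS rule, name resolution is usually the first thing to break.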

A strong answer is:

“Namespaces scope names and organize resources, but they are not hard security boundaries alone. I pair them with RBAC, quotas, NetworkPolicy, Pod Security Admission, and sometimes separate node pools or clusters.”

Pod Security Standards and security contexts?

Kubernetes Pod Security Standards define three policy levels:

Level | Meaning
privileged | Unrestricted; trusted system workloads only
baseline | Prevents known privilege escalation patterns
restricted | Stronger hardening for security-sensitive workloads

Pod Security Admission can enforce, warn, or audit these standards through namespace labels.

Example namespace labels:

yaml
metadata:
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

A securityContext controls Linux/security settings for Pods and containers.

Example hardened container:

yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault

Common hardening controls:

Control | Why
runAsNonRoot | Avoid root inside container
runAsUser | Run as specific UID
allowPrivilegeEscalation: false | Prevent privilege escalation
readOnlyRootFilesystem: true | Reduce writable attack surface
Drop capabilities | Remove unnecessary Linux privileges
seccompProfile: RuntimeDefault | Restrict syscalls
AppArmor/SELinux | Extra MAC enforcement where supported
Avoid privileged Pods | Prevent broad host access
Avoid hostPath | Prevent direct host filesystem exposure
Avoid hostNetwork/hostPID | Reduce node-level exposure

Pod-level example:

yaml
spec:
  securityContext:
    runAsNonRoot: true
    fsGroup: 2000

When a field is set at both levels, the container-level value takes precedence. Fields that exist only at Pod level, such as fsGroup, still apply to every container.
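
Putting the Pod-level and container-level fragments together, a hardened Pod might look like the sketch below. The image, UID, and the writable /tmp mount are assumptions; the emptyDir is there because many applications still need a scratch directory once the root filesystem is read-only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  securityContext:            # Pod-level defaults for all containers
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: api
    image: registry.example.com/api:v1
    securityContext:          # container-only fields
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp         # writable scratch space despite read-only root
  volumes:
  - name: tmp
    emptyDir: {}
```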

Interview caveat:

Some workloads need exceptions, such as CNI, CSI, monitoring agents, or node-level security agents. Exceptions should be isolated and reviewed.

Common production approach:

  • Enforce restricted for app namespaces
  • Allow baseline or controlled exceptions for platform namespaces
  • Use admission policies to block privileged containers
  • Scan images and manifests in CI
  • Avoid root containers unless justified

A strong answer is:

“I harden Pods with non-root users, dropped capabilities, no privilege escalation, read-only root filesystems, and RuntimeDefault seccomp. Pod Security Standards help enforce these controls consistently at namespace level.”

Admission controllers and policy engines?

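Admission control runs after authentication and authorization but before an object is persisted. Mutating admission runs first, then the API server validates the object schema, then validating admission runs.

Request flow:

text
request
→ authentication
→ authorization
→ mutating admission
→ object schema validation
→ validating admission
→ etcd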

Admission can:

  • Mutate an object
  • Validate an object
  • Reject an object
  • Apply defaults
  • Enforce organization policy

Webhook types:

Type | Behavior
Mutating admission webhook | Changes the object before storage
Validating admission webhook | Allows or rejects the object
ValidatingAdmissionPolicy | Built-in declarative validation without external webhook callouts

Examples of admission policy:

  • Require resource requests/limits
  • Block :latest image tag
  • Require approved registries
  • Require labels/annotations
  • Enforce non-root containers
  • Require probes
  • Restrict hostPath
  • Block privileged containers
  • Add sidecars or default labels
  • Enforce image signature policy through external tools

Common policy engines:

Tool | Style
OPA Gatekeeper | Rego-based policy
Kyverno | Kubernetes/YAML-native policy
ValidatingAdmissionPolicy | Kubernetes-native CEL validation
Custom webhooks | Application/platform-specific logic

Example interview scenario:

“A developer applies a Pod without resource requests. CI missed it. Admission policy rejects it before etcd stores it.”
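
One way to implement that scenario without an external webhook is a ValidatingAdmissionPolicy. This is a minimal sketch, assuming a cluster where admissionregistration.k8s.io/v1 serves this kind. The policy and binding names and the policy: enforced namespace label are hypothetical, and the rule only checks regular containers of bare Pods, not init containers.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-requests.example.com      # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: >-
      object.spec.containers.all(c,
        has(c.resources) && has(c.resources.requests) &&
        'cpu' in c.resources.requests &&
        'memory' in c.resources.requests)
    message: "every container must set CPU and memory requests"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-requests-binding          # hypothetical name
spec:
  policyName: require-requests.example.com
  validationActions: [Warn, Audit]        # switch to [Deny] once violations are cleaned up
  matchResources:
    namespaceSelector:
      matchLabels:
        policy: enforced                  # hypothetical opt-in label
```

Starting with Warn and Audit surfaces violations without blocking deploys. The binding is what activates the policy, so a policy without a binding has no effect.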

Good practices:

  • Test policies in audit/warn mode before enforce
  • Avoid slow or unreliable webhooks
  • Configure failure policy intentionally
  • Monitor webhook latency and errors
  • Keep policies version controlled
  • Run the same policy checks in CI so violations surface before admission rejects them
  • Avoid mutating too much invisibly
  • Document exceptions
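
Several of these practices are fields on the webhook registration itself. The sketch below assumes a webhook Service named policy-webhook in a policy-system namespace; both names, the webhook name, and the path are hypothetical, and the CA bundle is assumed to be injected by separate tooling.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: policy-webhook                    # hypothetical name
webhooks:
- name: validate.policy.example.com       # hypothetical name
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail                     # Ignore would fail open instead
  timeoutSeconds: 5                       # keep API requests from hanging on a slow webhook
  clientConfig:
    service:
      name: policy-webhook
      namespace: policy-system
      path: /validate
    # caBundle omitted here; assumed to be injected by certificate tooling
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "policy-system"]   # avoid blocking system and webhook Pods
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
```

Excluding the webhook's own namespace matters with failurePolicy: Fail, because otherwise a crashed webhook can block the very Pods needed to bring it back.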

Common failure:

text
failed calling webhook

Check:

bash
kubectl get validatingwebhookconfiguration
kubectl get mutatingwebhookconfiguration
kubectl describe validatingwebhookconfiguration <name>
kubectl get pods -n <webhook-namespace>
kubectl logs -n <webhook-namespace> deploy/<webhook>

A strong answer is:

“Admission controllers enforce policy before objects are stored. I use validating policies to reject unsafe manifests, mutating policies for controlled defaults, and policy-as-code in CI plus admission so bad config is caught before production.”


Advanced topics, troubleshooting, and scenarios

Helm, Kustomize, and GitOps basics?

Helm, Kustomize, and GitOps solve related but different deployment problems.

Tool | Role
Helm | Packages Kubernetes resources as charts with templates and values
Kustomize | Customizes plain YAML using bases, patches, and overlays
Argo CD / Flux | Reconciles cluster state from declared sources such as Git or OCI artifacts

Helm concepts:

Concept | Meaning
Chart | Package of Kubernetes templates/files
Values | User-provided configuration for templates
Release | Installed instance of a chart
Revision | Versioned release history
Rollback | Return a release to a previous revision

Common Helm commands:

bash
helm install api ./chart -n prod
helm upgrade api ./chart -n prod -f values-prod.yaml
helm history api -n prod
helm rollback api 3 -n prod

Kustomize concepts:

text
base/
  deployment.yaml
  service.yaml

overlays/
  dev/
    kustomization.yaml
  prod/
    kustomization.yaml

Example:

bash
kubectl apply -k overlays/prod
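
For reference, the prod overlay applied by that command could look like the sketch below. It assumes base/ also contains its own kustomization.yaml listing deployment.yaml and service.yaml, and that the Deployment is named api and uses the image registry.example.com/api; the tag and replica count are illustrative.

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod               # set the namespace on every resource
resources:
- ../../base                  # reuse the shared manifests
images:
- name: registry.example.com/api
  newTag: v1.4.2              # illustrative tag; pin per environment
replicas:
- name: api                   # assumed Deployment name
  count: 5
```

Running kubectl kustomize overlays/prod renders the result without applying it, which is a convenient way to review the delta in a pull request.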

GitOps flow:

text
developer merges change
→ CI builds image
→ manifest/chart values updated in Git
→ GitOps controller detects desired state
→ controller reconciles cluster
→ drift is corrected or reported

Good GitOps practices:

  • Git is source of truth
  • Avoid manual kubectl edit in production
  • Use pull requests for environment changes
  • Keep secrets encrypted or externally referenced
  • Pin image tags/digests intentionally
  • Separate app source repo and environment config repo if needed
  • Use health checks and sync status
  • Roll back by Git revert or Helm rollback, depending on the workflow

Helm vs Kustomize interview answer:

Use Helm when | Use Kustomize when
You need packaged reusable app charts | You want plain YAML overlays
You need templating and values | You prefer patching existing manifests
Third-party app already ships Helm chart | You manage environment-specific deltas
Release history matters | You want kubectl-native customization

Common senior point:

Helm can be used with GitOps. Argo CD and Flux can deploy Helm charts while still using Git as the desired-state source.
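
A minimal sketch of that combination with Argo CD, assuming Argo CD is installed in the argocd namespace. The repository URL, chart path, and application name are hypothetical; values-prod.yaml is the values file from the Helm commands above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-prod              # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/env-config.git   # hypothetical repo
    targetRevision: main
    path: charts/api          # hypothetical chart location in the repo
    helm:
      valueFiles:
      - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: prod
  syncPolicy:
    automated:
      prune: true             # delete resources removed from Git
      selfHeal: true          # revert manual drift in the cluster
```

With this setup the controller renders the chart itself, so helm history does not track releases; rollback is a Git revert.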

A strong answer is:

“Helm packages and templates applications, Kustomize patches plain manifests, and GitOps controllers reconcile the cluster from version-controlled desired state. In production, I treat Git as source of truth and avoid manual drift.”

Scenario: Pod stuck Pending — how do you debug?

A Pod stuck in Pending usually means it has not been scheduled or a required dependency is not ready.

Start with Events:

bash
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Then classify the issue.

Event clue | Likely cause | Fix direction
Insufficient cpu | Requests do not fit nodes | Add nodes, reduce request, scale down other workloads
Insufficient memory | Memory request too high | Add capacity or tune requests
had taint ... didn't tolerate | Missing toleration | Add toleration or use correct node pool
didn't match node selector | Label/selector mismatch | Fix node labels or selector
didn't match node affinity | Affinity too strict | Relax/fix affinity
pod has unbound immediate PersistentVolumeClaims | PVC not bound | Check PVC/StorageClass/PV
volume node affinity conflict | Volume in wrong zone | Schedule in same zone or reprovision storage
Too many pods | Node Pod density limit | Add nodes or reduce Pods per node
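
For the "volume node affinity conflict" row, a common preventive measure is delayed volume binding. This is a minimal sketch; the class name is hypothetical, and the provisioner shown is the AWS EBS CSI driver as an assumed example, so substitute the driver your cluster actually runs.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zonal-ssd                           # hypothetical name
provisioner: ebs.csi.aws.com                # assumed CSI driver
volumeBindingMode: WaitForFirstConsumer     # provision only after a Pod is scheduled
```

With this mode the PVC stays Pending until a Pod uses it, which is expected, and the volume is created in the zone the scheduler picked rather than in an arbitrary one.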

Useful commands:

bash
kubectl get pod <pod-name> -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>
kubectl get nodes --show-labels
kubectl describe node <node-name>
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
kubectl top nodes

How to reason:

text
NODE column empty
→ scheduling failed
→ read Events
→ classify resource / taint / affinity / volume

Important distinction:

  • Pending with empty NODE usually points to scheduling or PVC binding
  • Pod assigned to a node but stuck in image pull or container creation is a kubelet/runtime issue
  • Image pull failures keep the Pod phase at Pending; the container shows a waiting reason of ErrImagePull first, then ImagePullBackOff as retries back off

Common interview mistake:

Guessing at a resource issue without reading Events.

Better narrative:

  1. Describe Pod
  2. Read Events
  3. Check resources/taints/affinity/PVC
  4. Confirm node capacity
  5. Fix the exact constraint
  6. Recheck scheduling

A strong answer is:

“I start with kubectl describe pod and Events. Pending usually points to scheduling constraints such as insufficient resources, taints, affinity, or unbound PVCs, so I classify the Event before changing anything.”

Scenario: CrashLoopBackOff — debugging steps?

CrashLoopBackOff means a container starts, exits, and Kubernetes backs off before restarting it again.

Start with:

bash
kubectl get pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

Use --previous because the current container may have already restarted.

Check termination reason:

bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

Common causes:

Cause | Signal | Fix direction
App exception on startup | Stack trace in logs | Fix app/config/dependency
Missing env/Secret/ConfigMap | Config error in logs or Events | Fix mounted/env config
Wrong command/args | Process exits immediately | Fix container command
Liveness too aggressive | Restarts soon after start | Tune probes/add startupProbe
OOMKilled | Termination reason OOMKilled | Increase memory or fix memory use
Permission denied | Logs show file/user issue | Fix image user, volume permissions, securityContext
Port conflict/app bind issue | Logs show bind failure | Fix port/config
Dependency unavailable | Readiness/liveness confusion | Keep dependency checks out of liveness
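
For the "Liveness too aggressive" row, the usual fix is a startupProbe that holds off liveness checks until the app has booted. This is a minimal container fragment; the paths, port, and thresholds are assumptions to adapt to the actual application.

```yaml
# Fragment of a Pod spec
containers:
- name: api
  image: registry.example.com/api:v1
  startupProbe:
    httpGet:
      path: /healthz          # assumed health endpoint
      port: 8080
    periodSeconds: 5
    failureThreshold: 30      # allows up to 150 seconds for startup
  livenessProbe:              # starts only after the startupProbe succeeds
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:             # dependency checks belong here, not in liveness
    httpGet:
      path: /ready            # assumed readiness endpoint
      port: 8080
    periodSeconds: 5
```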

Probe-related command:

bash
kubectl describe pod <pod-name> -n <namespace> | grep -i -E -A5 'liveness|readiness|startup'

Resource check:

bash
kubectl top pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E -A3 'Limits:|Requests:'

Config check:

bash
kubectl get cm -n <namespace>
kubectl get secret -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

Important interview distinctions:

Symptom | Meaning
CrashLoopBackOff | Process starts but exits repeatedly
ImagePullBackOff | Image cannot be pulled
CreateContainerConfigError | Config reference problem before container starts
RunContainerError | Runtime failed to start container
OOMKilled | Memory limit exceeded

Do not immediately increase limits without checking logs. If the app crashes due to missing config, more memory will not help.

A strong answer is:

“For CrashLoopBackOff, I check previous logs, describe Events, termination reason, probes, config mounts, and OOMKilled status. I separate app startup failure from probe misconfiguration and resource exhaustion.”

Scenario: Deployment rollout stuck — what do you check?

A Deployment rollout can get stuck when the new ReplicaSet cannot produce Ready Pods.

Start with:

bash
kubectl rollout status deployment/<name> -n <namespace>
kubectl describe deployment <name> -n <namespace>
kubectl get rs -n <namespace>
kubectl get pods -n <namespace> -l app=<label>

Find the new ReplicaSet:

bash
kubectl get rs -n <namespace> --sort-by=.metadata.creationTimestamp

Then inspect new Pods:

bash
kubectl describe pod <new-pod> -n <namespace>
kubectl logs <new-pod> -n <namespace>
kubectl logs <new-pod> -n <namespace> --previous

Common rollout blockers:

Issue | Signal | Fix direction
New image pull fails | ImagePullBackOff | Fix tag, registry, pull secret
New Pods crash | CrashLoopBackOff | Check logs/config/probes
Readiness never passes | Ready 0/1, readiness Events | Fix readiness endpoint/dependency
No capacity | Pending new Pods | Add nodes or adjust requests
maxUnavailable: 0 and no surge/capacity | Old Pods stay, new Pods cannot schedule | Allow surge or add capacity
Progress deadline exceeded | ProgressDeadlineExceeded | Fix failure or roll back
PVC/volume issue | Pending/mount errors | Fix storage binding/mount
Admission denied | Events/webhook errors | Fix policy violation
Bad ConfigMap/Secret | Env/mount errors | Restore config or roll out new config

Useful Deployment fields:

Field | Why it matters
maxSurge | Allows extra Pods during rollout
maxUnavailable | Controls unavailable Pods
progressDeadlineSeconds | Marks rollout as failed after deadline
minReadySeconds | Requires new Pods to stay Ready this long before they count as available
readinessProbe | Determines whether new Pods receive traffic
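
These fields sit together in the Deployment spec. Here is a minimal sketch, assuming a Deployment named api with the label app: api; the replica count, deadlines, and probe endpoint are illustrative values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4
  progressDeadlineSeconds: 300   # report ProgressDeadlineExceeded after 5 minutes
  minReadySeconds: 10            # a new Pod must stay Ready 10s to count as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # one extra Pod during the rollout
      maxUnavailable: 0          # never drop below the desired count
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.example.com/api:v1
        readinessProbe:
          httpGet:
            path: /ready         # assumed readiness endpoint
            port: 8080
```

With maxUnavailable: 0 the rollout depends on the surge Pod being schedulable, so a cluster with no spare capacity will stall exactly as the blockers table describes.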

Rollback:

bash
kubectl rollout undo deployment/<name> -n <namespace>

Rollback to a specific revision:

bash
kubectl rollout history deployment/<name> -n <namespace>
kubectl rollout undo deployment/<name> -n <namespace> --to-revision=<revision>

Good production response:

  1. Check whether users are impacted
  2. Confirm old Pods are still serving
  3. Inspect new ReplicaSet Pods
  4. Roll back if impact is high
  5. Fix image/config/probe/resource issue
  6. Re-deploy with smaller blast radius if needed

Common interview mistake:

Looking only at the Deployment object and not the new ReplicaSet Pods.

The failing evidence is usually in the new Pods.

A strong answer is:

“For a stuck rollout, I inspect the new ReplicaSet and its Pods. Most failures are readiness, image pull, crash, capacity, or admission issues. If production traffic is at risk, I roll back first, then fix and redeploy safely.”

What should you rehearse before Kubernetes interviews?

Use the final week to practice both concepts and live troubleshooting narration.

Docker/container basics:

  • Image vs container
  • Dockerfile instructions
  • Multi-stage build
  • Compose vs Kubernetes
  • Container runtime, CRI, containerd, CRI-O
  • OCI image, tag vs digest, registry pull errors

Kubernetes architecture:

  • Control plane vs worker node
  • API server request path
  • etcd role and quorum
  • Scheduler decisions
  • Controllers and reconciliation
  • kubelet/runtime responsibilities

Workloads:

  • Pod basics and sidecars
  • Deployment vs ReplicaSet vs Pod
  • StatefulSet vs Deployment
  • DaemonSet, Job, CronJob
  • Pod phases and container states
  • Rollout and rollback commands

Networking:

  • ClusterIP, NodePort, LoadBalancer
  • Service selector and endpoints
  • CoreDNS name formats
  • Ingress vs LoadBalancer
  • Gateway API awareness
  • CNI vs kube-proxy/eBPF
  • NetworkPolicy default allow/deny behavior
  • DNS egress on UDP/TCP 53

Configuration and storage:

  • ConfigMap env vs mounted file behavior
  • Secret security and encryption at rest
  • External secret managers
  • PV, PVC, StorageClass
  • Access modes: RWO, RWX, ROX, RWOP
  • emptyDir, projected volumes, downward API
  • Graceful shutdown and SIGTERM

Resources and scaling:

  • Liveness vs readiness vs startup probes
  • CPU requests/limits and throttling
  • Memory limits and OOMKilled
  • QoS classes
  • HPA prerequisites
  • HPA vs VPA vs Cluster Autoscaler
  • Taints, tolerations, affinity, anti-affinity

Security and platform:

  • RBAC Role vs ClusterRole
  • ServiceAccount and workload identity
  • Namespaces with quotas and policies
  • Pod Security Standards
  • securityContext hardening
  • Admission webhooks and policy engines
  • Helm, Kustomize, and GitOps

Troubleshooting drills:

  • Pod Pending
  • CrashLoopBackOff
  • ImagePullBackOff
  • OOMKilled
  • Service has no endpoints
  • DNS resolution failure
  • Ingress 404/502
  • HPA not scaling
  • Deployment rollout stuck
  • PVC Pending
  • NetworkPolicy blocks app traffic

Must-practice commands:

bash
kubectl get pods -A
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous
kubectl get events -n <ns> --sort-by='.lastTimestamp'
kubectl get deploy,rs,pod -n <ns>
kubectl rollout status deployment/<name> -n <ns>
kubectl rollout undo deployment/<name> -n <ns>
kubectl get svc,endpoints,endpointslice -n <ns>
kubectl auth can-i <verb> <resource> -n <ns>
kubectl top pods -n <ns>
kubectl get hpa -n <ns>

Scenario stories to prepare:

  • A rollout failed because readiness never passed
  • A Pod stayed Pending due to PVC or taints
  • HPA did not scale because CPU requests were missing
  • A NetworkPolicy broke DNS
  • A CrashLoop was caused by missing Secret or wrong command
  • A Deployment was fixed safely with rollback
  • A migration from manual YAML to Helm/GitOps improved release safety

Good final close:

“I do not only know Kubernetes objects by definition. I can explain how they reconcile, how traffic flows, how security is enforced, and how to debug failures from Events, logs, rollout status, and resource conditions.”


Pattern cheat sheet (quick reference)

Task | Kubernetes approach
Stateless web app | Deployment + Service + Ingress
Stable Pod identity | StatefulSet + headless Service
Per-node agent | DaemonSet
Internal service DNS | ClusterIP + CoreDNS
Scale on CPU | HPA v2 + metrics-server + requests
Non-secret config | ConfigMap volume or env
Credentials | Secret or external secrets operator
Block lateral movement | NetworkPolicy + default deny egress
Debug scheduling | kubectl describe pod Events
Rollback deploy | kubectl rollout undo or GitOps revert
Private registry | imagePullSecrets + digest pin
Graceful shutdown | preStop hook + SIGTERM handling


Summary

Kubernetes interviews test Pod lifecycle, Service networking, probe semantics, and HPA prerequisites—not the definition of orchestration alone. Deploy a sample app, break it, fix it from Events, and narrate kubectl describe findings aloud. Pair with Docker foundations in Q6–10, AWS for EKS operations, and Git for GitOps delivery.

Deepak Prasad

R&D Engineer

Founder of GoLinuxCloud with more than 15 years of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive …