45+ Kubernetes Interview Questions and Answers 2026

Kubernetes interview questions, including combined Docker and Kubernetes questions and advanced variants, show up in DevOps, platform engineering, SRE, cloud developer, and backend loops wherever workloads run in containers. Interviewers expect more than "Kubernetes orchestrates containers": they probe Pod lifecycle, Service types, probe differences, HPA prerequisites, NetworkPolicy DNS gotchas, and how you debug Pending or CrashLoopBackOff pods under pressure.

Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with computer networks interview questions for TCP/IP, DNS, and Service/Ingress fundamentals, operating system interview questions for process and networking fundamentals, Kafka interview questions for event workloads on Kubernetes, Spring Boot interview questions for experienced developers when Java services deploy to clusters, Azure developer interviews and AWS interview questions for managed Kubernetes (AKS/EKS), Git interview questions for GitOps pipelines, and full stack developer interviews when teams own deployment end to end.

NOTE

Prep target: Deploy a sample app with Deployment + Service + Ingress, add liveness/readiness probes, configure HPA with resource requests, and explain one NetworkPolicy that allows DNS egress on port 53.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; kubectl v1.36.1 client; Docker 29.2.1.

Interview context and how to prepare

What do Kubernetes interviews actually test?

Kubernetes interviews test whether you can run containerized workloads reliably at scale—not recite every kubectl alias.

Layer	What interviewers probe
Containers	Images, registries, Docker vs runtime
Core objects	Pod, Deployment, ReplicaSet, Service
Networking	ClusterIP, DNS, Ingress, NetworkPolicy
Configuration	ConfigMap, Secret, env vs volume mounts
Operations	Probes, requests/limits, HPA, rollouts
Security	RBAC, ServiceAccount, namespaces
Production	Debugging, observability, GitOps, upgrades

Role	Emphasis
Junior DevOps	kubectl basics, Pod/Deployment YAML
Platform engineer	Cluster add-ons, CNI, admission control
SRE	Incident response, HPA failures, etcd backups
Cloud developer	Deploy to AKS/EKS/GKE, managed services

Expect probe differences, Service types, Pending pod debugging, and HPA with metrics-server in most Kubernetes screens.

Docker and Kubernetes — how are they related in interviews?

Concept	Docker (typical)	Kubernetes
Unit of run	Container on one host	Pod (one or more containers)
Orchestration	docker compose (single host)	Multi-node scheduling, self-healing
Networking	Bridge networks	CNI plugins, Services, kube-proxy/Cilium
Scaling	Manual `docker compose up --scale` (single host)	ReplicaSet, HPA, cluster autoscaler
Config	Env files, bind mounts	ConfigMap, Secret, downward API
Runtime	containerd via Docker Engine	containerd/CRI-O (Docker optional)

Kubernetes does not require Docker—it needs a CRI-compatible runtime (containerd, CRI-O). Docker builds images; Kubernetes schedules and operates them.

Docker and Kubernetes interview questions often start with "What problem does K8s solve that Docker alone does not?" A concise answer: desired-state reconciliation across many nodes.

What is a typical Kubernetes interview loop?

Round	Duration	Focus
Screening	30 min	Experience, cloud, on-call stories
Fundamentals	45 min	Pods, Deployments, Services, YAML
Docker + K8s	30–45 min	Images, multi-stage builds, compose vs K8s
Scenario / troubleshooting	45–60 min	CrashLoopBackOff, Pending, HPA not scaling
Advanced (senior)	45 min	NetworkPolicy, RBAC, etcd, upgrades
Architecture design	45 min	Multi-tenant cluster, GitOps, DR

LinkedIn scenario-based guides and RisingStack-style lists emphasize spoken troubleshooting narratives—describe commands before typing them.

What is a realistic 4–6 week Kubernetes prep plan?

Week	Focus	Output
1	Docker images, Dockerfile, registries	Push image to local/minikube registry
2	Pods, Deployments, Services, kubectl	Deploy nginx with ClusterIP Service
3	ConfigMap, Secret, probes, resources	App with health checks and requests
4	Ingress, HPA, rolling updates	Scale on CPU with metrics-server
5	RBAC, NetworkPolicy, namespaces	Restrict cross-namespace traffic
6	Mock interviews + CKA-style drills	Timed troubleshooting scenarios

Use minikube, kind, or a cloud free tier—hands-on beats flashcards.

How do junior and advanced Kubernetes expectations differ?

Topic	Junior / mid	Advanced
Objects	Deployment, Service	StatefulSet, DaemonSet, CRDs
Networking	ClusterIP vs NodePort	CNI, NetworkPolicy, service mesh basics
Scaling	`replicas: 3`	HPA behavior, VPA trade-offs, cluster autoscaler
Security	Namespaces	RBAC least privilege, Pod Security Standards
Ops	`kubectl get/describe/logs`	Control plane components, etcd backup
Delivery	Manual apply	Helm, Kustomize, Argo CD / Flux GitOps

Advanced Kubernetes interview questions focus on failure modes: metrics-server down during a traffic spike, a NetworkPolicy blocking DNS, or readiness passing while the app is broken.

Docker and container foundations

Docker image vs container — what is the difference?

A Docker image is the packaged, immutable artifact. A container is a running instance of that image.

Image	Container
Immutable template	Running instance of an image
Built from a Dockerfile or build system	Created from an image
Made of filesystem layers	Adds a writable layer on top
Stored locally or in a registry	Runs as one or more processes
Identified by tag or digest	Has container ID, PID, network, mounts
Safe to share as release artifact	Usually ephemeral

Images are built in layers. When a container starts, the runtime creates a unified filesystem view from those layers and adds a thin writable layer for runtime changes.

Example:

bash

docker build -t myapp:v1 .
docker run myapp:v1

In Kubernetes, you do not normally create containers manually. You declare an image in the Pod spec, and the kubelet asks the container runtime to pull and start it.

yaml


spec:
  containers:
  - name: api
    image: registry.example.com/myapp:v1

Important interview points:

Image is the artifact you build and push
Container is what actually runs
Container filesystem changes are not permanent unless stored in a volume
Same image can run many containers
Kubernetes schedules Pods, and containers run inside Pods
Container logs, writable layer, and process lifecycle are separate from the image

Common trap:

Describing a container as a lightweight VM. A container is an isolated process that uses kernel features like namespaces and cgroups; it shares the host kernel instead of booting its own.

A strong answer is:

“An image is the immutable package made of layers. A container is a running process created from that image with its own writable layer, runtime isolation, environment, network, and mounts.”

Dockerfile basics interviewers expect you to explain?

A Dockerfile describes how to build a container image.

Common instructions:

Instruction	Role
`FROM`	Sets the base image
`WORKDIR`	Sets the working directory
`COPY`	Copies files into the image
`ADD`	Copies files, with extra features like archive extraction
`RUN`	Executes build-time commands
`ENV`	Sets environment variables
`ARG`	Build-time variable
`EXPOSE`	Documents intended port; does not publish it by itself
`USER`	Sets runtime user
`CMD`	Default command/arguments
`ENTRYPOINT`	Main executable for the container

Prefer COPY over ADD unless you specifically need ADD behavior.

A common production pattern is a multi-stage build. Build tools stay in the builder stage, and the final image contains only what is needed to run.

dockerfile


FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]

Good Dockerfile practices:

Use multi-stage builds
Keep final image small
Run as non-root where possible
Do not copy secrets into the image
Use .dockerignore
Pin base image versions or digests for production
Put dependency download steps before copying full source to improve cache reuse
Prefer exec form for ENTRYPOINT/CMD
Avoid installing unnecessary packages
Scan images for vulnerabilities in CI

CMD vs ENTRYPOINT interview angle:

Instruction	Meaning
`ENTRYPOINT`	Main executable
`CMD`	Default arguments or default command

Example:

dockerfile


ENTRYPOINT ["/app"]
CMD ["--config", "/etc/app/config.yaml"]
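The same split appears in the Pod spec: the command field overrides the image ENTRYPOINT, and the args field overrides the image CMD. The manifest below is a minimal sketch; the image name and the staging config path are assumptions chosen for illustration.

```yaml
# Sketch only: image name and config path are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: registry.example.com/myapp:v1
    # command is omitted, so the image ENTRYPOINT (/app) is kept.
    # args replaces the image CMD with a different config path.
    args: ["--config", "/etc/app/staging.yaml"]
```

If command were set as well, it would replace the ENTRYPOINT, and the image CMD would be ignored unless args is also provided.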

A strong answer is:

“A Dockerfile builds an image layer by layer. For production, I use multi-stage builds, a small runtime image, non-root user, .dockerignore, no embedded secrets, and pinned base images for reproducible Kubernetes deployments.”

Docker Compose vs Kubernetes — when use each?

Docker Compose and Kubernetes both run containerized applications, but they are used at different scales and for different workflows.

Docker Compose	Kubernetes
Best for local development and simple stacks	Best for production orchestration
Uses `compose.yaml`	Uses API objects/manifests
Runs commonly on one Docker host	Runs on one or more cluster nodes
Simple service/network/volume setup	Pods, Deployments, Services, Ingress, RBAC
Easy developer onboarding	Strong rollout, scaling, and policy controls
Limited cluster-level scheduling	Scheduler places workloads on nodes
Good for local dependencies	Good for highly available services

Compose example use:

App + database + Redis locally
Developer laptop setup
Integration test environment
Small internal demo stack
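The first item in the list above, an app with a database and Redis, might look like this in compose.yaml. This is a minimal sketch: the service names, ports, credentials, and environment variable names are illustrative assumptions, not taken from a real project.

```yaml
# compose.yaml (sketch): all names, ports, and credentials are placeholders.
services:
  app:
    build: .                      # build from the Dockerfile in this directory
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app      # local development only
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
  cache:
    image: redis:7
volumes:
  db-data:
```

Services reach each other by service name (db, cache) on the Compose network, which is the local counterpart of Service DNS names in a cluster.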

Kubernetes example use:

Production deployment
Rolling updates
Horizontal scaling
Self-healing through controllers
Service discovery through cluster DNS
Config/Secret management
RBAC and admission policies
Multi-node scheduling

Compose is not “bad.” It is excellent for local developer experience. But it does not replace Kubernetes for large production environments with scheduling, rollout strategy, cluster policies, and high availability requirements.

A common workflow:

text


Dockerfile
  → Docker Compose for local dev
  → CI builds image
  → Helm/Kustomize deploys to Kubernetes

Interview trap:

“Do not say Compose is only for beginners. Many professional teams use it for local development even when production is Kubernetes.”

A strong answer is:

“I use Compose for local multi-container development and simple test stacks. I use Kubernetes when I need production orchestration, self-healing, rolling updates, scaling, service discovery, and cluster-level policy.”

What container runtime does Kubernetes use?

Kubernetes does not talk directly to Docker Engine in modern clusters. The kubelet talks to a container runtime through the Container Runtime Interface (CRI).

Common CRI runtimes:

Runtime	Notes
`containerd`	Common default runtime in many Kubernetes distributions
`CRI-O`	Lightweight CRI runtime, common in OpenShift
Docker Engine	Still useful for building and running images locally; no longer a built-in Kubernetes runtime since dockershim was removed (it needs the external cri-dockerd adapter)

Important history:

Older Kubernetes versions used dockershim to support Docker Engine as a runtime
Dockershim was removed in Kubernetes v1.24
Kubernetes still runs Docker-built images because images follow OCI/container image standards
Docker itself uses containerd internally, but Kubernetes talks to CRI-compatible runtimes

Node-level debugging:

Tool	Talks to	Use
`kubectl`	Kubernetes API server	Normal cluster debugging
`crictl`	CRI runtime	Node-level container/runtime debugging
`ctr`	containerd	Low-level containerd debugging
`docker`	Docker Engine	Local Docker workflows

Example node debugging:

bash


crictl ps
crictl images
crictl logs <container-id>

Interview trap:

“Kubernetes removed Docker” does not mean Docker images stopped working. It means Kubernetes removed the in-tree dockershim runtime integration.

A strong answer is:

“The kubelet uses CRI to talk to runtimes such as containerd or CRI-O. Docker is still widely used to build images, but modern Kubernetes clusters do not depend on Docker Engine through dockershim.”

OCI images and container registries?

OCI stands for Open Container Initiative. OCI specifications help image builders, registries, and runtimes interoperate.

Important terms:

Concept	Meaning
Image	Packaged filesystem, config, and metadata
Registry	Server that stores and distributes images
Repository	Named image path, such as `team/api`
Tag	Human-friendly reference like `v1.2.3`
Digest	Immutable content hash like `sha256:...`
Manifest	Metadata describing image layers/config
Image pull secret	Kubernetes Secret used for private registry credentials

Examples of registries:

Docker Hub
Amazon ECR
Azure Container Registry
Google Artifact Registry
GitHub Container Registry
Harbor
Quay

Tag vs digest is important in production.

Reference	Behavior
`myapp:latest`	Floating tag; can change
`myapp:v1.2.3`	Better, but tag can still be moved
`myapp@sha256:...`	Immutable content reference
`myapp:v1.2.3@sha256:...`	Human-readable tag plus exact digest

Kubernetes private registry example:

yaml


apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: api
    image: registry.example.com/team/api:v1.2.3

Digest-pinned production example:

yaml


containers:
- name: api
  image: registry.example.com/team/api:v1.2.3@sha256:abc123...

Production interview points:

Avoid latest in production
Use private registry credentials through imagePullSecrets or service account configuration
Pin digests for exact rollbacks and reproducible deploys
Scan images for vulnerabilities
Sign images if supply-chain security is required
Keep SBOM/provenance where required
Use immutable tags or registry policies if supported
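For the service account option mentioned in the list above, one possible shape is to attach the pull secret to a ServiceAccount so that Pods using it inherit the credential. The account name and namespace here are hypothetical, and regcred is assumed to already exist as a docker-registry Secret in the same namespace.

```yaml
# Sketch only: "api" and "prod" are illustrative names.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: prod
imagePullSecrets:
- name: regcred   # assumed to exist as a kubernetes.io/dockerconfigjson Secret
```

Pods that set serviceAccountName: api then pull private images without repeating imagePullSecrets in every Pod spec.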

Common pull errors:

Error	Likely cause
`ImagePullBackOff`	Pull failed repeatedly
`ErrImagePull`	Initial image pull failed
`unauthorized`	Missing/wrong registry credentials
`manifest unknown`	Image tag/digest does not exist
`x509: certificate signed by unknown authority`	Registry CA trust issue

A strong answer is:

“OCI standards let different tools build, store, and run container images consistently. In production, I push images to a registry, avoid latest, use imagePullSecrets for private registries, and pin digests when I need exact reproducibility and rollback.”

Kubernetes architecture

Explain Kubernetes architecture at a high level.

A Kubernetes cluster has two main parts:

Control plane — decides what should happen
Worker nodes — run application workloads as Pods

Control plane components:

Component	Role
`kube-apiserver`	Front door to the cluster API; all clients and controllers talk to it
`etcd`	Consistent key-value store for cluster state
`kube-scheduler`	Assigns unscheduled Pods to suitable nodes
`kube-controller-manager`	Runs built-in controllers such as Deployment, ReplicaSet, Node, Job
`cloud-controller-manager`	Integrates with cloud APIs for load balancers, routes, volumes, nodes

Worker node components:

Component	Role
`kubelet`	Node agent that ensures Pods and containers run as requested
`kube-proxy`	Implements Service networking rules, unless replaced by an eBPF dataplane
Container runtime	Starts containers through CRI, such as containerd or CRI-O
CNI plugin	Provides Pod networking

High-level flow when you create a Deployment:

User runs kubectl apply
Request goes to kube-apiserver
API server authenticates, authorizes, admits, and stores desired state in etcd
Deployment controller creates/updates ReplicaSets
ReplicaSet controller creates Pods
Scheduler assigns Pods to nodes
Kubelet on each node asks the runtime to start containers
Service/CNI networking makes Pods reachable

Kubernetes is declarative. You define desired state, and controllers continuously reconcile actual state toward that desired state.

Example:

bash

kubectl apply -f deployment.yaml

You are not directly starting containers. You are asking the Kubernetes API to store desired state. Controllers, scheduler, kubelet, CNI, and runtime do the rest.
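For reference, a deployment.yaml for the flow above could look like the following. It is a minimal sketch with assumed names and labels, pairing a Deployment with a ClusterIP Service so the networking step has something to route to.

```yaml
# deployment.yaml (sketch): names, labels, and replica count are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-nginx          # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: demo-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-nginx
spec:
  type: ClusterIP
  selector:
    app: demo-nginx            # selects the Pods created by the Deployment
  ports:
  - port: 80
    targetPort: 80
```

Nothing in this file says which node to use or how to start a container; those decisions belong to the scheduler and kubelet.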

Good troubleshooting mindset:

Symptom	First layer to check
Object not accepted	API server, validation, admission, RBAC
Pod stuck Pending	Scheduler events, resources, taints, PVC
Pod assigned but not running	Kubelet, runtime, image pull, volume mount
Service not reachable	Service, Endpoints/EndpointSlice, DNS, CNI, kube-proxy/eBPF
Rollout stuck	Deployment, ReplicaSet, Pod events, probes

A strong answer is:

“Kubernetes has a control plane that stores and reconciles desired state, and worker nodes that run Pods. The API server and etcd hold the source of truth, controllers create/update objects, the scheduler places Pods, and kubelets start containers through the runtime.”

What is etcd and why does it matter?

etcd is the strongly consistent key-value store used by Kubernetes to store cluster state.

It stores Kubernetes objects such as:

Pods
Deployments
Services
ConfigMaps
Secrets
Nodes
RBAC objects
Custom resources

Important properties:

Property	Interview point
Source of truth	Kubernetes desired/current API state is persisted in etcd
Consistency	Uses Raft consensus; quorum matters
Backup	etcd snapshots are critical for disaster recovery
Security	Protect with TLS, RBAC, network restrictions, and encryption at rest
Performance	Large clusters need healthy etcd latency and disk I/O
Availability	Losing quorum blocks writes and many control plane operations

If etcd loses quorum:

Existing Pods may keep running
New scheduling may fail
Deployments/rollouts may not progress
API writes may fail
Controllers cannot persist new state

Important Secrets detail:

Kubernetes Secrets are stored in etcd. By default they are only base64-encoded, which is an encoding, not encryption: anyone who can read the object or an etcd backup can decode the value. For stronger protection, enable encryption at rest and protect etcd backups.
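To make this concrete, here is a hedged sketch of a Secret and an encryption-at-rest configuration. The Secret name and key name are hypothetical, and the key material is a placeholder. The second document is not an API object applied with kubectl; on self-managed clusters it is a file on the control plane that kube-apiserver reads via its --encryption-provider-config flag.

```yaml
# Sketch only: names are illustrative; never commit real secrets.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  # base64 of the string "password"; trivially reversible, not encrypted.
  password: cGFzc3dvcmQ=
---
# File for kube-apiserver (--encryption-provider-config), not a kubectl manifest.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>   # placeholder; generate your own
  - identity: {}   # fallback so existing unencrypted data can still be read
```

The first provider in the list is used for writes, so putting aescbc ahead of identity means new or updated Secrets are stored encrypted. Managed Kubernetes services usually expose this through their own KMS settings instead.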

Backup example concept:

bash

ETCDCTL_API=3 etcdctl snapshot save snapshot.db

In managed Kubernetes, the cloud provider usually manages etcd. In self-managed clusters, etcd backup, restore, quorum, TLS, disk performance, and monitoring are platform-critical responsibilities.

Good SRE-level metrics:

etcd leader changes
fsync latency
database size
request latency
quorum/member health
disk space
snapshot success

A strong answer is:

“etcd is Kubernetes’ source of truth. If etcd is unhealthy or loses quorum, the cluster may keep existing workloads running but cannot reliably schedule, update, or persist new state. That is why etcd backup, encryption, quorum, and monitoring are critical.”

Role of the Kubernetes API server?

kube-apiserver is the front door of Kubernetes.

Every major Kubernetes interaction goes through the API server:

kubectl
Controllers
Scheduler
Admission webhooks
Operators
CI/CD systems
GitOps tools

The API server is responsible for:

Stage	Meaning
Authentication	Who are you?
Authorization	Are you allowed to do this?
Admission	Should this request be modified or rejected?
Validation	Is the object valid?
Persistence	Store accepted state in etcd
Watch API	Let controllers watch for changes

Typical request path:

text


kubectl / controller / operator
  → kube-apiserver
  → authentication
  → authorization
  → mutating admission
  → schema validation
  → validating admission
  → etcd

kubectl is only a client. It does not directly create containers or modify nodes. It sends requests to the API server.

Useful interview commands:

bash

kubectl auth can-i create pods -n dev

bash


kubectl create deployment demo-nginx \
  --image=nginx:1.27 \
  --dry-run=client \
  -o yaml

bash

kubectl apply --server-side -f deployment.yaml

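Client dry-run is useful for generating manifests. Note that the --server-side flag above is server-side apply: it persists the object and tracks field ownership, so it is not a dry run. Server-side dry-run is better when you want API server validation and admission behavior without persisting the object.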

bash

kubectl apply -f deployment.yaml --dry-run=server

Common API server related failures:

Error	Likely area
`Unauthorized`	Authentication
`Forbidden`	RBAC/authorization
Admission webhook denied	Policy or webhook validation
Object validation error	Invalid manifest field/value
Timeout calling webhook	Webhook service/cert/network issue
API server unavailable	Control plane or network problem
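A Forbidden error is usually resolved with RBAC objects. As an illustration, the sketch below would let a hypothetical user named jane create and read Pods in the dev namespace; the Role name, binding name, and user are all assumptions.

```yaml
# Sketch only: role, binding, and user names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-creator
  namespace: dev
rules:
- apiGroups: [""]            # "" is the core API group, where Pods live
  resources: ["pods"]
  verbs: ["create", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-creator-binding
  namespace: dev
subjects:
- kind: User
  name: jane                 # hypothetical user from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-creator
  apiGroup: rbac.authorization.k8s.io
```

With these applied, kubectl auth can-i create pods -n dev --as jane should report yes, while the same check in another namespace should still report no.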

A strong answer is:

“The API server is the only supported gateway for cluster state changes. It authenticates, authorizes, admits, validates, and persists objects to etcd, while clients and controllers interact through its REST/watch API.”

How does the Kubernetes scheduler work?

The Kubernetes scheduler assigns Pods to nodes.

It watches for Pods where spec.nodeName is not set, then selects a suitable node based on resources, constraints, and scheduling policy.

Common scheduling factors:

Factor	Examples
Resource requests	CPU, memory, hugepages, ephemeral storage
Node capacity	Node allocatable resources
Node labels	`nodeSelector`, node affinity
Taints	Pod must have matching tolerations
Pod affinity	Place near certain Pods
Pod anti-affinity	Avoid placing near certain Pods
Topology spread	Spread across zones, nodes, racks
Volumes	PVC binding, zone-specific storage
Runtime needs	GPU, device plugins, local SSD
Scheduling plugins	Custom scoring/filtering behavior
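Several of these factors can appear in a single Pod spec. The following is a minimal sketch; the node label disktype=ssd, the dedicated=api taint, and the image name are assumptions about a hypothetical cluster.

```yaml
# Sketch only: labels, taint key/value, and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    app: api
spec:
  nodeSelector:
    disktype: ssd                          # only nodes carrying this label pass filtering
  tolerations:
  - key: dedicated
    operator: Equal
    value: api
    effect: NoSchedule                     # allows nodes tainted dedicated=api:NoSchedule
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule       # stay Pending rather than unbalance zones
    labelSelector:
      matchLabels:
        app: api
  containers:
  - name: api
    image: registry.example.com/team/api:v1.2.3
    resources:
      requests:                            # the scheduler fits requests, not limits
        cpu: 250m
        memory: 256Mi
```

Each constraint narrows the set of feasible nodes, so stacking several of them is a common reason a Pod ends up Pending.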

Simple scheduling idea:

text


Filter nodes that cannot run the Pod
→ score remaining nodes
→ bind Pod to selected node

A Pod remains Pending when no suitable node is found or a required dependency is not ready.

Common Pending causes:

Event clue	Meaning
`Insufficient cpu`	Requests do not fit available allocatable CPU
`Insufficient memory`	Requests do not fit available allocatable memory
`had taint ... that the pod didn't tolerate`	Missing toleration
`didn't match Pod's node affinity/selector`	Node label constraint mismatch
`pod has unbound immediate PersistentVolumeClaims`	PVC/storage not bound
`volume node affinity conflict`	Volume is tied to a different zone/node
`Too many pods`	Node pod density limit reached

Best first command:

bash

kubectl describe pod <pod-name> -n <namespace>

Then read the Events section.

Important distinction:

Scheduler chooses a node
Kubelet starts containers after the Pod is assigned
Image pull, volume mount, and container start errors are usually kubelet/runtime phase issues, not scheduler issues

Example:

bash

kubectl get pod -o wide

If NODE is empty, scheduling has not succeeded. If NODE is set but the Pod is not running, check kubelet/runtime/image/volume/probe events.

A strong answer is:

“The scheduler watches unscheduled Pods, filters nodes based on resources and constraints, scores suitable nodes, and binds the Pod. When a Pod is Pending, I check Events for resource, taint, affinity, topology, or PVC-related failures.”

What are Kubernetes controllers?

Kubernetes controllers are control loops. They watch current cluster state, compare it with desired state, and take action to move the system closer to the desired state.

Basic control loop idea:

text


observe current state
→ compare with desired state
→ create/update/delete objects
→ repeat

Common controllers:

Controller	Manages
Deployment	Rolling updates and ReplicaSets for stateless apps
ReplicaSet	Desired number of matching Pods
StatefulSet	Stateful Pods with stable identity and ordered rollout
DaemonSet	One Pod per selected node
Job	Run-to-completion batch work
CronJob	Scheduled Jobs
Node controller	Node health and lifecycle
EndpointSlice controller	Service endpoint tracking
ServiceAccount controller	ServiceAccount-related resources
Garbage collector	Removes dependent objects based on ownerReferences

Deployment example:

You update the Deployment image
Deployment controller creates a new ReplicaSet
New ReplicaSet scales up new Pods
Old ReplicaSet scales down old Pods
Rollout completes if readiness checks pass

Useful rollout commands:

bash

kubectl rollout status deployment/api -n prod

bash

kubectl rollout history deployment/api -n prod

bash

kubectl rollout undo deployment/api -n prod

Important ownership relationship:

text


Deployment
  → ReplicaSet
    → Pods

You normally edit the Deployment, not the ReplicaSet or Pods directly. If you delete a Pod managed by a ReplicaSet, the controller creates a replacement.

Different workload controllers solve different problems:

Workload	Use when
Deployment	Stateless web/API apps
StatefulSet	Databases, brokers, ordered identity
DaemonSet	Node agents, CNI, log collectors
Job	One-time batch task
CronJob	Scheduled batch task

Custom controllers/operators extend the same idea for application-specific resources. For example, an operator can watch a custom resource and reconcile database clusters, certificates, backups, or Helm releases.
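As an illustration of the custom resource side, the sketch below defines a hypothetical Backup resource in the example.com group and one instance of it. The group, kind, and spec fields are invented for this example and do not belong to any real operator.

```yaml
# Sketch only: the Backup kind and its fields are hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string
              target:
                type: string
---
apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly-db
spec:
  schedule: "0 2 * * *"
  target: mysql-0
```

The CRD only teaches the API server to store Backup objects. Nothing happens until a controller watches them and reconciles, which is the part an operator supplies.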

A strong answer is:

“Controllers are reconciliation loops. They watch desired and actual state, then create, update, or delete resources to converge. For normal apps, I change the Deployment and let ReplicaSets and Pods be managed by controllers.”

Pods and workload controllers

What is a Pod and why is it the smallest deployable unit?

A Pod is the smallest deployable compute object in Kubernetes.

A Pod wraps one or more containers that are scheduled together on the same node and share some runtime resources.

Containers inside the same Pod share:

Network namespace — one Pod IP; containers communicate using localhost
Storage volumes — containers can mount the same volume
Scheduling — all containers in the Pod run on the same node
Lifecycle — the Pod is created, scheduled, and terminated as one unit

Most application Pods run one main container.

Multi-container Pods are used when containers must work closely together:

Pattern	Example
Sidecar	Service mesh proxy, log shipper
Adapter	Convert app output into standard format
Ambassador	Proxy outbound connections
Init container	Run setup before app container starts

Example Pod:

yaml


apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.27
    ports:
    - containerPort: 80

Important interview point: bare Pods are rarely used for production apps because Pods are ephemeral. If a node fails or a Pod is deleted, you usually want a controller such as a Deployment, StatefulSet, DaemonSet, Job, or CronJob to create replacements.

Common Pod facts:

A Pod gets its own IP address
Containers in the same Pod share that IP
Containers in different Pods communicate through Pod IPs or Services
Pod IP can change when the Pod is recreated
Persistent data should use volumes, not the container writable layer
A Pod is not the same as a container; it is a wrapper around one or more containers

A strong answer is:

“A Pod is Kubernetes’ smallest schedulable unit. It gives one or more containers a shared network namespace, optional shared volumes, and a common lifecycle. For production, I usually manage Pods through controllers, not bare Pod manifests.”

Deployment vs ReplicaSet vs Pod?

A Deployment, ReplicaSet, and Pod are related, but they operate at different levels.

Object	Purpose
Pod	Runs one or more containers
ReplicaSet	Maintains a desired number of matching Pods
Deployment	Manages ReplicaSets and provides rollout/rollback

Ownership chain:

text


Deployment
  → ReplicaSet
    → Pods

You normally create a Deployment, not a ReplicaSet directly.

Example Deployment behavior:

You create a Deployment with replicas: 3
Deployment creates a ReplicaSet
ReplicaSet creates 3 Pods
If a Pod dies, ReplicaSet creates a replacement
If you update the image, Deployment creates a new ReplicaSet
Old ReplicaSet scales down while new ReplicaSet scales up

Common rollout strategies:

Strategy	Behavior
`RollingUpdate`	Gradually replace old Pods with new Pods
`Recreate`	Stop old Pods first, then start new Pods

Rolling update example:

yaml


spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Meaning:

Field	Meaning
`maxSurge`	Extra Pods allowed above desired replicas during rollout
`maxUnavailable`	Pods allowed to be unavailable during rollout

maxUnavailable: 0 helps keep capacity during rollout, but the rollout may stall if new Pods cannot become Ready. Always combine rollout strategy with correct readiness probes and capacity planning.
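As a worked case, with replicas: 3, maxSurge: 1, and maxUnavailable: 0, the rollout may run at most 3 + 1 = 4 Pods at once and must keep at least 3 - 0 = 3 Pods available, so an old Pod is removed only after a new one reports Ready. The sketch below combines that strategy with a readiness probe; the image tag, port, and probe path are assumptions about a hypothetical app.

```yaml
# Sketch only: image, port, and probe path are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # up to 4 Pods during the rollout
      maxUnavailable: 0            # never fewer than 3 available Pods
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.example.com/team/api:v1.2.4
        ports:
        - containerPort: 8080
        readinessProbe:            # gates when a new Pod counts as available
          httpGet:
            path: /healthz/ready   # hypothetical endpoint
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```

Because of the surge Pod, the cluster needs spare capacity for one extra replica; without it the new Pod stays Pending and the rollout stalls.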

Useful commands:

bash


kubectl rollout status deployment/api
kubectl rollout history deployment/api
kubectl rollout undo deployment/api

Common interview mistake: editing or deleting Pods directly. If a Pod is managed by a ReplicaSet, manual Pod changes are temporary. The controller will recreate or replace Pods based on the desired state.

A strong answer is:

“A Pod runs containers, a ReplicaSet keeps the desired number of Pods, and a Deployment manages ReplicaSets for rolling updates and rollback. In normal application deployments, I change the Deployment and let controllers handle Pod replacement.”

When do you use StatefulSet instead of Deployment?

Use a StatefulSet when each Pod needs a stable identity or stable storage.

A StatefulSet provides:

Requirement	What StatefulSet gives
Stable Pod name	`app-0`, `app-1`, `app-2`
Stable network identity	DNS identity through a headless Service
Stable storage	PVC per Pod through `volumeClaimTemplates`
Ordered rollout	Predictable create/update/delete order
Ordered scaling	Scale up/down in ordinal order by default

Example use cases:

Databases
Kafka brokers
ZooKeeper
Elasticsearch/OpenSearch
Redis cluster members
Stateful queue/broker systems

Deployment vs StatefulSet:

Deployment	StatefulSet
Pods are interchangeable	Pods have stable identity
Good for stateless APIs	Good for stateful clustered systems
Any Pod can serve any request	Pod identity may map to data/shard
Uses ReplicaSet	Uses stable ordinals
Storage usually shared/external	Each Pod can have its own PVC

StatefulSet Pods have predictable names:

text


mysql-0
mysql-1
mysql-2

With a headless Service, each Pod can get stable DNS such as:

text

mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local

Important storage point: if a StatefulSet Pod is rescheduled, Kubernetes can attach the same PVC to the replacement Pod with the same identity.

Do not use StatefulSet just because an app writes files. Many apps should store state in external databases/object storage and run as stateless Deployments.
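For cases that do need it, a StatefulSet typically pairs with a headless Service and a volumeClaimTemplates entry. The following is a minimal sketch, assuming a Secret named mysql-root already exists and a default StorageClass is available. It creates three independent MySQL instances; replication between them would need extra configuration that is out of scope here.

```yaml
# Sketch only: sizes, names, and the referenced Secret are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None                  # headless: gives each Pod its own DNS record
  selector:
    app: mysql
  ports:
  - name: mysql
    port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql               # must reference the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-root     # assumed to exist
              key: password
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:            # one PVC per Pod: data-mysql-0, data-mysql-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Deleting the StatefulSet does not delete the PVCs by default, which protects data but also means storage cleanup is a separate step.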

A strong answer is:

“I use StatefulSet when Pod identity and storage must be stable across rescheduling, such as databases or brokers. For stateless APIs where replicas are interchangeable, I use Deployment.”

DaemonSet, Job, and CronJob — use cases?

DaemonSet, Job, and CronJob solve different workload problems.

Workload	Use case	Example
DaemonSet	Run a Pod on every selected node	Log agent, node exporter, CNI plugin
Job	Run a task to completion	DB migration, batch export
CronJob	Run Jobs on a schedule	Nightly report, cleanup, certificate check

DaemonSet:

Ensures each selected node runs a copy of a Pod
Adds Pods when nodes are added
Removes Pods when nodes are removed
Common for node-level agents

Examples:

text


Fluent Bit
node-exporter
CNI agents
storage plugins
security agents
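A DaemonSet for one of these agents could be sketched as follows. The image reference and namespace are placeholders, and the toleration assumes control-plane nodes carry the standard node-role.kubernetes.io/control-plane NoSchedule taint.

```yaml
# Sketch only: image and namespace are illustrative placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule         # lets the agent run on control-plane nodes too
      containers:
      - name: node-exporter
        image: registry.example.com/node-exporter:v1
        ports:
        - containerPort: 9100
```

There is no replicas field: the number of Pods follows the number of matching nodes.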

Job:

Creates one or more Pods
Retries failed Pods depending on backoffLimit
Completes when the required number of successful completions is reached

CronJob:

Creates Jobs on a schedule
Uses cron syntax
Good for repeated batch tasks

Example CronJob idea:

yaml


apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: registry.example.com/cleanup:v1

Important interview points:

Use restartPolicy: OnFailure or Never for Jobs, not Always
Use backoffLimit to control retries
Use concurrencyPolicy for CronJobs when overlapping runs are dangerous
Use ttlSecondsAfterFinished if completed Job cleanup is needed
DaemonSets may need tolerations to run on tainted nodes
Control-plane nodes usually require explicit tolerations if you want DaemonSet Pods there
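Several of these fields can be combined in one manifest. The sketch below extends the nightly-cleanup idea with retry, overlap, and cleanup settings; the specific values are illustrative assumptions, not recommendations.

```yaml
# Sketch only: limits and TTL values are illustrative.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid          # skip a run if the previous one is still active
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2                # retries before the Job is marked failed
      ttlSecondsAfterFinished: 3600  # delete the finished Job after one hour
      template:
        spec:
          restartPolicy: Never       # failed Pods are replaced, not restarted in place
          containers:
          - name: cleanup
            image: registry.example.com/cleanup:v1
```

Even with Forbid, the task itself should be idempotent, since a retried Pod may repeat work that a failed attempt already started.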

Common mistake:

Running a database migration as a bare Pod without thinking about retries, restart policy, idempotency, or whether it can run twice.

A strong answer is:

“I use DaemonSet for node-level agents, Job for run-to-completion tasks, and CronJob for scheduled Jobs. For batch work, I think carefully about restart policy, backoff, idempotency, and cleanup.”

Pod lifecycle phases and restart policies?

A Pod has a high-level phase, and each container inside the Pod has a more detailed state.

Pod phases:

Phase	Meaning
`Pending`	Pod accepted, but one or more containers are not running yet
`Running`	Pod assigned to a node and at least one container is running
`Succeeded`	All containers exited successfully and will not restart
`Failed`	All containers terminated and at least one exited in failure
`Unknown`	Pod state cannot be determined, usually because the node is unreachable

Container states:

State	Meaning
`Waiting`	Container not running yet
`Running`	Container is running
`Terminated`	Container exited

Common container waiting/terminated reasons:

Reason	Likely cause
`ImagePullBackOff`	Wrong image, missing tag, registry auth, network, CA
`ErrImagePull`	Initial image pull failure
`CrashLoopBackOff`	Process starts then exits repeatedly
`CreateContainerConfigError`	Bad config, missing Secret/ConfigMap
`RunContainerError`	Runtime failed to start container
`OOMKilled`	Container exceeded memory limit
`Completed`	Process finished successfully
`Error`	Process exited with non-zero status

Restart policies:

Policy	Behavior	Common use
`Always`	Restart containers when they exit	Deployments
`OnFailure`	Restart only if non-zero exit	Jobs
`Never`	Do not restart	Debug/batch cases

For Deployments, the Pod template's restartPolicy must be Always. It is the default and the only value a Deployment accepts.

Debugging flow:

bash


kubectl get pod <pod-name>
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl logs <pod-name> --previous

How to read symptoms:

Symptom	First check
`Pending`	Events: scheduler, resources, taints, PVC
`ImagePullBackOff`	Image name/tag, pull secret, registry, CA
`CrashLoopBackOff`	App logs, command, env, config, probes
`OOMKilled`	Memory limit, app memory use, node pressure
`Completed` in Deployment	Wrong command for long-running service
Readiness false	App readiness endpoint or dependency

Important nuance: Pod phase alone is not enough. Always check container state, reason, restart count, events, and logs.

A strong answer is:

“I do not stop at the Pod phase. I check container states, reasons, restart count, events, and previous logs. CrashLoopBackOff points to app exit/probe/config issues, while ImagePullBackOff points to registry, image, or credential problems.”

Services, networking, and Ingress

Kubernetes Service types — ClusterIP, NodePort, LoadBalancer?

A Kubernetes Service provides a stable network endpoint for a set of Pods.

Pods are ephemeral and their IPs can change. A Service gives clients a stable DNS name and virtual IP while routing traffic to matching Pods.

Service types:

Type	Access pattern
`ClusterIP`	Internal cluster access only; default type
`NodePort`	Exposes Service on every node at a static port
`LoadBalancer`	Provisions external load balancer when infrastructure supports it
`ExternalName`	Maps Service name to external DNS name

Example Service:

yaml


apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

Important fields:

Field	Meaning
`selector`	Labels used to find backend Pods
`port`	Service port exposed to clients
`targetPort`	Port on the selected Pods
`nodePort`	Static node port for NodePort type
`clusterIP`	Internal virtual IP
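To show how `port`, `targetPort`, and `nodePort` relate, here is a NodePort variant of the same Service. The Service name and the chosen node port are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport      # hypothetical name
spec:
  type: NodePort
  selector:
    app: api
  ports:
  - name: http
    port: 80              # Service port inside the cluster
    targetPort: 8080      # container port on the selected Pods
    nodePort: 30080       # must fall in the node port range (30000-32767 by default)
```

If `nodePort` is omitted, Kubernetes picks a free port from the range.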

Most common use:

Need	Usually use
Service-to-service inside cluster	`ClusterIP`
Temporary simple external access	`NodePort`
Cloud external access	`LoadBalancer`
HTTP/HTTPS host/path routing	Ingress or Gateway API

A Service selector mismatch creates a Service with no backends.

Debug commands:

bash


kubectl get svc api
kubectl get endpoints api
kubectl get endpointslice -l kubernetes.io/service-name=api
kubectl get pods -l app=api

If the Service has no endpoints, check:

Pod labels
Service selector
Pod readiness
Namespace
Target port
EndpointSlice objects

Important note: Services route only to ready endpoints by default. If Pods exist but are not Ready, they are left out of the Service's endpoints and receive no traffic.

A strong answer is:

“A Service gives stable networking for ephemeral Pods. I use ClusterIP for internal traffic, LoadBalancer or Ingress/Gateway for external access, and I always verify that Service selectors produce ready endpoints.”

How does DNS work inside Kubernetes?

Kubernetes creates DNS records for Services and Pods. CoreDNS commonly serves these records inside the cluster.

Common Service DNS formats:

Name	Meaning
`api`	Service in the same namespace
`api.dev`	Service in namespace `dev`
`api.dev.svc`	Service under cluster service domain
`api.dev.svc.cluster.local`	Fully qualified service name

Example:

text

my-svc.my-namespace.svc.cluster.local

From a Pod in the same namespace, short name usually works:

bash

curl http://api

From another namespace, use namespace-qualified name:

bash

curl http://api.dev

Headless Service:

yaml


spec:
  clusterIP: None

A headless Service does not get a normal ClusterIP. It can return individual Pod IPs, which is useful for StatefulSets and direct Pod identity.

StatefulSet DNS example:

text

mysql-0.mysql.default.svc.cluster.local
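A per-Pod name like this comes from a headless Service paired with a StatefulSet whose `serviceName` points at it. The sketch below assumes the `default` namespace and a placeholder image.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None         # headless: DNS returns Pod IPs instead of one virtual IP
  selector:
    app: mysql
  ports:
  - name: mysql
    port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql      # governing headless Service, gives mysql-0.mysql, mysql-1.mysql, ...
  replicas: 2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: registry.example.com/mysql:v1   # placeholder image
        ports:
        - containerPort: 3306
```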

Common DNS troubleshooting:

bash


kubectl get svc -n kube-system kube-dns
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl exec -it <pod> -- nslookup kubernetes.default
kubectl exec -it <pod> -- cat /etc/resolv.conf

Check:

Service exists
Correct namespace
CoreDNS Pods running
NetworkPolicy allows DNS egress
Pod /etc/resolv.conf
Search domains and ndots
Service endpoints exist
CNI connectivity to CoreDNS

NetworkPolicy gotcha:

If egress is restricted, allow DNS to CoreDNS/kube-dns on UDP and TCP 53. DNS can use TCP for large responses or retries, so allowing only UDP may cause intermittent issues.

A strong answer is:

“CoreDNS gives Services stable names inside the cluster. I test short and fully qualified names, check Service endpoints, inspect /etc/resolv.conf, and make sure NetworkPolicy allows DNS egress to kube-dns/CoreDNS on port 53.”

What is Ingress and how does it differ from LoadBalancer?

An Ingress manages HTTP/HTTPS routing from outside the cluster to Services inside the cluster.

Ingress can provide:

Host-based routing
Path-based routing
TLS termination
HTTP routing rules
One external entry point for many Services

An Ingress requires an Ingress controller. Without a controller, the Ingress object exists but does not route traffic.

Common controllers:

NGINX Ingress Controller
Traefik
HAProxy
AWS Load Balancer Controller
GCE/GKE Ingress controller
Azure Application Gateway Ingress Controller

LoadBalancer Service vs Ingress:

LoadBalancer Service	Ingress
Exposes one Service externally	Routes to many Services
Usually L4 TCP/UDP style	L7 HTTP/HTTPS routing
May create one cloud LB per Service	Can consolidate many routes
Simple for one service	Better for host/path routing
No built-in path routing	Supports host/path rules
TLS depends on LB setup	Commonly terminates TLS at controller

Ingress example:

yaml


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
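A slightly fuller sketch shows TLS termination and path-based routing together. The TLS Secret name and the `api` backend Service are assumptions. The Secret must be of type `kubernetes.io/tls` and live in the same namespace as the Ingress.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-tls                  # hypothetical name
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-tls  # hypothetical TLS Secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api               # requests under /api go to the api Service
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80
      - path: /                  # everything else goes to the web Service
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```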

Common Ingress troubleshooting:

bash


kubectl get ingress
kubectl describe ingress web
kubectl get ingressclass
kubectl get svc web
kubectl get endpoints web
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller

Check:

Ingress controller is installed
ingressClassName matches the controller
DNS points to the external address
TLS Secret exists and matches host
Backend Service exists
Service has ready endpoints
Path type and rewrite annotations are correct
Controller-specific annotations are valid

Modern senior note: Gateway API is the newer Kubernetes API family for more expressive traffic management. Ingress is still widely used, but Gateway API is increasingly relevant for advanced routing and platform/team ownership models.
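For comparison, a roughly equivalent Gateway API route might look like the sketch below. It assumes the Gateway API CRDs are installed and that a Gateway named `shared-gateway` in an `infra` namespace exists and allows routes from this namespace. Both names are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web
spec:
  parentRefs:
  - name: shared-gateway        # hypothetical Gateway owned by the platform team
    namespace: infra            # hypothetical namespace
  hostnames:
  - app.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: web                 # backend Service in the route's namespace
      port: 80
```

The split between Gateway (platform-owned) and HTTPRoute (app-owned) is what the ownership point refers to.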

A strong answer is:

“Ingress is L7 HTTP/HTTPS routing to Services and needs an Ingress controller. I use it when I need host/path routing and TLS termination instead of creating one LoadBalancer per microservice.”

kube-proxy and CNI — networking stack basics?

Kubernetes networking has multiple layers.

Layer	Component	Role
Pod networking	CNI plugin	Assigns Pod IPs and connects Pods across nodes
Service networking	kube-proxy or eBPF dataplane	Implements Service virtual IP/load balancing
DNS	CoreDNS	Resolves Service and Pod DNS names
Policy	NetworkPolicy + CNI support	Controls allowed ingress/egress traffic
Ingress/Gateway	Controller	Handles external L7 traffic

CNI plugin examples:

Calico
Cilium
Flannel
Weave Net
Antrea
AWS VPC CNI
Azure CNI

CNI responsibilities:

Allocate Pod IPs
Set up network interfaces
Configure routes
Enable Pod-to-Pod connectivity
Optionally enforce NetworkPolicy

kube-proxy responsibilities:

Watches Services and EndpointSlices
Programs node networking rules
Sends Service traffic to backend Pods
Common modes include iptables and IPVS, with nftables available as a newer mode in recent releases

Some CNIs, such as Cilium, can replace kube-proxy functionality with an eBPF dataplane.

Important distinction:

Problem	Likely layer
Pod has no IP	CNI setup issue
Pod-to-Pod fails	CNI routing/policy issue
Service IP fails but Pod IP works	kube-proxy/eBPF/Service endpoints
DNS name fails but Service IP works	CoreDNS or DNS policy
External HTTP route fails	Ingress/Gateway/controller/LB
Policy not enforced	CNI may not support NetworkPolicy

Service mesh note:

A service mesh such as Istio or Linkerd adds proxies around application traffic. Sidecars affect:

Pod CPU/memory requests
Startup/shutdown behavior
Probes
mTLS
Traffic routing
HPA calculations if resources are not set properly

A strong answer is:

“CNI gives Pods network connectivity and IPs. kube-proxy or an eBPF replacement implements Service load balancing. CoreDNS handles names, and NetworkPolicy works only if the CNI enforces it.”

NetworkPolicy — what do interviewers test?

A NetworkPolicy controls allowed ingress and/or egress traffic for selected Pods.

Important baseline:

If no NetworkPolicy selects a Pod, traffic is allowed by default
Once a Pod is selected by an ingress policy, only allowed ingress traffic is permitted
Once a Pod is selected by an egress policy, only allowed egress traffic is permitted
NetworkPolicy requires CNI support; without it, the API accepts the policy objects but nothing enforces them

Common rules:

Scenario	Behavior
No policies in namespace	Traffic allowed by default
Ingress policy selects Pod	Only allowed ingress reaches that Pod
Egress policy selects Pod	Only allowed egress leaves that Pod
`policyTypes: [Ingress]`	Only ingress restricted
`policyTypes: [Egress]`	Only egress restricted
Empty ingress list	Deny all ingress to selected Pods
Empty egress list	Deny all egress from selected Pods
Multiple policies apply	Allowed traffic is the union of all policies

Example policy allowing frontend to call API and API to resolve DNS:

yaml


apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Important DNS gotcha:

If you restrict egress and forget DNS, the app may fail even though Services and Pods are healthy.

Allow:

UDP 53
TCP 53
Correct namespace/pod labels for CoreDNS/kube-dns
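A narrower DNS rule can match both the namespace and the DNS Pods. This sketch assumes the DNS Pods carry the `k8s-app: kube-dns` label, which is common for CoreDNS but should be verified in your cluster. Setups with a node-local DNS cache may need a different rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}               # every Pod in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    # One list item with both selectors: namespace AND pod labels must match
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns     # assumed label; check with kubectl get pods --show-labels
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Writing the two selectors as separate list items would mean "either matches", which is a much broader rule.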

Another common interview gotcha: NetworkPolicy is not a cluster-wide firewall. It applies only to the Pods it selects, only in the directions listed in policyTypes, and only to the traffic types the CNI implementation enforces.

Troubleshooting checklist:

bash


kubectl get networkpolicy -n <ns>
kubectl describe networkpolicy <name> -n <ns>
kubectl get pods --show-labels -n <ns>
kubectl get ns --show-labels

Check:

Does the policy select the intended Pods?
Are namespace labels correct?
Are pod labels correct?
Is ingress allowed on the destination?
Is egress allowed from the source?
Is DNS allowed?
Does the CNI support NetworkPolicy?
Are ports matching container/application ports?

Advanced production patterns:

Default deny all ingress
Default deny all egress
Allow DNS explicitly
Allow only required service-to-service flows
Force external egress through gateway/proxy
Separate policies by app/team/namespace
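The first two patterns are usually expressed as one namespace-wide policy. This is a minimal sketch; the policy name is an assumption.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}     # empty selector matches every Pod in the namespace
  policyTypes:        # both directions listed, no rules given, so nothing is allowed
  - Ingress
  - Egress
```

Because policies are additive, each allowed flow (DNS first) is then added as a separate policy on top of this baseline.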

A strong answer is:

“NetworkPolicy starts from default allow, then restricts selected Pods. I verify both destination ingress and source egress, allow DNS explicitly, and confirm the CNI actually enforces NetworkPolicy.”

Configuration, secrets, and storage

ConfigMap — how and when to use it?

A ConfigMap stores non-sensitive configuration separately from the container image.

Use ConfigMap for:

Feature flags
Log levels
App settings
Config files
Non-secret URLs
Runtime toggles

Do not store passwords, API keys, tokens, or certificates in ConfigMaps. Use Secrets or an external secret manager for sensitive values.

Example:

yaml


apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  app.properties: |
    cache.ttl=300
    feature.newCheckout=true

Common ways to consume ConfigMap:

Method	Best use
Environment variable	Simple key/value settings
Volume mount	Config files read from disk
Command args	Startup flags
Projected volume	Combine ConfigMap with Secret/downward API

Environment variable example:

yaml


env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: LOG_LEVEL

Volume mount example:

yaml


volumes:
- name: config
  configMap:
    name: app-config

containers:
- name: app
  image: example/app:v1
  volumeMounts:
  - name: config
    mountPath: /etc/app

Important update behavior:

Usage	What happens when ConfigMap changes?
Env var	Existing Pod does not automatically get new value
Volume mount	File content updates eventually
`subPath` mount	Does not receive automatic updates
App config reload	App must watch/reload file or Pod must restart
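When only one file is needed, projecting a single key with `items` keeps automatic updates working, unlike a `subPath` mount. This sketch reuses `app-config` from above; the Pod name is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: example/app:v1
    volumeMounts:
    - name: config
      mountPath: /etc/app       # mount the directory, not a subPath, so updates propagate
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config
      items:                    # project only the key the app needs
      - key: app.properties
        path: app.properties    # appears as /etc/app/app.properties
```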

In production, many teams trigger a Deployment rollout when ConfigMap changes so Pods restart with known config.

Common rollout pattern:

bash

kubectl rollout restart deployment/api

Interview mistake: saying “ConfigMap update immediately changes the app.” The mounted file may update eventually, but the application must re-read it. Environment variables require a new Pod.

A strong answer is:

“I use ConfigMap for non-secret runtime configuration. Env vars are simple but require Pod restart to change; mounted files can update eventually, but the app must reload them. For predictable production releases, I often trigger a rollout when config changes.”

Secrets — storage, mounting, and security?

A Kubernetes Secret stores sensitive data such as passwords, tokens, private keys, and certificates.

Example:

yaml


apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
stringData:
  username: app
  password: changeme

Important security point:

Base64 is encoding, not encryption.

Secrets are stored as Kubernetes API objects. By default their data is stored unencrypted in etcd, so encryption at rest must be configured explicitly on the API server.

Good practices:

Practice	Why it matters
Enable encryption at rest	Protect Secret data in etcd
Restrict RBAC	Least privilege for `get/list/watch secrets`
Use external secret managers	Centralized rotation/audit
Avoid committing Secrets to Git	Prevent credential leakage
Prefer short-lived credentials	Reduce blast radius
Treat etcd backups as sensitive	Backups may contain Secret data
Rotate secrets	Handle leaks and lifecycle
Limit mounted keys	Expose only what the container needs
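The RBAC row can be made concrete with a Role scoped to one named Secret. The namespace, Role name, and ServiceAccount below are assumptions. Mounting a Secret into a Pod does not need this Role; it matters for users and workloads that read Secrets through the API.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-creds          # hypothetical name
  namespace: dev               # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["db-creds"]  # only this Secret
  verbs: ["get"]               # no list/watch, which would expose every Secret in the namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-db-creds
  namespace: dev
subjects:
- kind: ServiceAccount
  name: api                    # hypothetical ServiceAccount
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-db-creds
```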

Secret as environment variable:

yaml


env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-creds
      key: password

Secret as volume:

yaml


volumes:
- name: db-creds
  secret:
    secretName: db-creds

containers:
- name: app
  volumeMounts:
  - name: db-creds
    mountPath: /etc/secrets
    readOnly: true

Volume mounts are often preferred for secrets because:

File permissions can be controlled
Some apps can reload files
They avoid putting secrets directly into environment variables
Rotation can be easier than env-based secrets

But do not overstate it: mounted Secrets can still be read by any process with filesystem access inside the container.
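To limit what is exposed, a Secret volume can project only selected keys and tighten file permissions. This sketch reuses `db-creds`; the file name `db-password` is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: example/app:v1
    volumeMounts:
    - name: db-creds
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: db-creds
    secret:
      secretName: db-creds
      defaultMode: 0400        # octal: read-only for the file owner
      items:                   # mount only the password key, not username
      - key: password
        path: db-password      # appears as /etc/secrets/db-password
```

If the container runs as a non-root user, the mode may need adjusting, or an `fsGroup` may need to be set, so that the process can still read the file.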

External secret options:

HashiCorp Vault
AWS Secrets Manager
Azure Key Vault
Google Secret Manager
External Secrets Operator
Secrets Store CSI Driver
Sealed Secrets for GitOps workflows

Common interview mistake: saying Kubernetes Secrets are secure because they are base64-encoded. They need RBAC, encryption at rest, secure backup handling, and secret rotation strategy.

A strong answer is:

“Kubernetes Secrets are sensitive API objects, not magically encrypted values. I restrict RBAC, enable encryption at rest, avoid Git commits, prefer external secret managers for production, and mount only the secret keys a container actually needs.”

PersistentVolume, PersistentVolumeClaim, and StorageClass?

Kubernetes separates storage into three main objects.

Object	Role
PersistentVolume	Actual cluster storage resource
PersistentVolumeClaim	User/application request for storage
StorageClass	Dynamic provisioning template

A PV is the storage resource. It may be backed by NFS, EBS, Azure Disk, Ceph, vSphere, local storage, or another CSI driver.

A PVC is the application’s request for storage.

A StorageClass tells Kubernetes how to dynamically provision storage.
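A StorageClass sketch might look like this. The provisioner and `type` parameter assume the AWS EBS CSI driver purely for illustration; other clusters use a different driver and parameters.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com              # assumption: AWS EBS CSI driver
parameters:
  type: gp3                               # driver-specific parameter
reclaimPolicy: Delete                     # backing volume is removed with the PV
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the Pod is scheduled
allowVolumeExpansion: true                # PVCs can later request more storage
```

`WaitForFirstConsumer` delays provisioning until a Pod uses the claim, which helps avoid volumes created in a zone where the Pod cannot run.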

PVC example:

yaml


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi

Access modes:

Access mode	Meaning
`ReadWriteOnce`	Mounted read-write by one node
`ReadOnlyMany`	Mounted read-only by many nodes
`ReadWriteMany`	Mounted read-write by many nodes
`ReadWriteOncePod`	Mounted read-write by only one Pod

Common mapping:

Workload	Typical storage
Single database Pod	RWO block volume
Shared uploads	RWX filesystem such as NFS/CephFS/EFS
StatefulSet database	PVC per Pod
Cache/temp data	`emptyDir`, not PVC
Logs	Usually stdout + log pipeline, not PVC

Dynamic provisioning flow:

text


Pod references PVC
→ PVC references StorageClass
→ CSI provisioner creates PV
→ PVC binds to PV
→ Pod mounts volume

If a Pod stays Pending, check PVC status:

bash


kubectl get pvc
kubectl describe pvc data
kubectl describe pod <pod-name>

Common storage problems:

Symptom	Likely cause
PVC Pending	No matching PV or provisioner issue
Pod Pending	PVC not bound
Multi-attach error	RWO volume mounted on another node
Volume node affinity conflict	Volume tied to different zone
Permission denied	Filesystem ownership/security context
Data deleted unexpectedly	Reclaim policy or PVC deletion

Reclaim policy matters:

Policy	Behavior
`Delete`	Delete backing storage when PVC/PV is deleted
`Retain`	Keep backing storage for manual recovery

StatefulSets commonly use volumeClaimTemplates so each Pod gets its own PVC.
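A minimal sketch of `volumeClaimTemplates` follows. It assumes a headless Service named `db` and a StorageClass named `fast-ssd` already exist; the image and mount path are placeholders.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db                # headless Service assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.example.com/db:v1   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db            # placeholder data directory
  volumeClaimTemplates:
  - metadata:
      name: data                 # creates PVCs data-db-0, data-db-1, data-db-2
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
```

Each Pod keeps its own claim across rescheduling, so `db-1` always reattaches to `data-db-1`.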

A strong answer is:

“A PVC is the app’s storage request, a PV is the actual storage, and a StorageClass defines dynamic provisioning. I match access mode and reclaim policy to the workload, and I debug Pending Pods by checking PVC binding and storage events.”

emptyDir, projected volumes, and downward API?

Kubernetes has several volume types for non-persistent or metadata-driven use cases.

Volume type	Use
`emptyDir`	Temporary storage shared by containers in one Pod
`projected`	Combine ConfigMap, Secret, downwardAPI, and service account token
downward API	Expose Pod/container metadata to the container
ConfigMap volume	Mount non-secret config files
Secret volume	Mount sensitive files

emptyDir is created when a Pod is assigned to a node and deleted when the Pod is removed from that node.

Use emptyDir for:

Scratch files
Shared files between app and sidecar
Temporary processing
Buffering
Logs consumed by sidecar

Example:

yaml


volumes:
- name: shared-logs
  emptyDir: {}

containers:
- name: app
  volumeMounts:
  - name: shared-logs
    mountPath: /var/log/app
- name: log-shipper
  volumeMounts:
  - name: shared-logs
    mountPath: /logs

Downward API exposes Pod metadata without requiring the app to call the Kubernetes API.

Environment variable example:

yaml


env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace

Resource field example:

yaml


env:
- name: CPU_REQUEST
  valueFrom:
    resourceFieldRef:
      resource: requests.cpu

Projected volume example:

yaml


volumes:
- name: app-projected
  projected:
    sources:
    - configMap:
        name: app-config
    - secret:
        name: app-secret
    - downwardAPI:
        items:
        - path: pod-name
          fieldRef:
            fieldPath: metadata.name

Common interview distinctions:

Need	Use
Temporary shared files	`emptyDir`
Pod name/namespace/labels	Downward API
Combine multiple config sources	Projected volume
Persistent data	PVC
Sensitive config	Secret
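For the labels case, the whole label set is only available through a downward API volume, not a single environment variable. A minimal sketch, with the Pod name and labels as assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: podinfo-demo
  labels:
    app: api
    tier: backend
spec:
  containers:
  - name: app
    image: example/app:v1
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
      readOnly: true
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: labels             # file /etc/podinfo/labels, one key="value" per line
        fieldRef:
          fieldPath: metadata.labels
```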

A strong answer is:

“I use emptyDir for temporary files shared inside one Pod, projected volumes to combine config sources, and downward API when the app needs its own Pod metadata without calling the Kubernetes API.”

How does Kubernetes support 12-factor app configuration?

The 12-factor app model says configuration should be separated from code and injected at runtime.

Kubernetes supports this with:

12-factor idea	Kubernetes mechanism
Config separated from code	ConfigMap and Secret
Backing services as attached resources	Service DNS, credentials, ExternalName
Port binding	Container port + Service
Processes	One main process per container
Concurrency	Replicas and HPA
Disposability	Fast startup and graceful shutdown
Logs	stdout/stderr collected by platform
Dev/prod parity	Same image, different config

Config examples:

Config type	Kubernetes object
Log level	ConfigMap
Feature flag	ConfigMap
DB password	Secret
API URL	ConfigMap
TLS key/cert	Secret
Runtime environment	ConfigMap/fieldRef

Important shutdown behavior:

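Pod deletion or rollout begins and the Pod is marked Terminating
Pod is removed from Service endpoints, in parallel with the steps below
PreStop hook runs if one is defined, then the container receives SIGTERM
Kubernetes waits up to terminationGracePeriodSeconds, counted from the start of termination
If still running, container receives SIGKILL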

The default grace period is 30 seconds unless the Pod spec changes it.

Example:

yaml

terminationGracePeriodSeconds: 45

App responsibilities:

Handle SIGTERM
Stop accepting new requests
Finish or cancel in-flight work
Close DB connections cleanly
Flush logs/metrics
Exit before grace period ends

Common production additions:

Readiness probe for traffic gating
PreStop hook only when truly needed
Config rollout automation
Separate config per environment
Secret rotation plan
Avoid baking environment-specific values into images
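Where a preStop hook is justified, it is usually a short pause that gives endpoint removal time to propagate before SIGTERM arrives. The sketch below assumes the image ships a `sleep` binary and that 5 seconds is enough for the environment. Both are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    app: api
spec:
  terminationGracePeriodSeconds: 45   # total budget: preStop time plus SIGTERM handling
  containers:
  - name: api
    image: example/app:v1
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    lifecycle:
      preStop:
        exec:
          # Pause before SIGTERM so load balancers stop sending new requests
          command: ["sleep", "5"]
```

The hook's runtime counts against the grace period, so a long preStop leaves less time for the application's own shutdown.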

Good interview distinction:

Build once, deploy many times with different runtime config.

A strong answer is:

“Kubernetes supports 12-factor apps by keeping config outside the image through ConfigMaps and Secrets, scaling with replicas/HPA, logging to stdout, and relying on fast startup plus graceful SIGTERM handling during rollouts.”

Probes, resources, and autoscaling

Liveness, readiness, and startup probes — differences?

Kubernetes probes tell the kubelet how healthy a container is and whether it should receive traffic.

Probe	Purpose	Failure action
Startup probe	Has the app finished starting?	Container is killed if startup probe keeps failing
Liveness probe	Is the app alive, not deadlocked?	Container is restarted
Readiness probe	Can the app serve traffic now?	Pod is removed from Service endpoints

Example:

yaml


startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5

How to design endpoints:

Endpoint	Should check
Startup	App initialization complete
Liveness	Process/event loop not deadlocked
Readiness	App can safely receive traffic
Deep health	Dependencies, DB, cache, downstreams

Important trap: do not put downstream database checks in liveness unless the app truly cannot recover without restart. If the DB blips, liveness may restart every Pod and make the outage worse.

Better approach:

Liveness: shallow process health
Readiness: dependency readiness
Startup: slow initialization protection

Common probe mistakes:

Mistake	Impact
Liveness too aggressive	Crash loops during slow startup
Readiness missing	Traffic sent before app is ready
Liveness checks DB	Restart storm during dependency outage
Timeout too low	False failures under load
Same endpoint for all probes	Wrong failure behavior
No startup probe for slow app	Killed before startup completes

Troubleshooting:

bash


kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl logs <pod-name> --previous

Look for Events such as:

text


Liveness probe failed
Readiness probe failed
Startup probe failed

A strong answer is:

“Startup protects slow boots, readiness controls whether a Pod receives Service traffic, and liveness restarts only truly unhealthy containers. I keep liveness shallow and put dependency checks in readiness.”

Resource requests and limits?

Resource requests and limits control scheduling and runtime resource behavior.

Field	Meaning
CPU request	CPU amount used for scheduling and guaranteed share
Memory request	Memory amount used for scheduling
CPU limit	Maximum CPU allowed; may cause throttling
Memory limit	Maximum memory allowed; can cause OOM kill

Example:

yaml


resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

CPU vs memory:

Resource	Behavior
CPU	Compressible; container can be throttled
Memory	Incompressible; container can be OOMKilled
Ephemeral storage	Can trigger eviction if overused
HugePages	Must be requested/limited explicitly where used

Requests matter because the scheduler uses them to decide where a Pod can fit.

text


Pod requests
→ scheduler checks node allocatable
→ Pod assigned only if resources fit

Limits matter because the runtime enforces them.

Common issues:

Symptom	Possible cause
Pod Pending	Requests too high for available nodes
CPU throttling	CPU limit too low
OOMKilled	Memory limit too low or app leak/spike
HPA not scaling on CPU	Missing CPU requests
Node pressure eviction	Requests too low or node overcommitted
Sidecar consumes resources	Sidecar missing requests/limits

Quality of Service classes:

QoS	Condition
Guaranteed	Every container sets both CPU and memory requests equal to its limits
Burstable	At least one container has a CPU or memory request or limit, but the Pod does not meet Guaranteed
BestEffort	No CPU/memory requests or limits
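As an illustration, this Pod lands in the Guaranteed class because its only container sets identical requests and limits for both resources. The Pod name is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
  - name: app
    image: example/app:v1
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m        # equal to the request
        memory: 256Mi    # equal to the request
```

The assigned class is recorded in the Pod's `status.qosClass` field. Raising only the limits would make the same Pod Burstable.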

Production guidance:

Set requests for every container, including sidecars
Set memory limits carefully
Avoid very low CPU limits for latency-sensitive apps
Use metrics to tune requests
Watch OOMKills, throttling, and node pressure
Remember HPA resource utilization depends on requests

A strong answer is:

“Requests are for scheduling and baseline capacity; limits are runtime caps. CPU over limit is throttled, memory over limit is killed. I set requests for every container because scheduling and HPA depend on them.”

Horizontal Pod Autoscaler (HPA) — how does it work?

The Horizontal Pod Autoscaler adjusts replica count based on metrics.

It can scale workloads such as Deployments, ReplicaSets, and StatefulSets through the scale subresource.

Example:

yaml


apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

How CPU utilization HPA works conceptually:

text


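utilization = average CPU usage across Pods / CPU request
→ compare with target utilization
→ desiredReplicas = ceil(currentReplicas × utilization / target)
→ scale target up/down within minReplicas and maxReplicas

Example: 4 replicas at 90% against a 70% target → ceil(4 × 90 / 70) = 6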

Prerequisites:

Requirement	Why
metrics-server	Provides CPU/memory metrics
CPU/memory requests	Needed for utilization-based resource metrics
Scale target	Deployment/StatefulSet/etc. must support scaling
Enough cluster capacity	New Pods must schedule
Reasonable readiness/startup	Avoid unstable scaling during startup

Common HPA metrics:

Metric type	Example
Resource	CPU, memory
Pods	Requests per second per Pod
Object	Queue depth
External	Cloud metric, Kafka lag, custom adapter metric

Common troubleshooting:

bash


kubectl get hpa
kubectl describe hpa api
kubectl top pods
kubectl top nodes
kubectl get apiservice | grep metrics

Failure examples:

Symptom	Likely cause
`<unknown>` metric	metrics-server/custom metrics issue
HPA not scaling on CPU	CPU requests missing
Scaled up but Pods Pending	Cluster lacks capacity
Scaled up but traffic still slow	App startup slow or readiness failing
Too much scaling up/down	Metric noisy or behavior not tuned
Sidecar skews metrics	Pod-level average includes sidecar usage; use a ContainerResource or custom metric

If metrics are missing, HPA cannot make normal scaling decisions from those metrics. Existing replicas continue running, but scale decisions may be skipped or degraded.
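Two rows in the failure table, noisy scaling and sidecar skew, map to HPA fields. The sketch below assumes the application container is named `app`; the windows and rates are illustrative starting points, not recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app            # hypothetical container name; sidecar usage is ignored
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100              # at most double the replicas per minute
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 minutes of lower load before shrinking
      policies:
      - type: Pods
        value: 1                # remove at most one Pod per minute
        periodSeconds: 60
```

Scaling up quickly and down slowly is a common asymmetric choice, since removing capacity too early tends to cost more than keeping it briefly.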

Senior point: HPA is reactive. It scales after metrics show load. For sudden spikes, use enough baseline replicas, fast startup, queue-based scaling, predictive scaling, or KEDA/custom metrics where appropriate.

A strong answer is:

“HPA watches metrics and changes replica count between min and max. CPU utilization scaling depends on container requests and metrics-server. I debug HPA by checking conditions, metrics availability, requests, pending Pods, and whether the metric actually represents user load.”

HPA vs VPA — interview trade-offs?

HPA and VPA solve different scaling problems.

Autoscaler	Scales	Best for
HPA	Number of Pods	Traffic/load changes
VPA	CPU/memory requests	Right-sizing workloads
Cluster Autoscaler	Number of nodes	Unschedulable Pods due to lack of capacity

HPA example:

text


More traffic
→ CPU/RPS/queue metric rises
→ HPA adds replicas

VPA example:

text

Pod consistently uses more memory than requested
→ VPA recommends or updates larger request

Cluster Autoscaler example:

text


HPA creates more Pods
→ Pods remain Pending
→ Cluster Autoscaler adds nodes

Trade-offs:

Topic	HPA	VPA
Works well for stateless apps	Yes	Helps tune resources
Handles traffic spikes	Yes	No, not directly
Changes Pod count	Yes	No
Changes request size	No	Yes
May need custom metrics	Often	Usually not
Can restart Pods	No direct restart for scaling	May evict/recreate Pods depending on mode

Important conflict:

Do not blindly run HPA and VPA on the same CPU/memory signal for the same workload. HPA uses utilization relative to requests. If VPA changes requests while HPA scales on CPU utilization, the two controllers can fight or create confusing behavior.

Common best practice:

HPA for scaling replicas on traffic/load metrics
VPA in recommendation mode for right-sizing
VPA for workloads not horizontally scalable
Cluster Autoscaler for node capacity
Custom metrics for queue/RPS/latency-driven scaling
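Recommendation mode can be sketched as follows. This assumes the Vertical Pod Autoscaler components and CRDs are installed, since VPA is not part of core Kubernetes. The min and max bounds are illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"           # compute recommendations only; never evict or resize Pods
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # apply the bounds to every container
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```

The recommendations appear in the object's status and can be copied into the Deployment's requests during a normal release, which keeps HPA's utilization math stable.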

Good examples:

Workload	Better scaling
Stateless API	HPA on CPU/RPS/custom metric
Batch worker	HPA/KEDA on queue depth
Database	VPA recommendations/manual tuning
Memory-heavy singleton	VPA or manual sizing
Web app with sidecar	Container metric or custom metric

A strong answer is:

“HPA adds or removes Pods, VPA adjusts resource requests, and Cluster Autoscaler adds nodes. I avoid letting HPA and VPA fight over the same CPU utilization signal and usually use VPA recommendation mode for HPA-managed apps.”

Taints, tolerations, and node affinity?

Taints, tolerations, and affinity control where Pods run.

Simple rule:

Feature	Meaning
Taint	Node repels Pods
Toleration	Pod is allowed to tolerate a taint
Node affinity	Pod prefers or requires nodes with labels
Pod affinity	Pod prefers/requires running near other Pods
Pod anti-affinity	Pod prefers/requires avoiding other Pods

Taint example:

bash

kubectl taint nodes node1 dedicated=gpu:NoSchedule

Toleration example:

yaml


tolerations:
- key: dedicated
  operator: Equal
  value: gpu
  effect: NoSchedule

Taint effects:

Effect	Behavior
`NoSchedule`	Do not schedule new Pods unless tolerated
`PreferNoSchedule`	Try to avoid scheduling
`NoExecute`	Evict running Pods that do not tolerate the taint, and do not schedule new ones

Important distinction:

A toleration does not force a Pod onto a node
It only allows the Pod to be scheduled there
Use node affinity/nodeSelector to attract the Pod to the desired node
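
The matching rule behind this distinction can be sketched in a few lines. This is a simplified model of how a toleration is compared with a taint, not the scheduler's actual code; the `Taint` and `Toleration` classes and `blocking_taints` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # NoSchedule, PreferNoSchedule, or NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with Exists tolerates every taint
    operator: str = "Equal"  # Equal or Exists
    value: str = ""
    effect: str = ""         # empty effect matches all effects


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def blocking_taints(taints, tolerations):
    """Taints that keep the Pod off the node. PreferNoSchedule only affects scoring."""
    return [
        taint for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(t, taint) for t in tolerations)
    ]


gpu_node = [Taint("dedicated", "gpu", "NoSchedule")]
gpu_toleration = Toleration(key="dedicated", value="gpu", effect="NoSchedule")

print(blocking_taints(gpu_node, []))                # the taint blocks the Pod
print(blocking_taints(gpu_node, [gpu_toleration]))  # [] -> the GPU node is allowed
print(blocking_taints([], [gpu_toleration]))        # [] -> untainted nodes are allowed too
```

The last call is the interview point: a Pod with the toleration is still free to land on any untainted node, so something else has to attract it to the GPU pool.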

Example dedicated GPU nodes:

yaml


tolerations:
- key: dedicated
  operator: Equal
  value: gpu
  effect: NoSchedule

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-type
          operator: In
          values:
          - gpu
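
How the required node affinity above is evaluated can be sketched as follows: the entries in `nodeSelectorTerms` are ORed, and the `matchExpressions` inside one term are ANDed. This is a simplified model that omits the `Gt`/`Lt` operators and `matchFields`; the function names and node labels are assumptions for the example.

```python
def expression_matches(labels: dict, expr: dict) -> bool:
    key, operator, values = expr["key"], expr["operator"], expr.get("values", [])
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        return labels.get(key) not in values  # also true when the label is absent
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def node_matches(labels: dict, node_selector_terms: list) -> bool:
    """True if any term matches; a term matches only if all its expressions match."""
    for term in node_selector_terms:
        expressions = term.get("matchExpressions", [])
        if expressions and all(expression_matches(labels, e) for e in expressions):
            return True
    return False


gpu_terms = [{"matchExpressions": [
    {"key": "node-type", "operator": "In", "values": ["gpu"]},
]}]

print(node_matches({"node-type": "gpu"}, gpu_terms))      # True
print(node_matches({"node-type": "general"}, gpu_terms))  # False
print(node_matches({}, gpu_terms))                        # False: label missing
```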

Node affinity examples:

Need	Use
Run only on SSD nodes	Required node affinity
Prefer same zone as dependency	Preferred node affinity
Spread replicas across nodes/zones	Pod anti-affinity or topology spread
Reserve nodes for platform agents	Taints/tolerations
Keep all replicas off one node	Pod anti-affinity

Common Pending event:

text

node(s) had taint {dedicated: gpu}, that the pod didn't tolerate

Troubleshooting:

bash


kubectl describe pod <pod-name>
kubectl describe node <node-name>
kubectl get nodes --show-labels

Check:

Node taints
Pod tolerations
Node labels
nodeSelector/affinity
Required vs preferred rules
Resource requests
PVC zone constraints
Topology spread constraints

Production pattern:

Taint special nodes to reserve them
Add tolerations only to approved workloads
Add node affinity so workloads actually target those nodes
Use pod anti-affinity/topology spread to avoid placing all replicas on one failure domain

A strong answer is:

“Taints repel Pods, tolerations allow Pods onto tainted nodes, and affinity attracts Pods to labeled nodes or spreads them relative to other Pods. For dedicated nodes, I usually combine taints, tolerations, and node affinity.”

Security, RBAC, and multi-tenancy

Kubernetes RBAC — Roles, ClusterRoles, Bindings?

Kubernetes RBAC controls who can perform which actions on which API resources.

There are four main RBAC objects:

Object	Scope	Purpose
`Role`	Namespace	Defines permissions inside one namespace
`ClusterRole`	Cluster-wide	Defines permissions for cluster-scoped resources or reusable namespaced rules
`RoleBinding`	Namespace	Grants a Role or ClusterRole to subjects in one namespace
`ClusterRoleBinding`	Cluster-wide	Grants a ClusterRole across the cluster

A Role defines permissions, but it does not grant them until a binding attaches it to a subject.

Example Role:

yaml


apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

Example RoleBinding:

yaml


apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: ServiceAccount
  name: app-reader
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Important interview details:

Need	Better choice
App reads Pods in one namespace	Role + RoleBinding
CI deploys one app namespace	Role + RoleBinding with limited verbs
Read Nodes or PersistentVolumes	ClusterRole
Grant cluster admin rights	ClusterRoleBinding, only for trusted admins
Reuse same permission set in many namespaces	ClusterRole + RoleBinding per namespace

Common verbs:

text

get, list, watch, create, update, patch, delete

Good debugging commands:

bash


kubectl auth can-i get pods -n dev
kubectl auth can-i create deployments -n dev --as=system:serviceaccount:dev:ci-deployer

Least privilege examples:

CI deploy ServiceAccount can update Deployments in one namespace
App ServiceAccount can read only required ConfigMaps or custom resources
Monitoring can list/watch metrics-related resources
Avoid giving app workloads list secrets unless truly required
Avoid cluster-admin for pipelines and applications

Common mistake:

Giving a namespace deploy pipeline cluster-admin because one permission was missing.

Better: identify the exact API group, resource, namespace, and verb needed.
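
The reason the exact API group, resource, and verb matter is that RBAC rules are purely additive: a request is allowed only if some rule covers all three. The sketch below models that check in a simplified way (no `resourceNames`, no non-resource URLs); `PolicyRule` and `is_allowed` are hypothetical names, and the rule mirrors the `pod-reader` Role shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple
    resources: tuple
    verbs: tuple


def _covers(allowed: tuple, wanted: str) -> bool:
    return "*" in allowed or wanted in allowed


def is_allowed(rules, api_group: str, resource: str, verb: str) -> bool:
    """Allowed if any single rule covers the group, the resource, and the verb."""
    return any(
        _covers(rule.api_groups, api_group)
        and _covers(rule.resources, resource)
        and _covers(rule.verbs, verb)
        for rule in rules
    )


pod_reader = [PolicyRule(api_groups=("",), resources=("pods",),
                         verbs=("get", "list", "watch"))]

print(is_allowed(pod_reader, "", "pods", "list"))            # True
print(is_allowed(pod_reader, "", "pods", "delete"))          # False: verb not granted
print(is_allowed(pod_reader, "", "pods/log", "get"))         # False: subresource not listed
print(is_allowed(pod_reader, "apps", "deployments", "get"))  # False: different API group
```

The `pods/log` case is a common source of "one permission was missing": subresources have to be granted explicitly rather than inherited from `pods`.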

A strong answer is:

“RBAC separates permission definition from permission binding. I use Roles for namespace-scoped access, ClusterRoles for cluster-wide or reusable rules, and bind them to users/groups/ServiceAccounts with least privilege.”

ServiceAccount and Pod identity?

A ServiceAccount gives a Pod an identity inside Kubernetes.

Every Pod runs as a ServiceAccount. If you do not specify one, Kubernetes uses the namespace’s default ServiceAccount.

Example:

yaml


apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: prod

Pod using a ServiceAccount:

yaml


apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  serviceAccountName: api-sa
  containers:
  - name: api
    image: registry.example.com/api:v1

Why ServiceAccounts matter:

Topic	Explanation
Kubernetes API access	Pod can authenticate to the API server
RBAC	Permissions are granted to the ServiceAccount
Token projection	Tokens can be mounted into Pods
Cloud IAM integration	Map Pod identity to cloud identity
Auditability	API actions can be traced to workload identity

Before Kubernetes 1.24, each ServiceAccount got an auto-generated, long-lived token stored in a Secret. Modern Kubernetes instead mounts projected tokens that are audience-bound, time-limited, and rotated by the kubelet.
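
A ServiceAccount token is a JWT, so its bounds are visible in the payload as the `exp`, `aud`, and `sub` claims. The sketch below decodes the payload without verifying the signature, which is suitable for inspection only. The sample token and its claim values are fabricated for the example; inside a Pod the token is normally mounted at `/var/run/secrets/kubernetes.io/serviceaccount/token`.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Return the JWT payload as a dict. Does NOT verify the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims: dict, now: int) -> int:
    """Remaining lifetime based on the standard `exp` claim (Unix seconds)."""
    return claims["exp"] - now


def _segment(obj: dict) -> str:
    """Helper for the fabricated sample only: JSON -> unpadded base64url."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Fabricated, unsigned sample token; claim values are illustrative.
sample_token = ".".join([
    _segment({"alg": "none"}),
    _segment({
        "sub": "system:serviceaccount:prod:api-sa",
        "aud": ["https://kubernetes.default.svc"],
        "iat": 1_700_000_000,
        "exp": 1_700_003_600,
    }),
    "signature-placeholder",
])

claims = decode_claims(sample_token)
print(claims["sub"], seconds_until_expiry(claims, now=1_700_000_000))
```

The `sub` claim follows the `system:serviceaccount:<namespace>:<name>` format, which is the same identity string used with `kubectl auth can-i --as`.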

Good production practices:

Create a dedicated ServiceAccount per app/workload
Avoid using the default ServiceAccount for production apps
Bind only required RBAC permissions
Disable token mounting if the app does not need Kubernetes API access
Use cloud workload identity instead of static cloud keys in Secrets
Rotate and audit credentials

Disable automount when not needed:

yaml


apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-api-access
automountServiceAccountToken: false

Pod-level override (takes precedence over the ServiceAccount setting):

yaml


spec:
  automountServiceAccountToken: false

Cloud identity examples:

Cloud	Common mechanism
AWS	IRSA / EKS Pod Identity
GCP	Workload Identity
Azure	Workload Identity / managed identity integration

Interview trap:

A Kubernetes ServiceAccount is not the same as a Linux user inside the container.

The ServiceAccount controls Kubernetes API identity. Linux user settings are controlled by container image and securityContext.

A strong answer is:

“A Pod runs as a ServiceAccount for Kubernetes API identity. I use one dedicated ServiceAccount per workload, bind minimal RBAC, avoid default accounts, and prefer cloud workload identity over long-lived cloud keys stored in Secrets.”

Namespaces — isolation and organization?

Namespaces divide a Kubernetes cluster into logical scopes.

Use namespaces for:

Teams
Applications
Environments
Tenants
System components
Blast-radius control

Examples:

text


dev
staging
prod
team-payments
team-platform
ingress-nginx
monitoring

Many Kubernetes objects are namespaced, but some are cluster-scoped:

Namespaced objects	Cluster-scoped objects
Pods	Nodes
Services	PersistentVolumes
Deployments	StorageClasses
ConfigMaps	ClusterRoles
Secrets	ClusterRoleBindings
PVCs	CustomResourceDefinitions
Roles	Namespaces

Important point:

Namespaces are not complete security boundaries by themselves.

For real isolation, combine namespaces with:

Control	Why
RBAC	Who can access objects
ResourceQuota	Limit resource consumption
LimitRange	Default/min/max requests and limits
NetworkPolicy	Restrict traffic between namespaces/apps
Pod Security Admission	Enforce Pod hardening
Admission policies	Enforce org standards
Separate node pools	Stronger workload separation
Separate clusters	Strongest isolation for high-risk tenants

ResourceQuota example:

yaml


apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "50"
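
The arithmetic behind a quota check is simple: the sum of requests already admitted plus the new Pod's requests must not exceed the `hard` value. The sketch below models this for CPU only, assuming quantities are written either as cores (`"10"`, `"0.5"`) or millicores (`"250m"`); the function names and usage figures are hypothetical.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "10", "0.5", or "250m" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits_cpu_quota(hard: str, used: str, new_pod_requests: list) -> bool:
    """True if the new Pod's container requests fit under requests.cpu."""
    requested = sum(cpu_millicores(q) for q in new_pod_requests)
    return cpu_millicores(used) + requested <= cpu_millicores(hard)


# Namespace quota of 10 cores with 9.5 cores already requested.
print(fits_cpu_quota("10", "9500m", ["250m", "250m"]))  # True: exactly at the limit
print(fits_cpu_quota("10", "9500m", ["500m", "250m"]))  # False: 250m over
```

Once a quota tracks `requests.cpu` or `requests.memory`, Pods that omit those requests can be rejected, which is one reason quotas are usually paired with a LimitRange that supplies defaults.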

LimitRange example use:

Default CPU/memory requests
Default CPU/memory limits
Minimum/maximum container resources
Prevent BestEffort Pods in shared namespaces

Good namespace practices:

Avoid running production workloads in default
Use consistent labels
Apply quotas for shared clusters
Apply NetworkPolicies for sensitive namespaces
Use separate namespaces for platform components
Keep environment separation clear

Common mistake:

Creating dev, staging, and prod namespaces but giving everyone cluster-admin and allowing all network traffic.

That is organization, not secure multi-tenancy.

A strong answer is:

“Namespaces scope names and organize resources, but they are not hard security boundaries alone. I pair them with RBAC, quotas, NetworkPolicy, Pod Security Admission, and sometimes separate node pools or clusters.”

Pod Security Standards and security contexts?

Kubernetes Pod Security Standards define three policy levels:

Level	Meaning
`privileged`	Unrestricted; trusted system workloads only
`baseline`	Prevents known privilege escalation patterns
`restricted`	Stronger hardening for security-sensitive workloads

Pod Security Admission can enforce, warn, or audit these standards through namespace labels.

Example namespace labels:

yaml


metadata:
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

A securityContext controls Linux/security settings for Pods and containers.

Example hardened container:

yaml


securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault

Common hardening controls:

Control	Why
`runAsNonRoot`	Avoid root inside container
`runAsUser`	Run as specific UID
`allowPrivilegeEscalation: false`	Prevent privilege escalation
`readOnlyRootFilesystem: true`	Reduce writable attack surface
Drop capabilities	Remove unnecessary Linux privileges
`seccompProfile: RuntimeDefault`	Restrict syscalls
AppArmor/SELinux	Extra MAC enforcement where supported
Avoid privileged Pods	Prevent broad host access
Avoid hostPath	Prevent direct host filesystem exposure
Avoid hostNetwork/hostPID	Reduce node-level exposure
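
Several of these controls can be checked mechanically in CI before a manifest reaches the cluster. The sketch below checks a container-level `securityContext` against a subset of the restricted profile. It is an approximation: the real standard also accepts `runAsNonRoot` and `seccompProfile` set at Pod level and covers more fields, and `restricted_violations` is a hypothetical helper, not part of any Kubernetes tool.

```python
def restricted_violations(security_context: dict) -> list:
    """Return human-readable problems for one container securityContext."""
    sc = security_context or {}
    problems = []
    if sc.get("privileged"):
        problems.append("privileged must not be true")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop must include ALL")
    if sc.get("seccompProfile", {}).get("type") not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


hardened = {
    "runAsNonRoot": True,
    "runAsUser": 1000,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}

print(restricted_violations(hardened))  # []
for problem in restricted_violations({}):
    print("-", problem)
```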

Pod-level example:

yaml


spec:
  securityContext:
    runAsNonRoot: true
    fsGroup: 2000

When the same field is set at both levels, the container-level value wins. Some fields, such as fsGroup, exist only at Pod level.

Interview caveat:

Some workloads need exceptions, such as CNI, CSI, monitoring agents, or node-level security agents. Exceptions should be isolated and reviewed.

Common production approach:

Enforce restricted for app namespaces
Allow baseline or controlled exceptions for platform namespaces
Use admission policies to block privileged containers
Scan images and manifests in CI
Avoid root containers unless justified

A strong answer is:

“I harden Pods with non-root users, dropped capabilities, no privilege escalation, read-only root filesystems, and RuntimeDefault seccomp. Pod Security Standards help enforce these controls consistently at namespace level.”

Admission controllers and policy engines?

Admission control runs after authentication and authorization but before an object is persisted.

Request flow:

text


request
→ authentication
→ authorization
→ mutating admission
→ object schema validation
→ validating admission
→ etcd

Admission can:

Mutate an object
Validate an object
Reject an object
Apply defaults
Enforce organization policy

Admission extension types:

Type	Behavior
Mutating admission webhook	Changes the object before storage
Validating admission webhook	Allows or rejects the object
ValidatingAdmissionPolicy	Built-in declarative validation without external webhook callouts

Examples of admission policy:

Require resource requests/limits
Block :latest image tag
Require approved registries
Require labels/annotations
Enforce non-root containers
Require probes
Restrict hostPath
Block privileged containers
Add sidecars or default labels
Enforce image signature policy through external tools

Common policy engines:

Tool	Style
OPA Gatekeeper	Rego-based policy
Kyverno	Kubernetes/YAML-native policy
ValidatingAdmissionPolicy	Kubernetes-native CEL validation
Custom webhooks	Application/platform-specific logic

Example interview scenario:

“A developer applies a Pod without resource requests. CI missed it. Admission policy rejects it before etcd stores it.”
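
The decision in that scenario can be sketched as the core of a validating webhook: it receives an `AdmissionReview` (API version `admission.k8s.io/v1`) and returns one with `response.uid` copied from the request and `response.allowed` set. The sketch covers only the decision logic, with no HTTP server or TLS; `review_pod` and the sample request are assumptions for the example.

```python
def review_pod(admission_review: dict) -> dict:
    """Reject Pods whose containers have no resource requests."""
    request = admission_review["request"]
    pod = request["object"]
    missing = [
        container["name"]
        for container in pod["spec"]["containers"]
        if not container.get("resources", {}).get("requests")
    ]
    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        response["status"] = {
            "code": 403,
            "message": "containers without resource requests: " + ", ".join(missing),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


sample_review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",  # illustrative value
        "object": {"spec": {"containers": [
            {"name": "api", "resources": {"requests": {"cpu": "250m"}}},
            {"name": "sidecar"},
        ]}},
    },
}

result = review_pod(sample_review)
print(result["response"]["allowed"], result["response"]["status"]["message"])
```

The same rule can often be expressed without custom code in Kyverno, Gatekeeper, or a ValidatingAdmissionPolicy; the sketch only shows what "reject before etcd stores it" means at the API level.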

Good practices:

Test policies in audit/warn mode before enforce
Avoid slow or unreliable webhooks
Configure failure policy intentionally
Monitor webhook latency and errors
Keep policies version controlled
Run the same policy checks in CI so violations surface before admission rejects them
Avoid mutating too much invisibly
Document exceptions

Common failure:

text

failed calling webhook

Check:

bash


kubectl get validatingwebhookconfiguration
kubectl get mutatingwebhookconfiguration
kubectl describe validatingwebhookconfiguration <name>
kubectl get pods -n <webhook-namespace>
kubectl logs -n <webhook-namespace> deploy/<webhook>

A strong answer is:

“Admission controllers enforce policy before objects are stored. I use validating policies to reject unsafe manifests, mutating policies for controlled defaults, and policy-as-code in CI plus admission so bad config is caught before production.”

Advanced topics, troubleshooting, and scenarios

Helm, Kustomize, and GitOps basics?

Helm, Kustomize, and GitOps solve related but different deployment problems.

Tool	Role
Helm	Packages Kubernetes resources as charts with templates and values
Kustomize	Customizes plain YAML using bases, patches, and overlays
Argo CD / Flux	Reconciles cluster state from declared sources such as Git or OCI artifacts

Helm concepts:

Concept	Meaning
Chart	Package of Kubernetes templates/files
Values	User-provided configuration for templates
Release	Installed instance of a chart
Revision	Versioned release history
Rollback	Return a release to a previous revision

Common Helm commands:

bash


helm install api ./chart -n prod
helm upgrade api ./chart -n prod -f values-prod.yaml
helm history api -n prod
helm rollback api 3 -n prod

Kustomize concepts:

text


base/
  deployment.yaml
  service.yaml

overlays/
  dev/
    kustomization.yaml
  prod/
    kustomization.yaml

Example:

bash

kubectl apply -k overlays/prod

GitOps flow:

text


developer merges change
→ CI builds image
→ manifest/chart values updated in Git
→ GitOps controller detects desired state
→ controller reconciles cluster
→ drift is corrected or reported
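
The reconcile step in this flow boils down to comparing desired state with live state and deciding what to create, update, or prune. The sketch below is a toy model of that comparison, keyed by a hypothetical `Kind/name` string. Real controllers such as Argo CD and Flux do considerably more (health checks, ordering, server-side diffing), so treat it as an illustration of the idea only.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare desired (Git) and live (cluster) objects keyed by "Kind/name"."""
    return {
        "create": sorted(desired.keys() - live.keys()),
        "update": sorted(k for k in desired.keys() & live.keys() if desired[k] != live[k]),
        "prune": sorted(live.keys() - desired.keys()),
    }


desired_state = {
    "Deployment/api": {"image": "registry.example.com/api:v2", "replicas": 3},
    "Service/api": {"port": 80},
}
live_state = {
    "Deployment/api": {"image": "registry.example.com/api:v1", "replicas": 5},  # manual drift
    "ConfigMap/old-flags": {"debug": "true"},                                    # no longer in Git
}

print(plan_sync(desired_state, live_state))
```

Here the manually scaled Deployment shows up under `update`, the missing Service under `create`, and the leftover ConfigMap under `prune`, which is the "drift is corrected or reported" step.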

Good GitOps practices:

Git is source of truth
Avoid manual kubectl edit in production
Use pull requests for environment changes
Keep secrets encrypted or externally referenced
Pin image tags/digests intentionally
Separate app source repo and environment config repo if needed
Use health checks and sync status
Roll back by Git revert or Helm rollback, depending on the workflow

Helm vs Kustomize interview answer:

Use Helm when	Use Kustomize when
You need packaged reusable app charts	You want plain YAML overlays
You need templating and values	You prefer patching existing manifests
Third-party app already ships Helm chart	You manage environment-specific deltas
Release history matters	You want kubectl-native customization

Common senior point:

Helm can be used with GitOps. Argo CD and Flux can deploy Helm charts while still using Git as the desired-state source.

A strong answer is:

“Helm packages and templates applications, Kustomize patches plain manifests, and GitOps controllers reconcile the cluster from version-controlled desired state. In production, I treat Git as source of truth and avoid manual drift.”

Scenario: Pod stuck Pending — how do you debug?

A Pod stuck in Pending usually means it has not been scheduled or a required dependency is not ready.

Start with Events:

bash


kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Then classify the issue.

Event clue	Likely cause	Fix direction
`Insufficient cpu`	Requests do not fit nodes	Add nodes, reduce request, scale down other workloads
`Insufficient memory`	Memory request too high	Add capacity or tune requests
`had taint ... didn't tolerate`	Missing toleration	Add toleration or use correct node pool
`didn't match node selector`	Label/selector mismatch	Fix node labels or selector
`didn't match node affinity`	Affinity too strict	Relax/fix affinity
`pod has unbound immediate PersistentVolumeClaims`	PVC not bound	Check PVC/StorageClass/PV
`volume node affinity conflict`	Volume in wrong zone	Schedule in same zone or reprovision storage
`Too many pods`	Node Pod density limit	Add nodes or reduce Pods per node
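
The table above is effectively a lookup from Event text to a category, and it can be sketched as one. The substrings below come from the clues in the table plus the common "untolerated taint" wording. Scheduler messages vary between Kubernetes versions, so the rule list is an assumption to adapt, and `classify` is a hypothetical helper.

```python
# Order matters: the volume rules must be checked before the generic
# "node affinity" rule, because the volume message contains that phrase.
RULES = [
    ("unbound immediate persistentvolumeclaims", "volume"),
    ("volume node affinity conflict", "volume"),
    ("insufficient cpu", "resources"),
    ("insufficient memory", "resources"),
    ("too many pods", "resources"),
    ("taint", "taint"),
    ("node selector", "affinity"),
    ("node affinity", "affinity"),
]


def classify(message: str) -> list:
    """Return the distinct categories found in a FailedScheduling message."""
    found = []
    for clause in message.lower().split(","):
        for needle, category in RULES:
            if needle in clause:
                if category not in found:
                    found.append(category)
                break  # first matching rule wins for this clause
    return found or ["unknown"]


print(classify("0/3 nodes are available: 1 Insufficient cpu, "
               "2 node(s) had untolerated taint {dedicated: gpu}."))
print(classify("0/3 nodes are available: 3 node(s) had volume node affinity conflict."))
```

A message that lists several reasons means different nodes were excluded for different reasons, so each category needs its own fix or the Pod stays Pending.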

Useful commands:

bash


kubectl get pod <pod-name> -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>
kubectl get nodes --show-labels
kubectl describe node <node-name>
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
kubectl top nodes

How to reason:

text


NODE column empty
→ scheduling failed
→ read Events
→ classify resource / taint / affinity / volume

Important distinction:

Pending with empty NODE usually points to scheduling or PVC binding
A Pod that is assigned to a node but stuck in image pull or ContainerCreating is still in the Pending phase, but the cause is on the kubelet/runtime side, not the scheduler
Image pull failures show ErrImagePull first, then ImagePullBackOff as the kubelet backs off between retries

Common interview mistake:

Guessing resource issue without reading Events.

Better narrative:

Describe Pod
Read Events
Check resources/taints/affinity/PVC
Confirm node capacity
Fix the exact constraint
Recheck scheduling

A strong answer is:

“I start with kubectl describe pod and Events. Pending usually points to scheduling constraints such as insufficient resources, taints, affinity, or unbound PVCs, so I classify the Event before changing anything.”

Scenario: CrashLoopBackOff — debugging steps?

CrashLoopBackOff means a container starts, exits, and Kubernetes backs off before restarting it again.
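
The back-off is exponential. Per the Kubernetes documentation, the kubelet delays restarts by 10s, 20s, 40s, and so on, capped at five minutes, and resets the timer after a container has run for ten minutes without problems. The sketch below reproduces that schedule using those documented defaults; newer releases may allow a lower cap, so treat `base` and `cap` as assumptions.

```python
def restart_delays(restarts: int, base: int = 10, cap: int = 300) -> list:
    """Delay in seconds before each restart: doubles from `base`, capped at `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


delays = restart_delays(7)
print(delays)       # [10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # 910 seconds spent waiting across seven restarts
```

This is why a Pod that has been crashing for a while appears to "do nothing" for minutes at a time: it is waiting out the capped delay, not hanging.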

Start with:

bash


kubectl get pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

Use --previous because the current container may have already restarted.

Check termination reason:

bash

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

Common causes:

Cause	Signal	Fix direction
App exception on startup	Stack trace in logs	Fix app/config/dependency
Missing env/Secret/ConfigMap	Config error in logs or Events	Fix mounted/env config
Wrong command/args	Process exits immediately	Fix container command
Liveness too aggressive	Restarts soon after start	Tune probes/add startupProbe
OOMKilled	Termination reason `OOMKilled`	Increase memory or fix memory use
Permission denied	Logs show file/user issue	Fix image user, volume permissions, securityContext
Port conflict/app bind issue	Logs show bind failure	Fix port/config
Dependency unavailable	Liveness probe fails whenever a downstream dependency is down	Keep dependency checks out of liveness; use readiness instead

Probe-related command:

bash

kubectl describe pod <pod-name> -n <namespace> | grep -Ei 'liveness|readiness|startup'

Resource check:

bash


kubectl top pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -A3 -Ei 'limits|requests'

Config check:

bash


kubectl get cm -n <namespace>
kubectl get secret -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

Important interview distinctions:

Symptom	Meaning
`CrashLoopBackOff`	Process starts but exits repeatedly
`ImagePullBackOff`	Image cannot be pulled
`CreateContainerConfigError`	Config reference problem before container starts
`RunContainerError`	Runtime failed to start container
`OOMKilled`	Memory limit exceeded

Do not immediately increase limits without checking logs. If the app crashes due to missing config, more memory will not help.
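
One way to make that check systematic is to read the last termination state from the output of `kubectl get pod -o json` before changing anything. The sketch below extracts the reason and exit code per container and maps them to a first hint. The hint mapping is a rough heuristic based on common shell exit-code conventions, and the sample Pod status is fabricated for the example.

```python
def last_terminations(pod: dict) -> list:
    """Collect the last terminated state of each container in a Pod object."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append({
                "container": status["name"],
                "restarts": status.get("restartCount", 0),
                "reason": terminated.get("reason"),
                "exitCode": terminated.get("exitCode"),
            })
    return results


def first_hint(reason, exit_code) -> str:
    if reason == "OOMKilled":
        return "memory: compare usage with the memory limit"
    if exit_code in (126, 127):
        return "command: entrypoint not executable or not found"
    if exit_code == 0:
        return "command: process exited cleanly; is it a long-running process?"
    return "application: read the --previous logs"


sample_pod = {"status": {"containerStatuses": [
    {"name": "api", "restartCount": 6,
     "lastState": {"terminated": {"reason": "Error", "exitCode": 1}}},
    {"name": "worker", "restartCount": 3,
     "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
]}}

for item in last_terminations(sample_pod):
    print(item["container"], "->", first_hint(item["reason"], item["exitCode"]))
```

In this sample only `worker` would benefit from a memory change; `api` is failing for an application reason that more memory will not fix.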

A strong answer is:

“For CrashLoopBackOff, I check previous logs, describe Events, termination reason, probes, config mounts, and OOMKilled status. I separate app startup failure from probe misconfiguration and resource exhaustion.”

Scenario: Deployment rollout stuck — what do you check?

A Deployment rollout can get stuck when the new ReplicaSet cannot produce Ready Pods.

Start with:

bash


kubectl rollout status deployment/<name> -n <namespace>
kubectl describe deployment <name> -n <namespace>
kubectl get rs -n <namespace>
kubectl get pods -n <namespace> -l app=<label>

Find the new ReplicaSet:

bash

kubectl get rs -n <namespace> --sort-by=.metadata.creationTimestamp

Then inspect new Pods:

bash


kubectl describe pod <new-pod> -n <namespace>
kubectl logs <new-pod> -n <namespace>
kubectl logs <new-pod> -n <namespace> --previous

Common rollout blockers:

Issue	Signal	Fix direction
New image pull fails	`ImagePullBackOff`	Fix tag, registry, pull secret
New Pods crash	`CrashLoopBackOff`	Check logs/config/probes
Readiness never passes	Ready `0/1`, readiness Events	Fix readiness endpoint/dependency
No capacity	Pending new Pods	Add nodes or adjust requests
`maxUnavailable: 0` and no spare capacity for surge Pods	Old Pods stay, new Pods cannot schedule	Add capacity or allow some unavailability
Progress deadline exceeded	`ProgressDeadlineExceeded`	Fix failure or rollback
PVC/volume issue	Pending/mount errors	Fix storage binding/mount
Admission denied	Events/webhook errors	Fix policy violation
Bad ConfigMap/Secret	Env/mount errors	Restore config or rollout new config

Useful Deployment fields:

Field	Why it matters
`maxSurge`	Allows extra Pods during rollout
`maxUnavailable`	Controls unavailable Pods
`progressDeadlineSeconds`	Marks rollout as failed after deadline
`minReadySeconds`	Requires new Pods to stay Ready this long before they count as available
readinessProbe	Determines whether new Pods receive traffic
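
The effect of `maxSurge` and `maxUnavailable` is easiest to see as arithmetic. Per the Deployment documentation, a percentage for `maxSurge` is rounded up and a percentage for `maxUnavailable` is rounded down. The sketch below computes the resulting bounds; `rollout_bounds` is a hypothetical helper, and it ignores the special handling Kubernetes applies when both values resolve to zero.

```python
import math


def _resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,              # old + new during rollout
        "min_available_pods": replicas - unavailable,    # floor on serving capacity
    }


print(rollout_bounds(10))                                    # defaults: 25% / 25%
print(rollout_bounds(4, max_surge=1, max_unavailable=0))
```

With 10 replicas and the defaults, the rollout may run up to 13 Pods and must keep at least 8 available. With `maxUnavailable: 0`, every step depends on a surge Pod becoming Ready first, so a cluster without room for that extra Pod cannot make progress.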

Rollback:

bash

kubectl rollout undo deployment/<name> -n <namespace>

Rollback to a specific revision:

bash


kubectl rollout history deployment/<name> -n <namespace>
kubectl rollout undo deployment/<name> -n <namespace> --to-revision=<revision>

Good production response:

Check whether users are impacted
Confirm old Pods are still serving
Inspect new ReplicaSet Pods
Roll back if impact is high
Fix image/config/probe/resource issue
Re-deploy with smaller blast radius if needed

Common interview mistake:

Looking only at the Deployment object and not the new ReplicaSet Pods.

The failing evidence is usually in the new Pods.

A strong answer is:

“For a stuck rollout, I inspect the new ReplicaSet and its Pods. Most failures are readiness, image pull, crash, capacity, or admission issues. If production traffic is at risk, I roll back first, then fix and redeploy safely.”

What should you rehearse before Kubernetes interviews?

Use the final week to practice both concepts and live troubleshooting narration.

Docker/container basics:

Image vs container
Dockerfile instructions
Multi-stage build
Compose vs Kubernetes
Container runtime, CRI, containerd, CRI-O
OCI image, tag vs digest, registry pull errors

Kubernetes architecture:

Control plane vs worker node
API server request path
etcd role and quorum
Scheduler decisions
Controllers and reconciliation
kubelet/runtime responsibilities

Workloads:

Pod basics and sidecars
Deployment vs ReplicaSet vs Pod
StatefulSet vs Deployment
DaemonSet, Job, CronJob
Pod phases and container states
Rollout and rollback commands

Networking:

ClusterIP, NodePort, LoadBalancer
Service selector and endpoints
CoreDNS name formats
Ingress vs LoadBalancer
Gateway API awareness
CNI vs kube-proxy/eBPF
NetworkPolicy default allow/deny behavior
DNS egress on UDP/TCP 53

Configuration and storage:

ConfigMap env vs mounted file behavior
Secret security and encryption at rest
External secret managers
PV, PVC, StorageClass
Access modes: RWO, RWX, ROX, RWOP
emptyDir, projected volumes, downward API
Graceful shutdown and SIGTERM

Resources and scaling:

Liveness vs readiness vs startup probes
CPU requests/limits and throttling
Memory limits and OOMKilled
QoS classes
HPA prerequisites
HPA vs VPA vs Cluster Autoscaler
Taints, tolerations, affinity, anti-affinity

Security and platform:

RBAC Role vs ClusterRole
ServiceAccount and workload identity
Namespaces with quotas and policies
Pod Security Standards
securityContext hardening
Admission webhooks and policy engines
Helm, Kustomize, and GitOps

Troubleshooting drills:

Pod Pending
CrashLoopBackOff
ImagePullBackOff
OOMKilled
Service has no endpoints
DNS resolution failure
Ingress 404/502
HPA not scaling
Deployment rollout stuck
PVC Pending
NetworkPolicy blocks app traffic

Must-practice commands:

bash


kubectl get pods -A
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous
kubectl get events -n <ns> --sort-by='.lastTimestamp'
kubectl get deploy,rs,pod -n <ns>
kubectl rollout status deployment/<name> -n <ns>
kubectl rollout undo deployment/<name> -n <ns>
kubectl get svc,endpoints,endpointslice -n <ns>
kubectl auth can-i <verb> <resource> -n <ns>
kubectl top pods -n <ns>
kubectl get hpa -n <ns>

Scenario stories to prepare:

A rollout failed because readiness never passed
A Pod stayed Pending due to PVC or taints
HPA did not scale because CPU requests were missing
A NetworkPolicy broke DNS
A CrashLoop was caused by missing Secret or wrong command
A Deployment was fixed safely with rollback
A migration from manual YAML to Helm/GitOps improved release safety

Good final close:

“I do not only know Kubernetes objects by definition. I can explain how they reconcile, how traffic flows, how security is enforced, and how to debug failures from Events, logs, rollout status, and resource conditions.”

Pattern cheat sheet (quick reference)

Task	Kubernetes approach
Stateless web app	Deployment + Service + Ingress
Stable Pod identity	StatefulSet + headless Service
Per-node agent	DaemonSet
Internal service DNS	ClusterIP + CoreDNS
Scale on CPU	HPA v2 + metrics-server + requests
Non-secret config	ConfigMap volume or env
Credentials	Secret or external secrets operator
Block lateral movement	NetworkPolicy + default deny egress
Debug scheduling	`kubectl describe pod` Events
Rollback deploy	`kubectl rollout undo` or GitOps revert
Private registry	imagePullSecrets + digest pin
Graceful shutdown	`preStop` hook + SIGTERM handling

Summary

Kubernetes interviews test Pod lifecycle, Service networking, probe semantics, and HPA prerequisites—not the definition of orchestration alone. Deploy a sample app, break it, fix it from Events, and narrate kubectl describe findings aloud. Pair with Docker foundations in Q6–10, AWS for EKS operations, and Git for GitOps delivery.
