Kafka Interview Questions and Answers

Apache Kafka interview questions and answers for 2026: topics, partitions, consumer groups, acks, exactly-once, KRaft, and Java Spring Boot integration for experienced developers.

Tech reviewed by Deepak Prasad

Kafka interview questions in 2026 go far past "what is a message broker?" Teams using Apache Kafka for event-driven microservices, audit logs, and real-time pipelines expect you to explain partitions and consumer groups, defend acks=all with min.insync.replicas, and describe how you would fix consumer lag or design exactly-once processing. Interview loops for Java developers often pair broker theory with Spring Kafka configuration: producer retries, listener concurrency, and error handlers.

Below are 45 Kafka interview questions and answers aimed at senior backend and data-engineering roles. Answers are written so you can learn while prepping: tables, diagrams in prose, small simulations, and a strong answer after each topic. Pair this guide with PostgreSQL interview questions for logical replication and CDC slot behavior on source databases, Kubernetes interview questions when brokers run on StatefulSets, Spring Boot interview questions for microservice integration, Java interview questions part 2 for concurrency fundamentals, Python developer interviews when consumers are written in Python, and SQL technical interviews for downstream analytics stores.

NOTE
Prep target: Experienced loops favor scenario answers—partition key choice for ordering, rebalance incident during deploy, poison message handling, or an EOS consume-transform-produce pipeline. Keep one event flow you can whiteboard from producer to consumer group to sink.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; Python 3.13.3.


Interview context and how to prepare

What do Kafka interviews actually test?

Apache Kafka interviews test whether you understand a distributed commit log—not a classic queue that deletes messages after one consumer reads them.

Layer | What interviewers probe
Architecture | Brokers, topics, partitions, leaders, replicas
Producers | Keys, partitioning, acks, idempotence
Consumers | Groups, offsets, rebalancing, lag
Durability | Replication, ISR, unclean leader election
Semantics | At-most-once, at-least-once, exactly-once
Operations | Retention, compaction, monitoring, capacity
Integration | Spring Kafka, Schema Registry, Connect

Role | Emphasis
Java backend | Spring producer/consumer config, error handling
Platform / SRE | Broker tuning, KRaft, multi-AZ, upgrades
Data engineer | Connect, Streams, schema evolution, pipelines

Kafka interview questions for Java developers: what is different from generic messaging questions?

Java developer loops add client integration depth on top of broker concepts:

Generic Kafka question | Java-specific follow-up
Consumer groups | @KafkaListener concurrency vs partitions
Serialization | JsonSerializer, Avro + Schema Registry
Error handling | DefaultErrorHandler, DLT (dead-letter topic)
Transactions | KafkaTransactionManager with Spring
Testing | @EmbeddedKafka, Testcontainers

Interviewers often ask you to sketch Spring Boot properties for a producer and consumer in the same service—see Spring Boot interviews for broader microservice context.

What is a typical Kafka interview loop?

Round | Duration | Focus
Recruiter / HM | 30 min | Event-driven experience, throughput, on-call
Fundamentals | 45–60 min | Topics, partitions, consumer groups, offsets
Deep technical | 60–90 min | acks, EOS, replication, failure scenarios
System design | 45–60 min | Order service events, CDC, metrics pipeline
Live exercise | 45 min | Partition assignment math, config fix, lag triage
Behavioral | 30 min | Outage stories, rollout discipline

Live rounds often stress consumer groups and partitions together with delivery semantics—at-least-once, at-most-once, and exactly-once trade-offs.

What is a realistic 4–6 week prep plan?

Week | Focus | Output
1 | Core model: log, topic, partition, broker | Draw producer → topic → consumer group
2 | Producers: keys, acks, retries, idempotence | Configure a test producer; list trade-offs
3 | Consumers: groups, commits, rebalancing | Run two consumers; observe partition assignment
4 | Durability: ISR, min.insync.replicas, replication | Explain leader failure walkthrough
5 | EOS + Schema Registry / Avro basics | Document read-process-write with transactions
6 | Java integration + scenarios | Spring @KafkaListener sample; rehearse lag debug

Run a local KRaft broker or Docker Compose stack and produce/consume with kafka-console-producer / kafka-console-consumer at least once.


Architecture and core concepts

What is Apache Kafka and how is it different from a traditional message queue?

Kafka is a distributed event streaming platform built around an append-only partitioned log. Producers append records; consumers pull and track position via offsets.

Aspect | Traditional queue (e.g. RabbitMQ) | Kafka
Message lifecycle | Often deleted after ack | Retained by policy (time/size/compaction)
Consumption | Push to consumers | Consumers pull at their pace
Replay | Usually not supported | Seek to older offsets
Fan-out | Queues or exchanges | Multiple independent consumer groups
Ordering | Per queue | Per partition only

Kafka fits event sourcing, audit trails, stream processing, and decoupled microservices where many teams read the same history.

A strong answer is:

Kafka is a durable, replayable log with pub/sub via consumer groups—not a delete-on-read queue—so multiple services can consume the same topic at different speeds and rewind when needed.

Explain Kafka architecture and its main components.

Component | Role
Broker | Server storing partitions, serving produce/fetch requests
Cluster | Multiple brokers for scale and fault tolerance
Topic | Named logical stream (e.g. orders)
Partition | Ordered, immutable log shard; the unit of parallelism
Replica | Copy of a partition for durability
Leader / follower | One leader serves reads/writes; followers replicate
Producer | Publishes records with optional key
Consumer | Reads from assigned partitions
Consumer group | Cooperative consumers sharing load
Controller | Manages leader election (KRaft or legacy ZK mode)
ZooKeeper / KRaft | Cluster metadata (KRaft is standard in new deployments)

Data flow: producer → partition leader → followers replicate → consumers fetch.

A strong answer is:

Brokers host partition leaders and replicas; producers write to leaders; consumer groups divide partitions among members; the controller handles metadata and leader election.

What is a topic and what is a partition?

A topic is a logical category of records. Each topic is split into partitions—physical logs on disk.

Concept | Detail
Partition | Ordered sequence with monotonic offsets (0, 1, 2, …)
Parallelism | More partitions → more concurrent consumers (up to 1:1 per group)
Ordering | Guaranteed within a partition, not across partitions
Key routing | Same key → same partition (default murmur2 hash)

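Partition count is set at topic creation. You can increase it later, but existing records stay where they are and the key-to-partition mapping changes for new records, so a key's history may span two partitions and lose strict ordering across the change.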

Partition assignment by key (simulation):

python
def partition_for_key(key: str, num_partitions: int) -> int:
    # Kafka uses murmur2; this models stable "same key → same partition"
    return hash(key) % num_partitions

keys = ["user-42", "user-42", "user-99", "user-42"]
parts = [partition_for_key(k, 6) for k in keys]
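print(parts[0] == parts[1] == parts[3], parts)

Running this prints True first: the same key maps to the same partition index, which is how per-key ordering is preserved. The partition numbers themselves can differ between runs because Python salts string hashes per process; Kafka's murmur2 hash is stable.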

A strong answer is:

A topic is the stream name; partitions are the ordered shards that enable scale—I choose partition count and keys so ordering and throughput match product rules.

How does Kafka handle message ordering and delivery guarantees?

Ordering: Kafka guarantees order per partition. If you need all events for order-123 ordered, use key = order-123 so they land in one partition.

Delivery is not one setting—it is a stack of choices:

Layer | Knob
Producer | acks, retries, idempotence
Broker | Replication, ISR, min.insync.replicas
Consumer | When offsets are committed vs when work runs

There is no global order across partitions—design for partition-local order or use a single partition (limits throughput).

A strong answer is:

Ordering is per partition via key routing; end-to-end delivery semantics come from producer acks, replication, and consumer offset timing together.

What is KRaft and how does it differ from ZooKeeper?

KRaft (Kafka Raft) stores cluster metadata in Kafka itself using a Raft quorum, so no ZooKeeper ensemble is needed. KRaft has been production-ready since Kafka 3.3, and Kafka 4.0 removes ZooKeeper mode entirely.

Aspect | ZooKeeper mode (legacy) | KRaft
Metadata | External ZK ensemble | Internal metadata log
Operations | Two systems to patch/secure | Single Kafka operational model
Scaling metadata | ZK limits at very large scale | Designed for millions of partitions

Interviewers in 2026 expect you to know new clusters use KRaft and migration projects move off ZK.

A strong answer is:

KRaft embeds metadata in Kafka with Raft consensus, simplifying operations— I treat ZooKeeper as legacy and KRaft as the default for greenfield clusters.


Producers

What happens internally when a producer sends a message to Kafka?

Simplified produce path:

  1. Serializer turns key/value into bytes
  2. Partitioner picks partition (key hash or sticky batching)
  3. Producer batches records for throughput
  4. Request sent to partition leader broker
  5. Leader appends to log; followers replicate
  6. Broker responds based on acks setting
  7. On failure, producer retries (may duplicate without idempotence)

Batching (linger.ms, batch.size) trades latency for throughput.

A strong answer is:

The producer serializes, partitions, batches, and sends to the leader; replication and ack level determine when the produce call succeeds and whether retries can create duplicates.

Difference between acks=0, acks=1, and acks=all?

Setting | Behavior | Trade-off
acks=0 | Fire-and-forget; no broker ack | Highest throughput; may lose data
acks=1 | Leader acks after writing to its own log | Fast; loss if leader dies before replication
acks=all | All ISR replicas must ack | Strongest durability; higher latency

Pair acks=all with min.insync.replicas ≥ 2 (broker or topic level) and a topic replication factor ≥ 3 for meaningful safety. Otherwise, when the ISR shrinks to the leader alone, acks=all is satisfied by a single replica.

Use acks=0 only for metrics or logs where loss is acceptable.

A strong answer is:

acks=0 is fire-and-forget, acks=1 waits for the leader, acks=all waits for the full ISR—I use all for critical events with proper replication and min.insync.replicas.

Why do producer keys matter?

The key determines partition (when non-null):

Key | Effect
Set | hash(key) % numPartitions; colocates related events
Null | Round-robin / sticky partitioning; no key-based order

Use cases:

  • orderId as key — all lifecycle events for one order stay ordered
  • userId — per-user ordering
  • Null key — maximum spread when order does not matter

Hot keys can create partition skew—one partition overloaded while others idle.
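A rough way to see skew is to count records per partition for a sample of keys. The sketch below is an illustration only: it uses zlib.crc32 as a stand-in for Kafka's murmur2 partitioner, and the tenant names and traffic mix are made up.

```python
import zlib
from collections import Counter

def partition_for_key(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2 partitioner: any stable hash shows the effect.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_load(keys, num_partitions):
    """Count how many records land on each partition."""
    return Counter(partition_for_key(k, num_partitions) for k in keys)

# Hypothetical traffic: one tenant produces 90% of the events.
keys = ["tenant-big"] * 900 + [f"tenant-{i}" for i in range(100)]
load = partition_load(keys, 6)
hot_partition, hot_count = load.most_common(1)[0]
print(f"hottest partition {hot_partition} holds {hot_count} of {len(keys)} records")
```

Whatever partition the hot key hashes to carries at least 900 of the 1,000 records, so adding consumers does not help; the usual remedies are a finer-grained key or a salted key where strict per-key ordering is not required.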

A strong answer is:

Keys route related events to the same partition for ordering; I avoid hot keys and null keys when I need predictable locality.

What is an idempotent producer?

Enable with enable.idempotence=true (Kafka producer). Broker assigns Producer ID (PID) and tracks sequence numbers per partition—duplicate retries are deduplicated within that producer session.

Scope | Guarantee
Idempotent producer | No duplicate writes to one partition from retries
Transactions | Atomic writes across partitions + offset commits

Idempotence requires acks=all, retries > 0, and appropriate max.in.flight.requests.per.connection (≤ 5 with idempotence enabled).
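The sequence-number check can be sketched as a toy partition log. This is a simplified model, not broker code: PartitionLog and its fields are hypothetical names, and a real broker also rejects out-of-order gaps and tracks sequences per batch.

```python
class PartitionLog:
    """Toy broker-side log for one partition that dedupes by (producer id, sequence)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer id -> highest sequence appended so far

    def append(self, pid: int, seq: int, value: str) -> bool:
        if seq <= self.last_seq.get(pid, -1):
            return False  # retry of a write that already landed: drop it
        self.records.append(value)
        self.last_seq[pid] = seq
        return True

log = PartitionLog()
log.append(pid=7, seq=0, value="order-created")
log.append(pid=7, seq=1, value="order-paid")
# The ack for seq=1 was lost, so the producer retries the same sequence number.
retry_written = log.append(pid=7, seq=1, value="order-paid")
print(retry_written, log.records)
```

The retry returns False and the log still holds two records, which is the behavior the idempotent producer relies on.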

A strong answer is:

Idempotent producers dedupe retry batches per partition via sequence numbers—necessary but not sufficient alone for end-to-end exactly-once.

Which producer settings do interviewers expect you to know?

Property | Purpose
bootstrap.servers | Broker seed list
key.serializer / value.serializer | Byte format
acks | Durability vs latency
retries / delivery.timeout.ms | Resilience
enable.idempotence | Dedupe retries
compression.type | lz4, zstd, gzip; bandwidth vs CPU
linger.ms / batch.size | Batching tuning

Misconfigured max.in.flight.requests.per.connection with retries and no idempotence can reorder batches—classic interview trap.

A strong answer is:

I tune acks, idempotence, and in-flight requests together, and I batch with linger/batch.size only after measuring latency impact.


Consumers, groups, and offsets

What is a consumer group and how does it distribute work?

A consumer group shares one group.id. Each partition is assigned to exactly one consumer in the group at a time.

Consumers | Partitions | Result
3 consumers | 6 partitions | ~2 partitions each
6 consumers | 6 partitions | 1:1, max parallelism for this group
8 consumers | 6 partitions | 2 idle consumers

Different groups reading the same topic are independent—each maintains its own offsets (fan-out).

Consumer group assignment simulation:

python
def assign_partitions(partitions, consumers):
    assignments = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignments[consumers[i % len(consumers)]].append(p)
    return assignments

parts = [f"P{i}" for i in range(6)]
print(assign_partitions(parts, ["C0", "C1", "C2"]))

Running this maps each consumer to two partitions (C0 gets P0 and P3, C1 gets P1 and P4, C2 gets P2 and P5), a round-robin assignment similar in spirit to Kafka's range and round-robin assignors.

A strong answer is:

A consumer group divides partitions among members so each partition is processed once per group; extra consumers stay idle unless I add partitions.

How are consumer offsets managed?

An offset is the consumer's position in a partition log. Committed offsets are stored in the internal topic __consumer_offsets.

Commit mode | Behavior
Auto commit | Periodic commit; simple, but an offset can be committed before its record is fully processed, so a crash loses that work
Manual commit | commitSync / commitAsync after successful processing
Transactional | Offsets committed atomically with producer transaction

At-least-once pattern: process record, then commit offset.

At-most-once pattern: commit offset, then process (may lose on crash).

A strong answer is:

Offsets are the consumer's bookmark—I commit after successful processing for at-least-once, and I disable careless auto-commit on critical pipelines.

What triggers a rebalance and how do you minimize impact?

A rebalance redistributes partitions when group membership or the subscribed partitions change:

  • Consumer joins or leaves
  • Consumer exceeds session.timeout.ms / max.poll.interval.ms
  • Topic partition count changes

How disruptive a rebalance is depends on the assignor protocol:

Protocol | Impact
Eager (range) | Revoke all, reassign; stop-the-world pauses
Cooperative sticky | Incremental revoke of only the partitions that move; preferred in production

Mitigations:

  • Right-size session.timeout.ms, heartbeat.interval.ms, max.poll.interval.ms
  • Avoid long processing in poll loop—pause and resume or decouple processing
  • Static membership (group.instance.id) for rolling restarts
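The difference between eager and cooperative rebalancing can be approximated by counting revoked partitions when a third consumer joins. This is a simplified model under stated assumptions: sticky_assign and moved_partitions are hypothetical helpers, not Kafka's assignor implementation.

```python
def sticky_assign(current, consumers, partitions):
    """Balance partitions while letting each consumer keep what it already owns."""
    quota, extra = divmod(len(partitions), len(consumers))
    limits = {c: quota + (1 if i < extra else 0) for i, c in enumerate(consumers)}
    target = {c: list(current.get(c, []))[: limits[c]] for c in consumers}
    kept = {p for owned in target.values() for p in owned}
    spare = [p for p in partitions if p not in kept]
    for c in consumers:
        while len(target[c]) < limits[c]:
            target[c].append(spare.pop(0))
    return target

def moved_partitions(current, target):
    """Partitions that must be taken away from their current owner."""
    return sorted(p for c, owned in current.items() for p in owned
                  if p not in target.get(c, []))

current = {"C0": ["P0", "P1", "P2"], "C1": ["P3", "P4", "P5"]}
partitions = [f"P{i}" for i in range(6)]
target = sticky_assign(current, ["C0", "C1", "C2"], partitions)  # C2 joins

eager_revoked = sorted(p for owned in current.values() for p in owned)
cooperative_revoked = moved_partitions(current, target)
print("eager revokes:", eager_revoked)
print("cooperative revokes:", cooperative_revoked)
```

In this model the eager protocol pauses all six partitions, while the cooperative one only moves P2 and P5 to the new member and leaves the other four processing.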

A strong answer is:

Rebalances happen on membership or timeout changes—I use cooperative rebalancing, tune poll intervals, and keep processing off the poll thread for long work.

What is consumer lag and how do you debug it?

Lag = difference between log end offset and consumer committed/current offset—how far behind a consumer is.
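As a worked example of that definition, the sketch below computes lag per partition from two offset maps. The numbers are invented and only shaped like what a group describe command reports.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {p: end_offsets[p] - committed_offsets[p] for p in end_offsets}

# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1_500, 1: 1_480, 2: 9_200}
committed = {0: 1_495, 1: 1_480, 2: 4_100}

lag = consumer_lag(end_offsets, committed)
total = sum(lag.values())
worst = max(lag, key=lag.get)
print(lag, "total:", total, "worst partition:", worst)
```

Looking at lag per partition rather than only the total matters: here almost all of the backlog sits on partition 2, which points to a hot partition or a stuck consumer rather than a general capacity shortfall.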

Cause | Investigation
Slow processing | JVM GC, DB calls in listener, thread pool exhaustion
Too few consumers | Consumers < partitions
Hot partition | Skewed key distribution
Rebalance storm | Frequent join/leave during deploy
Downstream bottleneck | Sink cannot keep pace

Tools: kafka-consumer-groups.sh --describe, Burrow, Datadog, Prometheus exporters.

Fix: scale consumers (up to partition count), optimize handler, increase partitions with key strategy review, fix poison messages.

A strong answer is:

Lag is how far behind consumption is—I find whether the bottleneck is processing time, partition skew, or rebalance churn, then scale or fix handlers with metrics, not guesses.

What are max.poll.interval.ms and session.timeout.ms?

Setting | Purpose
session.timeout.ms | Heartbeat failure → consumer considered dead
heartbeat.interval.ms | Must be < session timeout (typically ~1/3)
max.poll.interval.ms | Max time between poll() calls before rebalance

If your listener spends longer than max.poll.interval.ms (five minutes by default) on work without calling poll() again, the consumer is removed from the group, even though the background heartbeat thread is still running.

Pattern: poll often, hand off to worker pool, use pause/resume for backpressure.

A strong answer is:

session.timeout detects dead members; max.poll.interval caps time between polls—I never block the poll loop on long synchronous work.

What is isolation.level=read_committed?

Consumers default to read_uncommitted—see all messages including those from aborted transactions.

read_committed hides aborted transactional batches—required for exactly-once consume-transform-produce loops so consumers do not read rolled-back data.

Pair with transactional producers and disabled auto-commit.
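The visibility rule can be illustrated with a toy log in which each record carries an optional transaction ID. This is a simplification: real consumers in read_committed mode also wait at the last stable offset for transactions that are still open, which is not modeled here, and the function name is hypothetical.

```python
def visible_records(log, aborted_txns, isolation_level):
    """Return the values a consumer would see under the given isolation level."""
    if isolation_level == "read_uncommitted":
        return [value for _txn, value in log]
    # read_committed: skip records that belong to an aborted transaction
    return [value for txn, value in log if txn not in aborted_txns]

# (transaction id, value); None marks a non-transactional record.
log = [(None, "a"), ("tx1", "b"), ("tx2", "c"), ("tx1", "d")]
aborted = {"tx2"}

print(visible_records(log, aborted, "read_uncommitted"))
print(visible_records(log, aborted, "read_committed"))
```

The first call returns all four values; the second drops "c" because tx2 was aborted.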

A strong answer is:

read_committed lets consumers see only committed transactional records—part of exactly-once pipelines with transactional producers.


Replication, durability, and fault tolerance

How does Kafka ensure data durability and fault tolerance?

Each partition has replication factor N—one leader and N−1 followers on different brokers.

Mechanism | Role
Leader replication | Followers fetch from leader
ISR (in-sync replicas) | Replicas caught up enough to be promotable
Leader election | New leader chosen from ISR on failure
unclean.leader.election.enable=false | Avoid data loss from out-of-sync promotion

Producer acks=all waits for ISR acks—not merely any replica.

A strong answer is:

Replication across brokers plus ISR tracking and safe leader election keeps partitions available after node loss—I pair that with acks=all and min.insync.replicas.

What is the in-sync replica set (ISR)?

ISR = replicas that have caught up with the leader within replica.lag.time.max.ms (very old versions also used a message-count threshold, replica.lag.max.messages).

Only ISR members can become leader if unclean.leader.election.enable=false.

If ISR shrinks to one replica:

  • Cluster still serves traffic
  • No redundancy until followers catch up
  • Risk if that single broker fails

Monitor ISR shrink events—they predict durability risk.
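ISR membership can be approximated as a time-window check per follower. The sketch assumes a 30-second replica.lag.time.max.ms and made-up timestamps; in_sync_replicas is a hypothetical helper, not a Kafka API.

```python
def in_sync_replicas(leader, last_caught_up_ms, now_ms, replica_lag_time_max_ms=30_000):
    """Replicas that caught up to the leader within the allowed lag window."""
    isr = [leader]  # the leader is always a member of its own ISR
    for broker, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= replica_lag_time_max_ms:
            isr.append(broker)
    return isr

# Broker 2 caught up 1 s ago; broker 3 has been behind for 60 s.
followers = {2: 99_000, 3: 40_000}
isr = in_sync_replicas(leader=1, last_caught_up_ms=followers, now_ms=100_000)
print("ISR:", isr, "size:", len(isr))
```

Broker 3 drops out of the ISR, leaving two members. An alert on ISR size falling below the replication factor would fire here before any data is at risk.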

A strong answer is:

ISR is the set of replicas safe to promote— I alert when ISR size drops and I never enable unclean election on critical topics.

What is min.insync.replicas?

Broker/topic setting min.insync.replicas defines how many replicas must ack a write for acks=all to succeed.

Example: replication.factor=3, min.insync.replicas=2

  • Tolerates one broker loss without stopping writes
  • Producer with acks=all fails if only one ISR member available—prefer failing writes over silent data loss

Interview trap: acks=all with min.insync.replicas=1 and RF=3 still allows single-replica commits.
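The interaction between ISR size and min.insync.replicas for an acks=all write fits in a few lines. This is a toy model with a hypothetical function name; in a real cluster the rejected case surfaces to the producer as a not-enough-replicas error.

```python
def acks_all_write(isr_size: int, min_insync_replicas: int) -> str:
    """Outcome of one acks=all write given the current ISR size."""
    if isr_size < min_insync_replicas:
        return "rejected"  # too few in-sync replicas: the write fails
    return f"committed on {isr_size} replica(s)"

# RF=3 topic in different states of health.
for isr_size, min_isr in [(3, 2), (2, 2), (1, 2), (1, 1)]:
    print(f"ISR={isr_size} min.insync.replicas={min_isr}: "
          f"{acks_all_write(isr_size, min_isr)}")
```

The last case is the trap: with min.insync.replicas=1 the write is accepted while only one copy exists, so losing that broker loses the record.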

A strong answer is:

min.insync.replicas enforces how many replicas must ack before a write counts—I set it to 2 with RF=3 so one broker can die without weakening durability.

What happens when a Kafka broker fails?

Scenario | Effect
Follower dies | ISR may shrink; leaders continue
Leader dies | Controller elects new leader from ISR
Multiple brokers die | Partitions with no ISR leader go offline
Controller failure | Failover to standby controller (KRaft quorum)

Clients refresh metadata and discover new leaders—brief produce/fetch errors during election.
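The election outcomes in the table can be sketched as a small function. It is a simplified model: elect_leader is a hypothetical name, and picking the first surviving ISR member only approximates the controller's real ordering.

```python
def elect_leader(replicas, isr, failed, unclean=False):
    """Pick a new leader after broker failures, or None if the partition goes offline."""
    alive_isr = [b for b in isr if b not in failed]
    if alive_isr:
        return alive_isr[0]
    if unclean:
        # Promoting an out-of-sync replica keeps the partition up
        # but can lose acknowledged records.
        alive = [b for b in replicas if b not in failed]
        return alive[0] if alive else None
    return None

replicas, isr = [1, 2, 3], [1, 2]
print(elect_leader(replicas, isr, failed={1}))                   # in-sync follower takes over
print(elect_leader(replicas, isr, failed={1, 2}))                # offline
print(elect_leader(replicas, isr, failed={1, 2}, unclean=True))  # out-of-sync replica promoted
```

With unclean election disabled, the second case leaves the partition offline until an ISR member returns, which is the availability cost of refusing to lose data.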

Operations: rack awareness (broker.rack) for cross-AZ placement, monitor under-replicated partitions.

A strong answer is:

Follower loss reduces redundancy; leader loss triggers ISR election; clients retry after metadata refresh—I design RF and rack awareness so single-AZ loss does not take topics offline.


Delivery semantics and exactly-once

Explain at-most-once, at-least-once, and exactly-once semantics.

Semantic | Guarantee | Typical implementation
At-most-once | May lose; no duplicates | Commit offset before process; acks=0
At-least-once | No loss; may duplicate | Process then commit; retries without idempotence
Exactly-once | Neither loss nor duplicate | Idempotent producer + transactions / Streams EOS

Delivery semantics simulation:

python
def at_most_once(process, commit, events, crash_on):
    """Commit first, then process: a crash between the two loses the event."""
    lost = []
    for e in events:
        commit(e)
        if e == crash_on:
            # Offset already committed, so e is never redelivered after restart.
            lost.append(e)
            break
        process(e)
    return lost

def at_least_once(process, commit, events, crash_on):
    """Process first, then commit: a crash between the two redelivers the event."""
    dup = []
    for e in events:
        process(e)
        if e == crash_on:
            # Crash before commit: after restart the consumer resumes from the
            # last committed offset, so e is processed a second time.
            process(e)
            dup.append(e)
        commit(e)
    return dup

events = ["A", "B", "C"]
noop = lambda x: None
print("at-most-once lost:", at_most_once(noop, noop, events, crash_on="B"))
print("at-least-once dups:", at_least_once(noop, noop, events, crash_on="B"))

Running this prints ['B'] for both lines: at-most-once loses B because its offset was committed before processing, and at-least-once processes B twice because the crash happened before the commit. That duplicate is why idempotent handlers matter.

A strong answer is:

At-most-once commits early; at-least-once commits after work and may retry duplicates; exactly-once needs broker and application cooperation—I default to at-least-once with idempotent consumers unless EOS is justified.

How do you achieve exactly-once processing in Kafka?

Broker-side pieces:

  1. Idempotent producer — dedupe per-partition retries
  2. Transactions: initTransactions, beginTransaction, commitTransaction
  3. sendOffsetsToTransaction — atomic offset commit with output records
  4. Consumer isolation.level=read_committed

Application-side: processing must be deterministic; external sinks need idempotent writes or transactional stores—EOS in Kafka does not magically dedupe your database.

Kafka Streams offers processing.guarantee=exactly_once_v2 packaging the pattern.

A strong answer is:

Exactly-once combines idempotent producers, transactional writes, and committed offset reads—I still make downstream sinks idempotent because EOS stops at Kafka's boundary.

Walk through a consume-transform-produce transaction.

Steps:

  1. Producer initTransactions()
  2. Consumer polls records
  3. beginTransaction()
  4. Process and produce output records
  5. sendOffsetsToTransaction with consumed offsets
  6. commitTransaction() — all visible or none

On failure: abortTransaction() — consumers with read_committed never see partial output.

Requires a transactional.id that is unique per producer instance and stable across restarts, so the broker can fence a zombie instance after failover.
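The all-or-nothing behavior of the steps above can be modeled without a broker. This is only a sketch: ToyTransaction and process_batch are hypothetical names that imitate the shape of the producer transaction calls, not the Kafka client API.

```python
class ToyTransaction:
    """Buffers output records and consumed offsets, then applies both or neither."""

    def __init__(self, output_topic, committed_offsets):
        self.output_topic = output_topic
        self.committed_offsets = committed_offsets
        self.pending_records = []
        self.pending_offsets = {}

    def send(self, record):
        self.pending_records.append(record)

    def send_offsets(self, offsets):
        self.pending_offsets.update(offsets)

    def commit(self):
        # Output records and input offsets become visible together.
        self.output_topic.extend(self.pending_records)
        self.committed_offsets.update(self.pending_offsets)

    def abort(self):
        self.pending_records.clear()
        self.pending_offsets.clear()

def process_batch(batch, output_topic, committed_offsets, transform):
    txn = ToyTransaction(output_topic, committed_offsets)
    try:
        for partition, offset, value in batch:
            txn.send(transform(value))
            txn.send_offsets({partition: offset + 1})  # next offset to read
        txn.commit()
        return True
    except ValueError:
        txn.abort()
        return False

output, offsets = [], {}
times_ten = lambda v: int(v) * 10
ok = process_batch([(0, 0, "1"), (0, 1, "2")], output, offsets, times_ten)
failed = process_batch([(0, 2, "3"), (0, 3, "oops")], output, offsets, times_ten)
print(ok, failed, output, offsets)
```

The second batch fails on its last record, so neither its first output (30) nor its offsets are applied: the output stays [10, 20] and the committed offset stays at 2, ready for a clean retry.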

A strong answer is:

I wrap output records and input offset commits in one transaction so a crash cannot double-publish or skip commits visible to downstream readers.

Why do you still need idempotent consumers with at-least-once?

Most teams run at-least-once (simpler than full transactions). Retries and rebalance redelivery mean duplicate delivery is normal.

Consumer strategies:

Strategy | Example
Natural idempotence | Upsert by primary key
Dedup store | Redis/DB of processed event IDs
Transactional DB | Unique constraint on event_id
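A dedup store can be sketched as a wrapper around the handler. The in-memory set below stands in for Redis or a database table, and the event fields (eventId, amount) are illustrative; in production the side effect and the ID record should be written atomically.

```python
def make_idempotent(handler, processed_ids):
    """Wrap a handler so each event ID takes effect at most once."""
    def wrapped(event):
        if event["eventId"] in processed_ids:
            return False  # duplicate delivery: skip the side effect
        handler(event)
        processed_ids.add(event["eventId"])
        return True
    return wrapped

balance = {"total": 0}

def apply_payment(event):
    balance["total"] += event["amount"]

seen = set()  # stand-in for a Redis set or a table with a unique constraint
handle = make_idempotent(apply_payment, seen)

deliveries = [
    {"eventId": "e1", "amount": 50},
    {"eventId": "e2", "amount": 20},
    {"eventId": "e1", "amount": 50},  # redelivered after a rebalance
]
results = [handle(e) for e in deliveries]
print(results, balance["total"])
```

The redelivered e1 is skipped, so the total is 70 rather than 120.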

"Exactly-once" in interviews often means effective exactly-once—at-least-once transport + idempotent processing.

A strong answer is:

At-least-once will redeliver—I design handlers to dedupe on business keys or store processed IDs so duplicates are harmless.


Schema Registry, Connect, and stream processing

What is Schema Registry and why use Avro with Kafka?

Schema Registry (often Confluent) stores Avro/JSON/Protobuf schemas with versioning.

Benefit | Detail
Compatibility | BACKWARD, FORWARD, FULL modes
Evolution | Add fields with defaults safely
Compact payloads | Avro binary + schema id in message

Producers send schema ID; consumers fetch schema from registry—contract between teams.
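With Confluent's serializers, the schema ID travels in a small header in front of the serialized payload: one magic byte (0) followed by a 4-byte big-endian schema ID. The sketch below frames and unframes such a message; the payload bytes are a placeholder rather than real Avro, and frame/unframe are hypothetical helper names.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema ID

def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload

def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError("not a Schema Registry framed message")
    return schema_id, message[HEADER.size:]

message = frame(42, b"avro-bytes")
schema_id, payload = unframe(message)
print(len(message), schema_id, payload)
```

A consumer uses the extracted ID to fetch (and cache) the writer schema from the registry before decoding the payload.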

Interview talking point: breaking schema change blocked by compatibility mode—coordinate with consumers before deploy.

A strong answer is:

Schema Registry versions contracts between producers and consumers—I use backward-compatible Avro evolution and test compatibility in CI.

What is Kafka Connect?

Kafka Connect is a framework for source and sink connectors:

Type | Example
Source | Debezium CDC from PostgreSQL → Kafka
Sink | S3, Elasticsearch, JDBC sink

Runs as distributed workers with offset tracking in Kafka—fits ETL without custom consumer boilerplate.

Pair with SQL interviews when discussing CDC and warehouse loads.

A strong answer is:

Kafka Connect moves data in and out with connectors—I use CDC sources and managed sinks instead of one-off consumers when the pattern is standard.

What is Kafka Streams at interview level?

Kafka Streams is a Java library for stream processing on top of Kafka:

  • Stateful operations (aggregations, joins) with changelog topics
  • Exactly-once v2 processing guarantee option
  • No separate cluster like Flink—runs as your app

Use when team is JVM-heavy and logic fits Kafka-centric topologies; use Flink/Spark when you need complex event-time windowing at massive scale.

A strong answer is:

Kafka Streams embeds processing in Java apps with state stores and EOS options—I choose it for JVM microservices, Flink when operability needs a dedicated cluster.

What is log compaction vs time-based retention?

Policy | Behavior
delete (time/size) | Old segments removed after retention.ms
compact | Keep latest record per key; tombstones remove keys

Compaction suits changelog topics—config snapshots, __consumer_offsets, compacted state topics in Streams.

Not a substitute for infinite raw event history—compacted topics are key-value latest views.
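The end state of compaction can be sketched as "last write per key wins, tombstones delete". This is a simplified model: real compaction runs in the background on closed segments and keeps tombstones for a grace period before removing them. The keys and values are invented.

```python
def compact(log):
    """Keep the latest value per key; a None value is a tombstone that deletes the key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = [(off, key, val) for key, (off, val) in latest.items() if val is not None]
    return sorted(survivors)  # surviving records keep their original offsets

log = [
    ("user-1", "a@x.io"),
    ("user-2", "b@x.io"),
    ("user-1", "a2@x.io"),  # newer value for user-1
    ("user-2", None),       # tombstone for user-2
]
print(compact(log))
```

Only offset 2 survives: user-1's older value is superseded and user-2 is deleted, which is why a compacted topic works as a latest-state view but not as a full history.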

A strong answer is:

Time retention drops old data by age; compaction keeps the latest value per key for changelog-style topics like config or state stores.


Operations, tuning, and monitoring

How do retention settings affect topics?

Setting | Effect
retention.ms | Max age before segment delete
retention.bytes | Max size per partition
segment.ms / segment.bytes | When the active segment rolls (by age or size); only closed segments are eligible for deletion

Long retention enables replay for new consumer groups or reprocessing—costs disk.

Tiered storage (vendor/cloud features) moves old segments to object storage.

A strong answer is:

Retention balances replay needs and disk cost—I size retention per topic based on compliance and consumer onboarding, not one global default.

What metrics do you monitor in production Kafka?

Metric | Signals
Under-replicated partitions | Replication lag
Offline partitions | Availability incident
Consumer lag | Processing backlog
Request latency (produce/fetch) | Broker load
ISR shrink/expand | Durability risk
Disk usage | Retention pressure

Alert on lag SLO breach and URP > 0 sustained—not only broker up/down.

A strong answer is:

I monitor lag, under-replication, and disk—I tie alerts to consumer SLOs and broker health, not just process running.

How do you plan topic partition count and broker capacity?

Factor | Guidance
Target throughput | Partitions parallelize consumers
Ordering | Fewer partitions if strict global order (bottleneck)
Consumer count | Partitions ≥ max consumers in group
Broker load | Each partition has file handles and leader CPU
Future growth | Easier to add partitions than shrink

Load test with expected message size and compression; watch disk I/O and network.
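One common rule of thumb turns load-test numbers into a starting partition count: divide the target throughput by the measured per-partition producer and consumer throughput, take the larger result, and add headroom. The figures and the 1.5 headroom factor below are assumptions for illustration, not recommendations.

```python
import math

def partitions_needed(target_mb_s, producer_mb_s_per_partition,
                      consumer_mb_s_per_partition, headroom=1.5):
    """Starting partition count from measured per-partition throughput."""
    by_producer = target_mb_s / producer_mb_s_per_partition
    by_consumer = target_mb_s / consumer_mb_s_per_partition
    return math.ceil(max(by_producer, by_consumer) * headroom)

# Hypothetical load-test results: 20 MB/s produce and 10 MB/s consume per partition.
count = partitions_needed(target_mb_s=120,
                          producer_mb_s_per_partition=20,
                          consumer_mb_s_per_partition=10)
print(count)
```

Here the consumer side is the bottleneck (12 partitions before headroom), giving 18 as a starting point to validate under load.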

A strong answer is:

I size partitions for peak consumer parallelism and key ordering needs, then load-test brokers—partition count is hard to reduce later.

How do you handle poison messages?

A poison message fails processing on every retry, so it either blocks its partition or sends the consumer into an infinite retry loop.

Patterns:

Pattern | Detail
Retry topic | Limited retries with backoff
DLT (dead-letter topic) | Spring DeadLetterPublishingRecoverer
Quarantine store | Manual triage
Skip with metric | Only for non-critical data

Always log key, offset, partition, stack trace; alert on DLT rate.
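The capped-retry-then-DLT pattern can be sketched in plain Python. The list below stands in for a dead-letter topic and consume_with_dlt is a hypothetical helper; a real consumer would also apply backoff between attempts and record the key and partition.

```python
def consume_with_dlt(records, handler, max_attempts=3):
    """Process records in order; after max_attempts failures, park the record in a DLT."""
    dead_letters = []
    for offset, value in enumerate(records):
        for attempt in range(1, max_attempts + 1):
            try:
                handler(value)
                break  # success: move on to the next record
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep enough metadata to triage the failure later.
                    dead_letters.append(
                        {"offset": offset, "value": value, "error": repr(exc)}
                    )
    return dead_letters

processed = []

def handler(value):
    processed.append(int(value))  # raises ValueError on a malformed value

dlt = consume_with_dlt(["1", "oops", "3"], handler)
print(processed, [(d["offset"], d["value"]) for d in dlt])
```

The malformed record at offset 1 ends up in the dead-letter list after three attempts, and the record behind it is still processed instead of waiting forever.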

A strong answer is:

I cap retries, route failures to a dead-letter topic with metadata, and alert—never spin forever on the same offset without visibility.

How do you secure Kafka in production?

Layer | Control
Network | TLS encryption in transit
Authentication | SASL (SCRAM, OAuth/OIDC)
Authorization | ACLs or RBAC (managed offerings)
Multi-tenant | Separate topics + ACLs per team

Never expose plaintext brokers to the public internet; rotate credentials; least-privilege ACLs per producer/consumer principal.

A strong answer is:

TLS plus SASL authentication and topic-level ACLs—I give each service only produce or consume rights on the topics it needs.


Comparisons and system design

Kafka vs RabbitMQ: when do you pick each?

Factor | RabbitMQ | Kafka
Model | Smart broker, queues | Dumb broker, smart consumers
Routing | Exchanges, bindings | Topics/partitions
Retention | Delete on ack | Configurable log
Replay | Limited | First-class
Throughput | Moderate | Very high horizontal scale
Task queues | Excellent | Possible but not primary fit

Pick RabbitMQ for RPC-style work queues; Kafka for event streams, audit logs, and multiple independent readers.

A strong answer is:

RabbitMQ for task routing and low-latency queues; Kafka for durable event streams, replay, and high-throughput fan-out—I do not force all messaging through one tool.

How do you design events for microservices?

Practices:

Practice | Why
Past-tense names | OrderPlaced, not PlaceOrder
Versioned payloads | Schema evolution
Include metadata | eventId, occurredAt, correlationId
Idempotent handlers | At-least-once reality
Bounded context topics | Avoid god topic

Align topics and event ownership with your service boundaries, as covered in the full stack and Spring Boot guides.

A strong answer is:

Events are contracts—I name them as facts, version schemas, include correlation IDs, and design consumers to tolerate redelivery.

What is change data capture (CDC) with Kafka?

CDC streams database row changes to Kafka—often Debezium on Kafka Connect.

Use | Benefit
Cache invalidation | Downstream read models update
Search index | Elasticsearch sync
Warehouse | Snowflake/BigQuery ingestion
Decouple services | Without dual writes

Challenge: ordering per key, schema changes, initial snapshot load.

A strong answer is:

CDC publishes database changes as events—I use Debezium for near-real-time sync instead of brittle dual-write patterns between services and DBs.


Java and Spring Boot integration

How do you configure a Kafka producer in Spring Boot?

Add spring-kafka and configure:

properties
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
spring.kafka.producer.acks=all
spring.kafka.producer.properties.enable.idempotence=true

java
import lombok.RequiredArgsConstructor;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor  // Lombok generates the constructor that injects KafkaTemplate
public class OrderEventPublisher {
    private final KafkaTemplate<String, OrderPlacedEvent> kafkaTemplate;

    public void publish(OrderPlacedEvent event) {
        // orderId is the record key, so all events for one order share a partition
        kafkaTemplate.send("orders", event.orderId(), event);
    }
}

Use orderId as key for partition locality. Configure retries and delivery timeout explicitly for critical topics.

A strong answer is:

Spring Kafka wraps KafkaTemplate with serializers and producer props—I set acks, idempotence, and keys explicitly for durable ordered per-order events.

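How do you configure a @KafkaListener consumer in Spring Boot?

java
import lombok.extern.slf4j.Slf4j;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class OrderListener {

    // concurrency = "3" starts three listener threads in the billing-service group
    @KafkaListener(topics = "orders", groupId = "billing-service", concurrency = "3")
    public void onOrder(OrderPlacedEvent event) {
        log.info("billing {}", event.orderId());
    }
}

Setting | Note
concurrency | Listener threads; keep ≤ partitions for efficiency
groupId | Consumer group per logical service
autoStartup | Control lifecycle
Error handler | DefaultErrorHandler + DLT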

For at-least-once discipline, set spring.kafka.consumer.enable-auto-commit=false and spring.kafka.listener.ack-mode=manual, then acknowledge each record only after processing succeeds:

java
@KafkaListener(...)
public void listen(ConsumerRecord<String, OrderPlacedEvent> record, Acknowledgment ack) {
    process(record.value());
    ack.acknowledge();
}

A strong answer is:

I match listener concurrency to partitions, use explicit ack when needed, and wire error handlers to dead-letter topics instead of infinite retry loops.

How do you test Kafka integration in Java?

Approach | Use
@EmbeddedKafka | Fast unit/integration tests in JVM
Testcontainers | Real broker behavior in CI
KafkaTemplate + @KafkaListener | End-to-end in test context

Assert with KafkaTestUtils.getSingleRecord or awaitility on side effects.

Mirror Spring Boot testing pyramid—many handler unit tests, few full broker tests.

A strong answer is:

I unit-test handlers without Kafka, use EmbeddedKafka or Testcontainers for wiring tests, and keep broker tests focused on serialization and ack behavior.


Senior scenarios and final prep

Scenario: Consumer lag spiked after a deployment. How do you respond?

Step | Action
1 | Confirm which group/topic/partition, using a dashboard or kafka-consumer-groups
2 | Correlate with deploy time: new code slower? rebalance storm?
3 | Check hot partitions (skewed keys)
4 | Thread/GC logs on consumers; DB latency in handler
5 | Roll back if regression; scale consumers if CPU-bound and partitions allow
6 | Temporary partition increase only with key strategy review
7 | Post-incident: add lag alert, load test, poison message guard

A strong answer is:

I isolate the group and partition skew, compare release timing, inspect handler latency and rebalances, then fix code or scale consumers with metrics proving the bottleneck.

What should you rehearse the week before a Kafka interview?

Checklist:

  • Whiteboard producer → topic partitions → consumer group
  • Explain acks=all + min.insync.replicas with numbers
  • Contrast at-least-once vs exactly-once with idempotent consumer
  • Walk rebalance triggers and cooperative protocol
  • Partition key strategy for one real domain (orders, payments)
  • Spring Kafka producer + @KafkaListener + DLT pattern
  • One lag or broker failure STAR story
  • Java concurrency refresh for poll/thread issues
  • Spring Boot microservice context

A strong answer is:

I rehearse one event pipeline end to end—keys, acks, consumer group, failure handling, and Java config—so answers sound like production war stories, not glossary recitation.


Pattern cheat sheet (quick reference)

Need | Kafka starting point
Per-key ordering | Producer key → partition
Max consumer parallelism | Partition count ≥ consumers
Durability | acks=all, RF=3, min.insync.replicas=2
Dedupe retries | enable.idempotence=true
At-least-once | Process then commit offset
EOS in Kafka | Transactions + read_committed
Schema evolution | Schema Registry + backward compatible Avro
Poison messages | DLT + limited retries
Lag triage | Group describe, skew, handler time, rebalances
Java integration | Spring KafkaTemplate + @KafkaListener
New clusters | KRaft metadata mode

References

Official Apache Kafka documentation


Summary

Kafka interviews connect partitions, consumer groups, and offsets to delivery semantics and operational trade-offs—not broker definitions alone. Java-heavy loops add Spring Kafka configuration, listener concurrency, and dead-letter handling. Answer aloud and compare your structure to each section. Pair with Spring Boot and SQL when events land in analytics stores.

Deepak Prasad

R&D Engineer
