45+ Apache Kafka Interview Questions and Answers for Java Developers 2026

Kafka interview questions in 2026 go far past "what is a message broker?" Teams using Apache Kafka for event-driven microservices, audit logs, and real-time pipelines expect you to explain partitions and consumer groups, defend acks=all with min.insync.replicas, and describe how you would fix consumer lag or design exactly-once processing. Interview loops for Java developers often pair broker theory with Spring Kafka configuration: producer retries, listener concurrency, and error handlers.

Below are 45 Kafka interview questions and answers aimed at Java developers and senior backend or data-engineering roles. Answers are written so you can learn while prepping: tables, diagrams in prose, small simulations, and a strong answer after each topic. Pair this guide with PostgreSQL interview questions for logical replication and CDC slot behavior on source databases, Kubernetes interview questions when brokers run on StatefulSets, Spring Boot interview questions for microservice integration, Java interview questions part 2 for concurrency fundamentals, Python developer interviews when consumers are written in Python, and SQL technical interviews for downstream analytics stores.

NOTE

Prep target: Loops for experienced candidates favor scenario answers: partition key choice for ordering, a rebalance incident during deploy, poison message handling, or an EOS consume-transform-produce pipeline. Keep one event flow you can whiteboard from producer to consumer group to sink.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; Python 3.13.3.

Interview context and how to prepare

What do Kafka interviews actually test?

Apache Kafka interviews test whether you understand a distributed commit log—not a classic queue that deletes messages after one consumer reads them.

Layer	What interviewers probe
Architecture	Brokers, topics, partitions, leaders, replicas
Producers	Keys, partitioning, `acks`, idempotence
Consumers	Groups, offsets, rebalancing, lag
Durability	Replication, ISR, unclean leader election
Semantics	At-most-once, at-least-once, exactly-once
Operations	Retention, compaction, monitoring, capacity
Integration	Spring Kafka, Schema Registry, Connect

Role	Emphasis
Java backend	Spring producer/consumer config, error handling
Platform / SRE	Broker tuning, KRaft, multi-AZ, upgrades
Data engineer	Connect, Streams, schema evolution, pipelines

Kafka interview questions for Java developer — what is different from generic messaging questions?

Java developer loops add client integration depth on top of broker concepts:

Generic Kafka question	Java-specific follow-up
Consumer groups	`@KafkaListener` concurrency vs partitions
Serialization	`JsonSerializer`, Avro + Schema Registry
Error handling	`DefaultErrorHandler`, DLT (dead-letter topic)
Transactions	`KafkaTransactionManager` with Spring
Testing	`@EmbeddedKafka`, Testcontainers

Interviewers often ask you to sketch Spring Boot properties for a producer and consumer in the same service—see Spring Boot interviews for broader microservice context.

What is a typical Kafka interview loop?

Round	Duration	Focus
Recruiter / HM	30 min	Event-driven experience, throughput, on-call
Fundamentals	45–60 min	Topics, partitions, consumer groups, offsets
Deep technical	60–90 min	acks, EOS, replication, failure scenarios
System design	45–60 min	Order service events, CDC, metrics pipeline
Live exercise	45 min	Partition assignment math, config fix, lag triage
Behavioral	30 min	Outage stories, rollout discipline

Live rounds often stress consumer groups and partitions together with delivery semantics—at-least-once, at-most-once, and exactly-once trade-offs.

What is a realistic 4–6 week prep plan?

Week	Focus	Output
1	Core model — log, topic, partition, broker	Draw producer → topic → consumer group
2	Producers — keys, `acks`, retries, idempotence	Configure a test producer; list trade-offs
3	Consumers — groups, commits, rebalancing	Run two consumers; observe partition assignment
4	Durability — ISR, `min.insync.replicas`, replication	Explain leader failure walkthrough
5	EOS + Schema Registry / Avro basics	Document read-process-write with transactions
6	Java integration + scenarios	Spring `@KafkaListener` sample; rehearse lag debug

Run a local KRaft broker or Docker Compose stack and produce/consume with kafka-console-producer / kafka-console-consumer at least once.

Architecture and core concepts

What is Apache Kafka and how is it different from a traditional message queue?

Kafka is a distributed event streaming platform built around an append-only partitioned log. Producers append records; consumers pull and track position via offsets.

Aspect	Traditional queue (e.g. RabbitMQ)	Kafka
Message lifecycle	Often deleted after ack	Retained by policy (time/size/compaction)
Consumption	Push to consumers	Consumers pull at their pace
Replay	Usually not supported	Seek to older offsets
Fan-out	Queues or exchanges	Multiple independent consumer groups
Ordering	Per queue	Per partition only

Kafka fits event sourcing, audit trails, stream processing, and decoupled microservices where many teams read the same history.

A strong answer is:

Kafka is a durable, replayable log with pub/sub via consumer groups—not a delete-on-read queue—so multiple services can consume the same topic at different speeds and rewind when needed.
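The contrast with a delete-on-read queue can be sketched with a toy in-memory log. This is a minimal model under simplifying assumptions (one partition, no persistence), not Kafka's implementation; the `ToyLog` class and its method names are hypothetical.

```python
class ToyLog:
    """Toy single-partition log: records are retained, each group tracks its own offset."""

    def __init__(self):
        self.records = []          # append-only; reading never deletes
        self.group_offsets = {}    # group id -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1   # offset assigned to the record

    def poll(self, group, max_records=10):
        start = self.group_offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.group_offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        self.group_offsets[group] = offset   # rewind (or skip) for replay


log = ToyLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

billing_first = log.poll("billing")        # one group reads everything
audit_first = log.poll("audit", 2)         # another group reads at its own pace
log.seek("billing", 0)                     # replay from the beginning
billing_replay = log.poll("billing")
print(billing_first, audit_first, billing_replay)
```

Both groups read the same records independently, and the rewind works because nothing was deleted when the first read happened.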

Explain Kafka architecture and its main components.

Component	Role
Broker	Server storing partitions, serving produce/fetch requests
Cluster	Multiple brokers for scale and fault tolerance
Topic	Named logical stream (e.g. `orders`)
Partition	Ordered, immutable log shard—unit of parallelism
Replica	Copy of a partition for durability
Leader / follower	One leader serves reads/writes; followers replicate
Producer	Publishes records with optional key
Consumer	Reads from assigned partitions
Consumer group	Cooperative consumers sharing load
Controller	Manages leader election (KRaft or legacy ZK mode)
ZooKeeper / KRaft	Cluster metadata (KRaft is standard in new deployments)

Data flow: producer → partition leader → followers replicate → consumers fetch.

A strong answer is:

Brokers host partition leaders and replicas; producers write to leaders; consumer groups divide partitions among members; the controller handles metadata and leader election.

What is a topic and what is a partition?

A topic is a logical category of records. Each topic is split into partitions—physical logs on disk.

Concept	Detail
Partition	Ordered sequence with monotonic offsets (0, 1, 2, …)
Parallelism	More partitions → more concurrent consumers (up to 1:1 per group)
Ordering	Guaranteed within a partition, not across partitions
Key routing	Same key → same partition (default `murmur2` hash)

Partition count is set at topic creation. You can increase it later, but existing records stay where they are and keys may hash to a different partition from then on, so per-key ordering can break across the change.

Partition assignment by key (simulation):

python


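def partition_for_key(key: str, num_partitions: int) -> int:
    # Kafka's default partitioner uses murmur2; Python's built-in hash() is only a
    # stand-in. It is stable within one run (str hashes are salted per process),
    # which is enough to show "same key → same partition".
    return hash(key) % num_partitions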

keys = ["user-42", "user-42", "user-99", "user-42"]
parts = [partition_for_key(k, 6) for k in keys]
print(parts[0] == parts[1] == parts[3], parts)

Output

The first value printed is True: the same key maps to the same partition index, which is how per-key ordering is preserved. The partition numbers themselves can differ between runs because Python salts string hashes per process.
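What happens to key routing when a topic grows from 6 to 8 partitions can be sketched the same way. This is an illustration only: `zlib.crc32` is used here as a run-stable stand-in for Kafka's murmur2, and the `order-N` keys are made up.

```python
import zlib


def partition_for_key(key: str, num_partitions: int) -> int:
    # crc32 is a stand-in for murmur2; unlike hash(), it is stable across runs
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"order-{i}" for i in range(1000)]
before = {k: partition_for_key(k, 6) for k in keys}
after = {k: partition_for_key(k, 8) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 8")
```

Records already written stay in their old partition, so a key that moves has its history split across two partitions.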

A strong answer is:

A topic is the stream name; partitions are the ordered shards that enable scale—I choose partition count and keys so ordering and throughput match product rules.

How does Kafka handle message ordering and delivery guarantees?

Ordering: Kafka guarantees order per partition. If you need all events for order-123 ordered, use key = order-123 so they land in one partition.

Delivery is not one setting—it is a stack of choices:

Layer	Knob
Producer	`acks`, retries, idempotence
Broker	Replication, ISR, `min.insync.replicas`
Consumer	When offsets are committed vs when work runs

There is no global order across partitions—design for partition-local order or use a single partition (limits throughput).

A strong answer is:

Ordering is per partition via key routing; end-to-end delivery semantics come from producer acks, replication, and consumer offset timing together.

What is KRaft and how does it differ from ZooKeeper?

KRaft (Kafka Raft) stores cluster metadata in Kafka itself using a Raft quorum—ZooKeeper is no longer required for new Kafka versions (3.3+ production-ready path; 4.x removes ZK).

Aspect	ZooKeeper mode (legacy)	KRaft
Metadata	External ZK ensemble	Internal metadata log
Operations	Two systems to patch/secure	Single Kafka operational model
Scaling metadata	ZK limits at very large scale	Designed for millions of partitions

Interviewers in 2026 expect you to know new clusters use KRaft and migration projects move off ZK.

A strong answer is:

KRaft embeds metadata in Kafka with Raft consensus, simplifying operations. I treat ZooKeeper as legacy and KRaft as the default for greenfield clusters.

Producers

What happens internally when a producer sends a message to Kafka?

Simplified produce path:

Serializer turns key/value into bytes
Partitioner picks partition (key hash or sticky batching)
Producer batches records for throughput
Request sent to partition leader broker
Leader appends to log; followers replicate
Broker responds based on acks setting
On failure, producer retries (may duplicate without idempotence)

Batching (linger.ms, batch.size) trades latency for throughput.

A strong answer is:

The producer serializes, partitions, batches, and sends to the leader; replication and ack level determine when the produce call succeeds and whether retries can create duplicates.

Difference between acks=0, acks=1, and acks=all?

Setting	Behavior	Trade-off
`acks=0`	Fire-and-forget; no broker ack	Highest throughput; may lose data
`acks=1`	Leader acks after writing to its own log	Fast; loss if leader dies before replication
`acks=all`	All ISR replicas must ack	Strongest durability; higher latency

Pair acks=all with min.insync.replicas ≥ 2 (broker or topic level) and a topic replication factor ≥ 3 for meaningful safety. Otherwise, if the ISR shrinks to the leader alone, acks=all still succeeds with a single copy.

Use acks=0 only for metrics or logs where loss is acceptable.

A strong answer is:

acks=0 is fire-and-forget, acks=1 waits for the leader, acks=all waits for the full ISR—I use all for critical events with proper replication and min.insync.replicas.
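The three settings can be compared by asking how many replicas are known to hold a record at the moment the producer sees success. The function below is a simplified model, not broker code; its name and parameters are hypothetical.

```python
def copies_guaranteed_at_ack(acks: str, isr_size: int, min_insync_replicas: int):
    """Replicas known to hold the record when the producer sees success,
    or None if the broker rejects the write."""
    if acks == "0":
        return 0                      # no broker response is awaited
    if acks == "1":
        return 1                      # leader's local log only
    if acks == "all":
        if isr_size < min_insync_replicas:
            return None               # rejected: not enough in-sync replicas
        return isr_size               # every current ISR member has the record
    raise ValueError(f"unknown acks value: {acks}")


cases = [("0", 3, 2), ("1", 3, 2), ("all", 3, 2), ("all", 1, 1), ("all", 1, 2)]
for acks, isr, min_isr in cases:
    print(f"acks={acks} isr={isr} min.insync.replicas={min_isr} ->",
          copies_guaranteed_at_ack(acks, isr, min_isr))
```

The fourth case is the trap: acks=all with min.insync.replicas=1 and a shrunken ISR acknowledges a write that exists on one broker only.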

Why do producer keys matter?

The key determines partition (when non-null):

Key	Effect
Set	`hash(key) % numPartitions` — colocate related events
Null	Round-robin / sticky partitioning — no key-based order

Use cases:

orderId as key — all lifecycle events for one order stay ordered
userId — per-user ordering
Null key — maximum spread when order does not matter

Hot keys can create partition skew—one partition overloaded while others idle.

A strong answer is:

Keys route related events to the same partition for ordering; I avoid hot keys and null keys when I need predictable locality.
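Partition skew from a hot key is easy to see by counting records per partition. This sketch assumes made-up traffic in which one tenant produces half of all records, and again uses `zlib.crc32` as a stand-in for murmur2.

```python
import zlib
from collections import Counter


def partition_for_key(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical traffic: one tenant produces half of all records
keys = ["tenant-big"] * 500 + [f"tenant-{i}" for i in range(500)]
load = Counter(partition_for_key(k, 6) for k in keys)
hot_partition, hot_count = load.most_common(1)[0]
print(dict(sorted(load.items())), "hottest:", hot_partition, hot_count)
```

Adding consumers does not help here, because the hot partition is still owned by a single consumer in the group; the usual fixes are a finer-grained key or a composite key.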

What is an idempotent producer?

Enable with enable.idempotence=true (Kafka producer). Broker assigns Producer ID (PID) and tracks sequence numbers per partition—duplicate retries are deduplicated within that producer session.

Scope	Guarantee
Idempotent producer	No duplicate writes to one partition from retries
Transactions	Atomic writes across partitions + offset commits

Idempotence requires acks=all, retries > 0, and appropriate max.in.flight.requests.per.connection (≤ 5 with idempotence enabled).

A strong answer is:

Idempotent producers dedupe retry batches per partition via sequence numbers—necessary but not sufficient alone for end-to-end exactly-once.
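The sequence-number check can be sketched as follows. This is a simplified model (the real broker tracks recent batches per producer and also uses a producer epoch); the `ToyPartitionLog` class and its return values are hypothetical.

```python
class ToyPartitionLog:
    """Broker-side view of one partition with per-producer sequence tracking."""

    def __init__(self):
        self.records = []
        self.last_seq = {}   # producer id -> last sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"          # retry of something already written
        if seq != last + 1:
            return "out_of_order"       # gap: reject instead of reordering
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = ToyPartitionLog()
results = [
    log.append(producer_id=7, seq=0, value="A"),
    log.append(producer_id=7, seq=1, value="B"),
    log.append(producer_id=7, seq=1, value="B"),   # retry after a lost ack
    log.append(producer_id=7, seq=3, value="D"),   # seq 2 never arrived
]
print(results, log.records)
```

The retry is acknowledged without being written twice, which is why a lost ack no longer produces a duplicate in the partition.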

Which producer settings do interviewers expect you to know?

Property	Purpose
`bootstrap.servers`	Broker seed list
`key.serializer` / `value.serializer`	Byte format
`acks`	Durability vs latency
`retries` / `delivery.timeout.ms`	Resilience
`enable.idempotence`	Dedupe retries
`compression.type`	`lz4`, `zstd`, `gzip` — bandwidth vs CPU
`linger.ms` / `batch.size`	Batching tuning

Setting max.in.flight.requests.per.connection above 1 with retries enabled and idempotence off can reorder batches, a classic interview trap.

A strong answer is:

I tune acks, idempotence, and in-flight requests together, and I batch with linger/batch.size only after measuring latency impact.
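The interactions between these settings can be captured in a small lint function. The property names are the real producer configs from the table above; the `check_producer_config` helper and its warning strings are hypothetical, and it expects all four keys to be present.

```python
def check_producer_config(cfg: dict) -> list[str]:
    """Return warnings for risky combinations of producer settings."""
    warnings = []
    idempotent = str(cfg["enable.idempotence"]).lower() == "true"
    in_flight = int(cfg["max.in.flight.requests.per.connection"])
    retries = int(cfg["retries"])
    acks = str(cfg["acks"])

    if idempotent and acks != "all":
        warnings.append("idempotence requires acks=all")
    if idempotent and in_flight > 5:
        warnings.append("idempotence requires max.in.flight <= 5")
    if idempotent and retries == 0:
        warnings.append("idempotence requires retries > 0")
    if not idempotent and retries > 0 and in_flight > 1:
        warnings.append("retries with in-flight > 1 and no idempotence can reorder batches")
    return warnings


safe = {"acks": "all", "enable.idempotence": "true", "retries": 5,
        "max.in.flight.requests.per.connection": 5}
risky = {"acks": "1", "enable.idempotence": "false", "retries": 3,
         "max.in.flight.requests.per.connection": 5}
print(check_producer_config(safe))
print(check_producer_config(risky))
```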

Consumers, groups, and offsets

What is a consumer group and how does it distribute work?

A consumer group shares one group.id. Each partition is assigned to exactly one consumer in the group at a time.

Consumers	Partitions	Result
3 consumers	6 partitions	~2 partitions each
6 consumers	6 partitions	1:1 — max parallelism for this group
8 consumers	6 partitions	2 idle consumers

Different groups reading the same topic are independent—each maintains its own offsets (fan-out).

Consumer group assignment simulation:

python


def assign_partitions(partitions, consumers):
    assignments = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignments[consumers[i % len(consumers)]].append(p)
    return assignments

parts = [f"P{i}" for i in range(6)]
print(assign_partitions(parts, ["C0", "C1", "C2"]))

Output

The output maps each consumer name to two partition labels: a round-robin style assignment, similar in spirit to the range and round-robin assignors.

A strong answer is:

A consumer group divides partitions among members so each partition is processed once per group; extra consumers stay idle unless I add partitions.

How are consumer offsets managed?

An offset is the consumer's position in a partition log. Committed offsets are stored in the internal topic __consumer_offsets.

Commit mode	Behavior
Auto commit	Periodic commit; simple, but records can be lost if the consumer crashes after the commit and before processing finishes
Manual commit	`commitSync` / `commitAsync` after successful processing
Transactional	Offsets committed atomically with producer transaction

At-least-once pattern: process record, then commit offset.

At-most-once pattern: commit offset, then process (may lose on crash).

A strong answer is:

Offsets are the consumer's bookmark—I commit after successful processing for at-least-once, and I disable careless auto-commit on critical pipelines.

What triggers a rebalance and how do you minimize impact?

Rebalance redistributes partitions when group membership changes:

Consumer joins or leaves
Consumer exceeds session.timeout.ms / max.poll.interval.ms
Topic partition count changes
(With the cooperative sticky assignor, each of these events revokes only the partitions that must move, incrementally, rather than all of them.)

Protocol	Impact
Eager (range)	Revoke all, reassign — stop-the-world pauses
Cooperative sticky	Incremental revoke — preferred in production

Mitigations:

Right-size session.timeout.ms, heartbeat.interval.ms, max.poll.interval.ms
Avoid long processing in poll loop—pause and resume or decouple processing
Static membership (group.instance.id) for rolling restarts

A strong answer is:

Rebalances happen on membership or timeout changes—I use cooperative rebalancing, tune poll intervals, and keep processing off the poll thread for long work.
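The difference between the two protocols shows up in how many partitions stop being processed when a third consumer joins. This is a toy model, not the actual assignor algorithm; `sticky_rebalance` and `revoked` are hypothetical helpers.

```python
def sticky_rebalance(current, members):
    """Rebalance while keeping existing ownership where possible.

    current: member -> partitions owned before the rebalance.
    members: ordered list of members after the membership change.
    """
    total = sum(len(ps) for ps in current.values())
    quota, extra = divmod(total, len(members))
    limits = {m: quota + (1 if i < extra else 0) for i, m in enumerate(members)}

    target, pool = {}, []
    for m in members:
        owned = current.get(m, [])
        target[m] = owned[:limits[m]]      # keep up to the new fair share
        pool.extend(owned[limits[m]:])     # surplus must move
    for m, owned in current.items():
        if m not in limits:
            pool.extend(owned)             # partitions of departed members
    for m in members:
        while len(target[m]) < limits[m]:
            target[m].append(pool.pop(0))
    return target


def revoked(current, target, protocol):
    if protocol == "eager":
        # every member gives up everything before reassignment
        return sorted(p for ps in current.values() for p in ps)
    # cooperative: only partitions that change owner are revoked
    return sorted(p for m, ps in current.items() for p in ps if p not in target.get(m, []))


current = {"C0": ["P0", "P1", "P2"], "C1": ["P3", "P4", "P5"]}
target = sticky_rebalance(current, ["C0", "C1", "C2"])
print(target)
print("eager revokes:", revoked(current, target, "eager"))
print("cooperative revokes:", revoked(current, target, "cooperative"))
```

In this model the eager protocol pauses all six partitions, while the cooperative one pauses only the two that move to the new member.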

What is consumer lag and how do you debug it?

Lag = difference between log end offset and consumer committed/current offset—how far behind a consumer is.

Cause	Investigation
Slow processing	JVM GC, DB calls in listener, thread pool exhaustion
Too few consumers	Consumers < partitions
Hot partition	Skewed key distribution
Rebalance storm	Frequent join/leave during deploy
Downstream bottleneck	Sink cannot keep pace

Tools: kafka-consumer-groups.sh --describe, Burrow, Datadog, Prometheus exporters.

Fix: scale consumers (up to partition count), optimize handler, increase partitions with key strategy review, fix poison messages.

A strong answer is:

Lag is how far behind consumption is—I find whether the bottleneck is processing time, partition skew, or rebalance churn, then scale or fix handlers with metrics, not guesses.
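The lag arithmetic, and the first triage question (is the lag spread evenly or concentrated in one partition?), can be sketched with made-up offsets. The numbers below are hypothetical; in practice they come from `kafka-consumer-groups.sh --describe` or a metrics exporter.

```python
def partition_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset minus committed offset."""
    return {p: log_end_offsets[p] - committed_offsets[p] for p in log_end_offsets}


# Hypothetical offsets for a three-partition topic
log_end = {0: 10_500, 1: 10_450, 2: 48_000}
committed = {0: 10_480, 1: 10_450, 2: 12_000}

lag = partition_lag(log_end, committed)
total = sum(lag.values())
worst = max(lag, key=lag.get)
share = lag[worst] / total
print(lag, "total:", total, "worst partition:", worst, f"({share:.0%} of lag)")
```

Lag concentrated in one partition points at a hot key or a stuck record; lag spread evenly points at handler speed or too few consumers.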

What are max.poll.interval.ms and session.timeout.ms?

Setting	Purpose
`session.timeout.ms`	Heartbeat failure → consumer considered dead
`heartbeat.interval.ms`	Must be < session timeout (typically ~1/3)
`max.poll.interval.ms`	Max time between `poll()` calls before rebalance

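If your listener processes one message for longer than five minutes without polling, the consumer exceeds max.poll.interval.ms (300000 ms by default), leaves the group, and triggers a rebalance, even though the background heartbeat thread keeps running on current clients.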

Pattern: poll often, hand off to worker pool, use pause/resume for backpressure.

A strong answer is:

session.timeout detects dead members; max.poll.interval caps time between polls—I never block the poll loop on long synchronous work.
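Whether a synchronous handler stays inside the poll interval is simple arithmetic. The sketch below assumes the documented defaults of 500 for `max.poll.records` and 300000 ms for `max.poll.interval.ms`; the helper names and the 50% headroom factor are illustrative choices.

```python
def exceeds_poll_interval(records_per_poll, ms_per_record, max_poll_interval_ms=300_000):
    """True if handling one batch synchronously takes longer than the allowed gap between polls."""
    return records_per_poll * ms_per_record > max_poll_interval_ms


def max_safe_batch(ms_per_record, max_poll_interval_ms=300_000, headroom=0.5):
    """Largest batch that finishes within a fraction (headroom) of the interval."""
    return max(1, int(max_poll_interval_ms * headroom // ms_per_record))


print(exceeds_poll_interval(500, 1000))   # 500 records at 1 s each
print(exceeds_poll_interval(500, 200))    # 500 records at 200 ms each
print(max_safe_batch(1000))               # candidate value for max.poll.records
```

If the first check is true, either lower `max.poll.records`, raise `max.poll.interval.ms`, or move the slow work off the poll thread.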

What is isolation.level=read_committed?

Consumers default to read_uncommitted—see all messages including those from aborted transactions.

read_committed hides aborted transactional batches—required for exactly-once consume-transform-produce loops so consumers do not read rolled-back data.

Pair with transactional producers and disabled auto-commit.

A strong answer is:

read_committed lets consumers see only committed transactional records—part of exactly-once pipelines with transactional producers.
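The effect of the two isolation levels can be modelled on a tiny log. This is a simplified sketch: each record carries a hypothetical transaction state, whereas a real broker uses control markers and the last stable offset.

```python
def visible_records(log, isolation_level):
    """log: list of (offset, value, txn_state); txn_state is None for
    non-transactional records, else 'committed', 'aborted', or 'open'."""
    out = []
    for offset, value, state in log:
        if isolation_level == "read_committed":
            if state == "open":
                break        # do not read past the first still-open transaction
            if state == "aborted":
                continue     # aborted data is filtered out
        out.append(value)
    return out


log = [
    (0, "a", None),
    (1, "b", "committed"),
    (2, "c", "aborted"),
    (3, "d", "open"),
    (4, "e", None),
]
print(visible_records(log, "read_uncommitted"))
print(visible_records(log, "read_committed"))
```

Note the second effect: a long-running open transaction holds back read_committed consumers even for later non-transactional records.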

Replication, durability, and fault tolerance

How does Kafka ensure data durability and fault tolerance?

Each partition has replication factor N—one leader and N−1 followers on different brokers.

Mechanism	Role
Leader replication	Followers fetch from leader
ISR (in-sync replicas)	Replicas caught up enough to be promotable
Leader election	New leader chosen from ISR on failure
`unclean.leader.election.enable=false`	Avoid data loss from out-of-sync promotion

Producer acks=all waits for ISR acks—not merely any replica.

A strong answer is:

Replication across brokers plus ISR tracking and safe leader election keeps partitions available after node loss—I pair that with acks=all and min.insync.replicas.

What is the in-sync replica set (ISR)?

ISR = replicas whose lag is below replica.lag.time.max.ms (or byte lag thresholds in older configs).

Only ISR members can become leader if unclean.leader.election.enable=false.

If ISR shrinks to one replica:

Cluster still serves traffic
No redundancy until followers catch up
Risk if that single broker fails

Monitor ISR shrink events—they predict durability risk.

A strong answer is:

ISR is the set of replicas safe to promote. I alert when ISR size drops and I never enable unclean election on critical topics.
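ISR membership by time lag can be sketched as a filter over "last caught up" timestamps. The broker ids and timestamps are made up, and the 30000 ms threshold is just an example value for `replica.lag.time.max.ms`.

```python
def in_sync_replicas(last_caught_up_ms: dict, now_ms: int, replica_lag_time_max_ms: int) -> set:
    """Replicas that were fully caught up with the leader recently enough."""
    return {
        broker
        for broker, caught_up_at in last_caught_up_ms.items()
        if now_ms - caught_up_at <= replica_lag_time_max_ms
    }


# Hypothetical state: broker 1 is the leader, broker 3 has fallen behind
last_caught_up = {1: 100_000, 2: 95_000, 3: 60_000}
isr = in_sync_replicas(last_caught_up, now_ms=100_000, replica_lag_time_max_ms=30_000)
print("ISR:", sorted(isr))
```

Broker 3 drops out because it has not been caught up for 40 seconds; it rejoins once it catches up again.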

What is min.insync.replicas?

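min.insync.replicas (a broker default that can be overridden per topic) is the minimum number of in-sync replicas, leader included, that must be available for a write with acks=all to succeed; with fewer, the broker rejects the write.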

Example: replication.factor=3, min.insync.replicas=2

Tolerates one broker loss without stopping writes
Producer with acks=all fails if only one ISR member available—prefer failing writes over silent data loss

Interview trap: acks=all with min.insync.replicas=1 and RF=3 still allows single-replica commits.

A strong answer is:

min.insync.replicas enforces how many replicas must ack before a write counts. I set it to 2 with RF=3 so one broker can die without blocking writes, and every acknowledged write still exists on at least two replicas.
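The RF=3, min.insync.replicas=2 example can be tabulated per number of failed brokers. This sketch assumes every surviving replica is in the ISR, which is the best case; the function name and row keys are hypothetical.

```python
def availability_table(replication_factor: int, min_insync_replicas: int) -> list[dict]:
    """For each count of failed replica brokers, report what still works."""
    rows = []
    for brokers_down in range(replication_factor + 1):
        alive = replication_factor - brokers_down
        rows.append({
            "brokers_down": brokers_down,
            "acks_all_writes_accepted": alive >= min_insync_replicas,
            "partition_readable": alive >= 1,
        })
    return rows


table = availability_table(replication_factor=3, min_insync_replicas=2)
for row in table:
    print(row)
```

With two brokers down the partition is still readable but acks=all writes are rejected, which is the intended trade: fail loudly rather than accept a single-copy write.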

What happens when a Kafka broker fails?

Scenario	Effect
Follower dies	ISR may shrink; leaders continue
Leader dies	Controller elects new leader from ISR
Multiple brokers die	Partitions with no ISR leader go offline
Controller failure	Failover to standby controller (KRaft quorum)

Clients refresh metadata and discover new leaders—brief produce/fetch errors during election.

Operations: rack awareness (broker.rack) for cross-AZ placement, monitor under-replicated partitions.

A strong answer is:

Follower loss reduces redundancy; leader loss triggers ISR election; clients retry after metadata refresh—I design RF and rack awareness so single-AZ loss does not take topics offline.
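Leader election after a broker failure can be modelled in a few lines. This is a toy version of the decision only (the controller's real logic covers more cases); `elect_leader` and its return shape are hypothetical.

```python
def elect_leader(replicas, isr, failed, unclean_allowed=False):
    """Return (new_leader, data_loss_possible); a leader of None means offline."""
    for broker in replicas:                 # prefer replicas in assignment order
        if broker in isr and broker not in failed:
            return broker, False            # clean election from the ISR
    if unclean_allowed:
        for broker in replicas:
            if broker not in failed:
                return broker, True         # out-of-sync replica promoted
    return None, False                      # partition stays offline


replicas = [1, 2, 3]
print(elect_leader(replicas, isr={1, 2}, failed={1}))
print(elect_leader(replicas, isr={1}, failed={1}))
print(elect_leader(replicas, isr={1}, failed={1}, unclean_allowed=True))
```

The last two calls show the choice behind `unclean.leader.election.enable`: stay offline and keep the data, or come back online and risk losing acknowledged writes.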

Delivery semantics and exactly-once

Explain at-most-once, at-least-once, and exactly-once semantics.

Semantic	Guarantee	Typical implementation
At-most-once	May lose; no duplicates	Commit offset before process; `acks=0`
At-least-once	No loss; may duplicate	Process then commit; retries without idempotence
Exactly-once	Neither loss nor duplicate	Idempotent producer + transactions / Streams EOS

Delivery semantics simulation:

python


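def at_most_once(process, commit, events):
    # Commit first, then process: a crash between the two loses the record.
    lost = []
    for e in events:
        commit(e)
        if e == "crash":
            lost.append("unprocessed-after-commit")
            break
        process(e)
    return lost

def at_least_once(process, commit, events):
    # Process the batch first and commit once at the end: a crash before the
    # commit means every record handled since the last commit is redelivered.
    dup = []
    processed = []  # a list keeps the replay order deterministic
    for e in events:
        if e == "crash":
            for p in processed:  # restart resumes from the last committed offset
                process(p)
                dup.append(p)
            return dup
        process(e)
        processed.append(e)
    if events:
        commit(events[-1])  # reached only when the whole batch succeeded
    return dup

events = ["A", "B", "crash"]
print("at-most-once lost:", at_most_once(lambda x: None, lambda x: None, events))
print("at-least-once dups:", at_least_once(lambda x: None, lambda x: None, events))

Output

The run reports one lost-after-commit case for at-most-once and lists A and B as duplicates for at-least-once after the simulated crash, which illustrates why idempotent handlers matter.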

A strong answer is:

At-most-once commits early; at-least-once commits after work and may retry duplicates; exactly-once needs broker and application cooperation—I default to at-least-once with idempotent consumers unless EOS is justified.

How do you achieve exactly-once processing in Kafka?

Broker-side pieces:

Idempotent producer — dedupe per-partition retries
Transactions — initTransactions, beginTransaction, commitTransaction
sendOffsetsToTransaction — atomic offset commit with output records
Consumer isolation.level=read_committed

Application-side: processing must be deterministic; external sinks need idempotent writes or transactional stores—EOS in Kafka does not magically dedupe your database.

Kafka Streams offers processing.guarantee=exactly_once_v2 packaging the pattern.

A strong answer is:

Exactly-once combines idempotent producers, transactional writes, and committed offset reads—I still make downstream sinks idempotent because EOS stops at Kafka's boundary.

Walk through a consume-transform-produce transaction.

Steps:

Producer initTransactions()
Consumer polls records
beginTransaction()
Process and produce output records
sendOffsetsToTransaction with consumed offsets
commitTransaction() — all visible or none

On failure: abortTransaction() — consumers with read_committed never see partial output.

Requires a transactional.id that is unique per producer instance and stable across restarts, so a restarted instance fences the zombie it replaces.

A strong answer is:

I wrap output records and input offset commits in one transaction so a crash cannot double-publish or skip commits visible to downstream readers.
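The all-or-nothing behaviour of the steps above can be sketched in memory. This is a toy model of the idea, not the Kafka client API: the `ToyTransaction` class, the `input-0` partition name, and the `transform` function are hypothetical.

```python
class ToyTransaction:
    """Buffers output records and input offsets; both become visible only on commit."""

    def __init__(self, output_topic, committed_offsets):
        self.output_topic = output_topic            # what read_committed readers see
        self.committed_offsets = committed_offsets  # input partition -> next offset
        self._records, self._offsets = [], {}

    def send(self, record):
        self._records.append(record)

    def send_offsets(self, offsets):
        self._offsets.update(offsets)

    def commit(self):
        self.output_topic.extend(self._records)
        self.committed_offsets.update(self._offsets)
        self._records, self._offsets = [], {}

    def abort(self):
        self._records, self._offsets = [], {}


def transform(value):
    if value == "boom":
        raise ValueError("cannot transform")
    return value.upper()


def process_batch(batch, txn):
    try:
        for offset, value in batch:
            txn.send(transform(value))
            txn.send_offsets({"input-0": offset + 1})
        txn.commit()
        return True
    except ValueError:
        txn.abort()      # neither output nor offsets become visible
        return False


output, offsets = [], {}
txn = ToyTransaction(output, offsets)
failed = process_batch([(0, "a"), (1, "boom")], txn)
state_after_abort = (list(output), dict(offsets))
ok = process_batch([(0, "a"), (1, "b")], txn)
print(failed, state_after_abort, ok, output, offsets)
```

After the aborted attempt nothing is visible and no offset has moved, so the retry reprocesses the same input without double-publishing.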

Why do you still need idempotent consumers with at-least-once?

Most teams run at-least-once (simpler than full transactions). Retries and rebalance redelivery mean duplicate delivery is normal.

Consumer strategies:

Strategy	Example
Natural idempotence	Upsert by primary key
Dedup store	Redis/DB of processed event IDs
Transactional DB	Unique constraint on `event_id`

"Exactly-once" in interviews often means effective exactly-once—at-least-once transport + idempotent processing.

A strong answer is:

At-least-once will redeliver—I design handlers to dedupe on business keys or store processed IDs so duplicates are harmless.
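The dedup-store strategy can be sketched with an in-memory set. In production the set would be a database table with a unique constraint or a Redis key; the `IdempotentHandler` class and the event fields are hypothetical.

```python
class IdempotentHandler:
    """Applies each event's side effect at most once, keyed by event_id."""

    def __init__(self):
        self.processed_ids = set()   # stand-in for a table with a unique constraint
        self.balance = 0

    def handle(self, event) -> bool:
        if event["event_id"] in self.processed_ids:
            return False             # duplicate delivery: skip side effects
        self.balance += event["amount"]
        self.processed_ids.add(event["event_id"])
        return True


handler = IdempotentHandler()
deliveries = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2", "amount": 5},
    {"event_id": "e1", "amount": 10},   # redelivered after a rebalance
]
applied = [handler.handle(e) for e in deliveries]
print(applied, handler.balance)
```

For this to hold across crashes, the side effect and the processed-ID write need to land in the same database transaction.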

Schema Registry, Connect, and stream processing

What is Schema Registry and why use Avro with Kafka?

Schema Registry (often Confluent) stores Avro/JSON/Protobuf schemas with versioning.

Benefit	Detail
Compatibility	BACKWARD, FORWARD, FULL modes
Evolution	Add fields with defaults safely
Compact payloads	Avro binary + schema id in message

Producers send schema ID; consumers fetch schema from registry—contract between teams.

Interview talking point: breaking schema change blocked by compatibility mode—coordinate with consumers before deploy.

A strong answer is:

Schema Registry versions contracts between producers and consumers—I use backward-compatible Avro evolution and test compatibility in CI.
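A backward-compatibility check of the kind a CI job might run can be sketched as follows. This is a deliberately simplified rule set, not Schema Registry's algorithm: it treats any type change as breaking (Avro allows some promotions), and the schema dictionaries and function name are hypothetical.

```python
def backward_compatible(old_fields: dict, new_fields: dict) -> list[str]:
    """Can a reader on the NEW schema decode data written with the OLD one?

    fields: name -> {"type": ..., optional "default": ...}. Returns problems found.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields and "default" not in spec:
            problems.append(f"new field '{name}' has no default")
        elif name in old_fields and old_fields[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


old = {"order_id": {"type": "string"}, "total": {"type": "long"}}
good_new = {**old, "currency": {"type": "string", "default": "USD"}}
bad_new = {**old, "currency": {"type": "string"}}

print(backward_compatible(old, good_new))
print(backward_compatible(old, bad_new))
```

The default is what lets a new consumer fill in `currency` when it reads records that were written before the field existed.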

What is Kafka Connect?

Kafka Connect is a framework for source and sink connectors:

Type	Example
Source	Debezium CDC from PostgreSQL → Kafka
Sink	S3, Elasticsearch, JDBC sink

Runs as distributed workers with offset tracking in Kafka—fits ETL without custom consumer boilerplate.

Pair with SQL interviews when discussing CDC and warehouse loads.

A strong answer is:

Kafka Connect moves data in and out with connectors—I use CDC sources and managed sinks instead of one-off consumers when the pattern is standard.

What is Kafka Streams at interview level?

Kafka Streams is a Java library for stream processing on top of Kafka:

Stateful operations (aggregations, joins) with changelog topics
Exactly-once v2 processing guarantee option
No separate cluster like Flink—runs as your app

Use it when the team is JVM-heavy and the logic fits Kafka-centric topologies; use Flink or Spark when you need complex event-time windowing at massive scale.

A strong answer is:

Kafka Streams embeds processing in Java apps with state stores and EOS options. I choose it for JVM microservices, and Flink when the workload justifies operating a dedicated cluster.

What is log compaction vs time-based retention?

Policy	Behavior
delete (time/size)	Old segments removed after `retention.ms`
compact	Keep latest record per key; tombstones remove keys

Compaction suits changelog topics—config snapshots, __consumer_offsets, compacted state topics in Streams.

Not a substitute for infinite raw event history—compacted topics are key-value latest views.

A strong answer is:

Time retention drops old data by age; compaction keeps the latest value per key for changelog-style topics like config or state stores.
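The end state of compaction can be sketched as "latest value per key, minus tombstoned keys". This simplification ignores segment boundaries and the period during which tombstones are still retained; the `compact` function and sample keys are hypothetical.

```python
def compact(log):
    """log: list of (key, value) in offset order; value None is a tombstone.
    Returns surviving (key, value) pairs in offset order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)          # later records win
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None                   # tombstoned keys disappear
    )
    return [(key, value) for _, key, value in survivors]


log = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u2", None), ("u3", "d")]
print(compact(log))
```

The history of `u1` is gone and `u2` no longer exists, which is why a compacted topic is a latest-state view rather than an event archive.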

Operations, tuning, and monitoring

How do retention settings affect topics?

Setting	Effect
`retention.ms`	Max age before segment delete
`retention.bytes`	Max size per partition
`segment.ms` / `segment.bytes`	When the active segment rolls (by age / by size)

Long retention enables replay for new consumer groups or reprocessing—costs disk.

Tiered storage (KIP-405 in Apache Kafka, plus vendor and cloud offerings) moves old segments to object storage.

A strong answer is:

Retention balances replay needs and disk cost—I size retention per topic based on compliance and consumer onboarding, not one global default.
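The disk cost of a retention choice is a short calculation. The throughput figure and the 30% headroom factor below are assumptions for illustration, and the formula ignores compression and index overhead.

```python
def disk_needed_gb(mb_per_sec: float, retention_hours: float,
                   replication_factor: int, headroom: float = 1.3) -> float:
    """Cluster-wide disk needed for one topic: ingest rate x retention x copies x headroom."""
    raw_gb = mb_per_sec * 3600 * retention_hours / 1024
    return raw_gb * replication_factor * headroom


# Hypothetical topic: 5 MB/s ingest, 7 days retention, RF=3
seven_days = disk_needed_gb(mb_per_sec=5, retention_hours=168, replication_factor=3)
one_day = disk_needed_gb(mb_per_sec=5, retention_hours=24, replication_factor=3)
print(f"7 days: {seven_days:,.0f} GB, 1 day: {one_day:,.0f} GB")
```

Running the numbers per topic makes the replay-versus-cost conversation concrete instead of relying on one global default.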

What metrics do you monitor in production Kafka?

Metric	Signals
Under-replicated partitions	Replication lag
Offline partitions	Availability incident
Consumer lag	Processing backlog
Request latency (produce/fetch)	Broker load
ISR shrink/expand	Durability risk
Disk usage	Retention pressure

Alert on lag SLO breach and URP > 0 sustained—not only broker up/down.

A strong answer is:

I monitor lag, under-replication, and disk—I tie alerts to consumer SLOs and broker health, not just process running.

How do you plan topic partition count and broker capacity?

Factor	Guidance
Target throughput	Partitions parallelize consumers
Ordering	Fewer partitions if strict global order (bottleneck)
Consumer count	Partitions ≥ max consumers in group
Broker load	Each partition has file handles and leader CPU
Future growth	Partitions can be added later but never removed

Load test with expected message size and compression; watch disk I/O and network.

A strong answer is:

I size partitions for peak consumer parallelism and key ordering needs, then load-test brokers. Partition count can be increased later but not reduced without recreating the topic.

How do you handle poison messages?

A poison message fails processing on every retry and either blocks its partition or causes an infinite retry loop.

Patterns:

Pattern	Detail
Retry topic	Limited retries with backoff
DLT (dead-letter topic)	Spring `DeadLetterPublishingRecoverer`
Quarantine store	Manual triage
Skip with metric	Only for non-critical data

Always log key, offset, partition, stack trace; alert on DLT rate.

A strong answer is:

I cap retries, route failures to a dead-letter topic with metadata, and alert—never spin forever on the same offset without visibility.
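Capped retries followed by a dead-letter hand-off can be sketched without a broker. This models the control flow only (Spring's `DefaultErrorHandler` and `DeadLetterPublishingRecoverer` add backoff and header metadata); the function, handler, and record values are hypothetical.

```python
def consume_with_dlt(records, handler, max_attempts=3):
    """Try each record up to max_attempts times, then park it in a dead-letter list."""
    processed, dead_letters = [], []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)
                processed.append(record)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # keep enough context for triage instead of retrying forever
                    dead_letters.append(
                        {"record": record, "error": repr(exc), "attempts": attempt}
                    )
    return processed, dead_letters


def handler(record):
    if record == "bad":
        raise ValueError("cannot parse record")


processed, dead_letters = consume_with_dlt(["ok-1", "bad", "ok-2"], handler)
print(processed, dead_letters)
```

The partition keeps moving: the record after the poison message is processed normally, and the failure is visible in the dead-letter list.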

How do you secure Kafka in production?

Layer	Control
Network	TLS encryption in transit
Authentication	SASL (SCRAM, OAuth/OIDC)
Authorization	ACLs or RBAC (managed offerings)
Multi-tenant	Separate topics + ACLs per team

Never expose plaintext brokers to the public internet; rotate credentials; least-privilege ACLs per producer/consumer principal.

A strong answer is:

TLS plus SASL authentication and topic-level ACLs—I give each service only produce or consume rights on the topics it needs.

Comparisons and system design

Kafka vs RabbitMQ — when do you pick each?

Factor	RabbitMQ	Kafka
Model	Smart broker, queues	Dumb broker, smart consumers
Routing	Exchanges, bindings	Topics/partitions
Retention	Delete on ack	Configurable log
Replay	Limited	First-class
Throughput	Moderate	Very high horizontal scale
Task queues	Excellent	Possible but not primary fit

Pick RabbitMQ for RPC-style work queues; Kafka for event streams, audit logs, and multiple independent readers.

A strong answer is:

RabbitMQ for task routing and low-latency queues; Kafka for durable event streams, replay, and high-throughput fan-out—I do not force all messaging through one tool.

How do you design events for microservices?

Practices:

Practice	Why
Past-tense names	`OrderPlaced`, not `PlaceOrder`
Versioned payloads	Schema evolution
Include metadata	`eventId`, `occurredAt`, `correlationId`
Idempotent handlers	At-least-once reality
Bounded context topics	Avoid god topic

Align topics and events with the service boundaries covered in the full stack and Spring Boot guides.

A strong answer is:

Events are contracts—I name them as facts, version schemas, include correlation IDs, and design consumers to tolerate redelivery.
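An event envelope that follows these practices might look like the sketch below. The field names follow the table above in snake_case; the `OrderPlaced` fields beyond the metadata, and the choice of JSON for the payload, are assumptions for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderPlaced:                     # past-tense name: a fact that happened
    order_id: str
    total_cents: int
    correlation_id: str                # ties the event to the originating request
    schema_version: int = 1            # bump on payload changes
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


event = OrderPlaced(order_id="order-123", total_cents=4999, correlation_id="req-abc")
key = event.order_id                   # record key: keeps one order's events ordered
value = json.dumps(asdict(event))      # record value
print(key, value)
```

Consumers can dedupe on `event_id`, trace on `correlation_id`, and branch on `schema_version` without inspecting the business payload.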

What is change data capture (CDC) with Kafka?

CDC streams database row changes to Kafka—often Debezium on Kafka Connect.

Use	Benefit
Cache invalidation	Downstream read models update
Search index	Elasticsearch sync
Warehouse	Snowflake/BigQuery ingestion
Decouple services	Without dual writes

Challenges: ordering per key, schema changes, and the initial snapshot load.

A strong answer is:

CDC publishes database changes as events—I use Debezium for near-real-time sync instead of brittle dual-write patterns between services and DBs.

Java and Spring Boot integration

How do you configure a Kafka producer in Spring Boot?

Add spring-kafka and configure:

properties


spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
spring.kafka.producer.acks=all
spring.kafka.producer.properties.enable.idempotence=true

java


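import lombok.RequiredArgsConstructor;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class OrderEventPublisher {
    private final KafkaTemplate<String, OrderPlacedEvent> kafkaTemplate;

    public void publish(OrderPlacedEvent event) {
        // orderId is the record key, so all events for one order share a partition
        kafkaTemplate.send("orders", event.orderId(), event);
    }
}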

Use orderId as key for partition locality. Configure retries and delivery timeout explicitly for critical topics.

A strong answer is:

Spring Kafka wraps KafkaTemplate with serializers and producer props—I set acks, idempotence, and keys explicitly for durable ordered per-order events.

How do you configure a @KafkaListener consumer in Spring Boot?

java


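import lombok.extern.slf4j.Slf4j;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class OrderListener {

    // concurrency = 3 starts three consumer threads in the billing-service group
    @KafkaListener(topics = "orders", groupId = "billing-service", concurrency = "3")
    public void onOrder(OrderPlacedEvent event) {
        log.info("billing {}", event.orderId());
    }
}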

Setting	Note
`concurrency`	Listener threads—≤ partitions for efficiency
`groupId`	Consumer group per logical service
`autoStartup`	Control lifecycle
Error handler	`DefaultErrorHandler` + DLT

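For manual ack mode, disable auto-commit (spring.kafka.consumer.enable-auto-commit=false) and set spring.kafka.listener.ack-mode=manual so the listener receives an Acknowledgment when you need at-least-once discipline: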

java


@KafkaListener(...)
public void listen(ConsumerRecord<String, OrderPlacedEvent> record, Acknowledgment ack) {
    process(record.value());
    ack.acknowledge();
}

A strong answer is:

I match listener concurrency to partitions, use explicit ack when needed, and wire error handlers to dead-letter topics instead of infinite retry loops.

How do you test Kafka integration in Java?

Approach	Use
`@EmbeddedKafka`	Fast unit/integration tests in JVM
Testcontainers	Real broker behavior in CI
`KafkaTemplate` + `@KafkaListener`	End-to-end in test context

Assert with KafkaTestUtils.getSingleRecord, or use Awaitility to wait on side effects.

Mirror Spring Boot testing pyramid—many handler unit tests, few full broker tests.

A strong answer is:

I unit-test handlers without Kafka, use EmbeddedKafka or Testcontainers for wiring tests, and keep broker tests focused on serialization and ack behavior.

Senior scenarios and final prep

Scenario: Consumer lag spiked after a deployment — how do you respond?

Step	Action
1	Confirm which group/topic/partition—dashboard or `kafka-consumer-groups`
2	Correlate with deploy time—new code slower? rebalance storm?
3	Check hot partitions — skewed keys
4	Thread/GC logs on consumers; DB latency in handler
5	Roll back if regression; scale consumers if CPU-bound and partitions allow
6	Temporary partition increase only with key strategy review
7	Post-incident: add lag alert, load test, poison message guard

A strong answer is:

I isolate the group and partition skew, compare release timing, inspect handler latency and rebalances, then fix code or scale consumers with metrics proving the bottleneck.

What should you rehearse the week before a Kafka interview?

Checklist:

Whiteboard producer → topic partitions → consumer group
Explain acks=all + min.insync.replicas with numbers
Contrast at-least-once vs exactly-once with idempotent consumer
Walk rebalance triggers and cooperative protocol
Partition key strategy for one real domain (orders, payments)
Spring Kafka producer + @KafkaListener + DLT pattern
One lag or broker failure STAR story
Java concurrency refresh for poll/thread issues
Spring Boot microservice context

A strong answer is:

I rehearse one event pipeline end to end—keys, acks, consumer group, failure handling, and Java config—so answers sound like production war stories, not glossary recitation.

Pattern cheat sheet (quick reference)

Need	Kafka starting point
Per-key ordering	Producer key → partition
Max consumer parallelism	Partition count ≥ consumers
Durability	`acks=all`, RF=3, `min.insync.replicas=2`
Dedupe retries	`enable.idempotence=true`
At-least-once	Process then commit offset
EOS in Kafka	Transactions + `read_committed`
Schema evolution	Schema Registry + backward compatible Avro
Poison messages	DLT + limited retries
Lag triage	Group describe, skew, handler time, rebalances
Java integration	Spring `KafkaTemplate` + `@KafkaListener`
New clusters	KRaft metadata mode

References

Official Apache Kafka documentation

Summary

Kafka interviews connect partitions, consumer groups, and offsets to delivery semantics and operational trade-offs—not broker definitions alone. Java-heavy loops add Spring Kafka configuration, listener concurrency, and dead-letter handling. Answer aloud and compare your structure to each section. Pair with Spring Boot and SQL when events land in analytics stores.
