PostgreSQL interview questions, postgresql developer interview questions, and postgresql dba interview questions appear in backend, full-stack, data engineering, and platform roles where Postgres is the system of record. Interviewers go beyond generic SQL—they probe MVCC and vacuum, index type choice (B-tree vs GIN vs BRIN), EXPLAIN ANALYZE literacy, JSONB trade-offs, replication topology, and how you debug bloat or slow queries in production.
Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with DBMS interview questions for ER modeling, normalization, and ACID fundamentals, SQL technical interview questions for joins, window functions, and query-writing drills, MongoDB interview questions when interviewers compare document vs relational stores, pandas interview questions for analytics pipelines, Django interview questions for experienced developers and Spring Boot interview questions for ORM integration, Kafka interview questions for CDC and event sourcing, and data science interview questions for analytics workloads.
EXPLAIN (ANALYZE, BUFFERS) on a slow query, add the right index type for a JSONB filter, and explain what autovacuum does when updates are heavy—out loud, as if in a senior loop.
Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; PostgreSQL 17.7.
Interview context and how to prepare
What do PostgreSQL interviews actually test?
PostgreSQL interviews test whether you understand how the database behaves under concurrency and scale—not only SELECT syntax.
| Layer | What interviewers probe |
|---|---|
| SQL fluency | Joins, CTEs, window functions, aggregates |
| Postgres internals | MVCC, vacuum, transaction IDs |
| Indexing | B-tree, GIN, partial/expression indexes |
| Performance | EXPLAIN plans, stats, work_mem |
| Types | JSONB, arrays, enums, ranges |
| Operations (DBA) | Replication, backups, PITR, partitioning |
| Security | Roles, RLS, least privilege |
| Role | Emphasis |
|---|---|
| Developer | ORM-safe queries, migrations, JSONB when appropriate |
| Backend | Connection pooling, transactions, locking behavior |
| Data engineer | Logical replication, CDC slots, bulk load |
| DBA | Vacuum tuning, HA, backup/restore, capacity |
Beyond generic SQL, senior loops lean on MVCC and vacuum, index type selection, and reading EXPLAIN plans—the topics that separate application CRUD from database engineering.
PostgreSQL developer vs DBA interview focus?
| Topic | Developer loop | DBA loop |
|---|---|---|
| Queries | ORM pitfalls, N+1, pagination | Slow query triage, pg_stat_statements |
| Schema | Migrations, constraints, indexes | Partitioning, tablespaces, extensions |
| Concurrency | Transaction boundaries, isolation | Lock monitoring, long transactions |
| Storage | JSONB vs columns | Bloat, autovacuum, fillfactor |
| HA | Connection strings, read replicas | Streaming replication, failover, PITR |
| Security | Parameterized queries | Roles, RLS, pg_hba.conf |
Many postgresql developer interview questions still touch vacuum and indexes because app code causes bloat and bad plans.
What is a typical PostgreSQL interview loop?
| Round | Duration | Focus |
|---|---|---|
| Screening | 30 min | Stack, Postgres version, scale |
| SQL live coding | 45–60 min | Joins, windows, CTEs—see SQL prep |
| Postgres depth | 45 min | MVCC, indexes, EXPLAIN |
| System design | 45–60 min | Read replicas, caching, sharding vs partitioning |
| DBA scenario (senior) | 45 min | Replication lag, backup restore, vacuum crisis |
System-design rounds often ask how Postgres fits multi-tenant SaaS or event-sourced architectures—read replicas, connection pooling, and logical replication slots.
What is a realistic 4–6 week PostgreSQL prep plan?
| Week | Focus | Output |
|---|---|---|
| 1 | SQL joins, windows, CTEs | 20 problems from SQL guide |
| 2 | MVCC, isolation, locks | Explain phantom vs repeatable read on Postgres |
| 3 | Indexes + EXPLAIN | Fix a seq scan with partial or composite index |
| 4 | JSONB, arrays, types | Model flexible attributes with GIN index |
| 5 | Replication, backup concepts | Draw primary/replica + PITR flow |
| 6 | Mock scenarios | Slow query + bloat + failover narrative |
Run a local PostgreSQL 16+ instance and practice \d+, \di, EXPLAIN ANALYZE.
Why PostgreSQL over MySQL in interviews?
Interviewers want trade-off awareness, not fanboy answers.
| Area | PostgreSQL strength |
|---|---|
| SQL standard | Richer SQL, window functions, CTEs (historically stronger) |
| Concurrency | MVCC model, fewer reader/writer surprises |
| Types | JSONB, arrays, ranges, custom types |
| Extensions | PostGIS, pgvector, citext |
| Replication | Physical + logical replication flexible |
MySQL/InnoDB also uses MVCC—Postgres interviews go deeper on vacuum, index types, and planner behavior.
Architecture and PostgreSQL fundamentals
PostgreSQL architecture — process model basics?
PostgreSQL uses a multi-process model (not multi-threaded like some databases):
| Process | Role |
|---|---|
| postmaster | Parent; listens for connections |
| backend | One per client connection (runs queries) |
| background workers | autovacuum, checkpointer, WAL writer, stats collector |
| WAL | Write-Ahead Log for durability and replication |
Data files live under PGDATA (often /var/lib/postgresql/<version>/main on Ubuntu). Each database is a collection of schemas and tables stored in heap files.
Connection cost motivates PgBouncer pooling in production.
A strong answer is:
One backend process per connection—so pooling matters at scale—and WAL makes commits durable and feeds replication.
Database, schema, table — how does naming work?
| Level | Example |
|---|---|
| Cluster | One PostgreSQL server instance |
| Database | Isolated namespace of schemas (app_prod) |
| Schema | Logical grouping (public, billing) |
| Table | billing.invoices |
search_path controls unqualified name resolution—migrations should set explicit schemas to avoid public surprises.
CREATE SCHEMA billing;
CREATE TABLE billing.invoices (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
amount numeric(12,2) NOT NULL
);A strong answer is:
I use schemas to separate domains in one database and qualify names in migrations so search_path never hides the wrong table.
Important PostgreSQL data types for interviews?
| Type | Use |
|---|---|
bigint / int |
IDs, counters |
numeric(p,s) |
Money—avoid float |
text vs varchar |
text has no perf penalty in Postgres |
timestamptz |
Always prefer for instants (timezone-aware) |
uuid |
Distributed IDs (gen_random_uuid()) |
jsonb |
Semi-structured documents |
array |
Native arrays with operators |
enum |
Fixed label sets (migration caution) |
Using timestamp without time zone for global apps is a common interview trap.
A strong answer is:
timestamptz for events, numeric for money, bigint or uuid for keys—I avoid float for currency and document why jsonb is not a default for every column.
Constraints and referential integrity?
| Constraint | Purpose |
|---|---|
PRIMARY KEY |
Unique + not null row identity |
FOREIGN KEY |
Referential integrity |
UNIQUE |
Alternate keys |
NOT NULL |
Required columns |
CHECK |
Row-level rules |
EXCLUDE |
Prevent overlapping ranges (scheduling) |
CREATE TABLE order_items (
order_id bigint REFERENCES orders(id) ON DELETE CASCADE,
sku text NOT NULL,
qty int CHECK (qty > 0),
PRIMARY KEY (order_id, sku)
);ON DELETE actions: CASCADE, SET NULL, RESTRICT, NO ACTION—know ORM defaults.
A strong answer is:
I enforce business rules in CHECK and FK constraints close to data—migrations add indexes on FK columns to avoid slow cascades and joins.
SERIAL vs IDENTITY vs UUID?
| Approach | Notes |
|---|---|
SERIAL |
Legacy shorthand for sequence + default |
GENERATED … AS IDENTITY |
SQL standard; preferred in new schemas |
uuid |
No coordination across services; larger indexes |
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email citext UNIQUE NOT NULL
);Sequences can have gaps (rollback, crash)—interviewers check you do not assume gapless IDs.
A strong answer is:
IDENTITY for monotonic numeric PKs in single-database apps; UUID when services generate IDs independently—I never assume sequences are gap-free.
MVCC, transactions, and concurrency
What is MVCC in PostgreSQL?
Multi-Version Concurrency Control keeps multiple row versions so readers do not block writers and writers do not block readers.
Each row version has system columns:
| Column | Meaning |
|---|---|
xmin |
Inserting transaction ID |
xmax |
Deleting/updating transaction ID (if set) |
A transaction sees a snapshot of committed rows visible at snapshot start. UPDATE creates a new row version; old version becomes a dead tuple until vacuum reclaims space.
This differs from in-place overwrite databases and explains why heavy UPDATE causes bloat.
A strong answer is:
MVCC gives snapshot isolation behavior—reads see a consistent view while writes create new tuple versions; dead versions need vacuum, which is core Postgres DBA knowledge.
Transaction isolation levels in PostgreSQL?
Postgres implements:
| Level | Behavior |
|---|---|
| Read committed (default) | Each statement sees latest committed data |
| Repeatable read | Snapshot for whole transaction |
| Serializable | Serializable snapshot isolation (SSI) |
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- consistent reads within this transaction
COMMIT;Anomalies:
- Read committed allows non-repeatable reads between statements
- Repeatable read blocks phantom inserts in Postgres (stronger than standard RR)
- Serializable may raise serialization failures—apps must retry
A strong answer is:
Default read committed is fine for most OLTP; I use repeatable read or serializable when invariants need stronger guarantees and handle retry on 40001.
Locks and blocking — what should developers know?
Postgres uses row-level locks for UPDATE/DELETE and table-level locks for DDL.
Useful views:
SELECT * FROM pg_locks WHERE NOT granted;
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';| Pattern | Risk |
|---|---|
| Long transaction holding rows | Blocks vacuum, causes bloat |
SELECT … FOR UPDATE |
Explicit row locks for workflows |
| DDL on hot tables | ACCESS EXCLUSIVE blocks reads/writes |
ORM sessions left open across HTTP requests are a common production blocker.
A strong answer is:
I keep transactions short, index FK columns, and check pg_stat_activity for lock chains—long open transactions hurt vacuum and throughput.
How does PostgreSQL handle deadlocks?
Deadlock detector picks a victim transaction and aborts it with SQLSTATE 40P01. Application must retry.
Prevention:
- Lock rows in consistent order
- Keep transactions small
- Use advisory locks for application-level ordering
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- updates
COMMIT;Interviewers want you to mention retry with backoff in app code.
A strong answer is:
Postgres detects deadlocks and kills one transaction—I retry idempotent operations and design lock order to prevent cycles.
WAL and durability — what happens on COMMIT?
Write-Ahead Logging: changes go to WAL on disk before heap pages. COMMIT waits for WAL flush (depending on synchronous_commit).
| Setting | Trade-off |
|---|---|
synchronous_commit = on |
Durability default |
off / local |
Faster commits, risk on crash |
WAL enables PITR, streaming replication, and crash recovery.
A strong answer is:
COMMIT is durable when WAL is flushed—I do not disable synchronous_commit without understanding data-loss risk; WAL is also the replication stream.
Indexing strategies
PostgreSQL index types — when to use each?
| Index | Best for |
|---|---|
| B-tree (default) | =, <, >, BETWEEN, ORDER BY |
| Hash | Equality only (niche) |
| GIN | JSONB, arrays, full-text |
| GiST | Geometry, ranges, full-text |
| BRIN | Very large, naturally ordered data (time-series) |
| SP-GiST | Partitioned/non-balanced structures |
CREATE INDEX idx_orders_created ON orders USING brin (created_at);
CREATE INDEX idx_docs_body ON docs USING gin (body jsonb_path_ops);Wrong index type wastes space and slows writes—interviewers ask you to match operator class to query.
A strong answer is:
B-tree by default; GIN for jsonb @> and full-text; BRIN for append-only timestamps on huge tables—I justify type from the WHERE clause operators.
Composite, partial, and covering indexes?
Composite — multiple columns; leftmost prefix rule applies.
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);Partial — indexes subset of rows:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';Smaller index, faster writes, perfect for hot filtered queries.
Covering index (INCLUDE):
CREATE INDEX idx_orders_user_inc ON orders (user_id) INCLUDE (total_cents);Enables index-only scans when visibility map allows.
A strong answer is:
Partial indexes for skewed filters like status='pending'; composite column order matches equality filters first, then range/sort.
Indexing JSONB — jsonb_ops vs jsonb_path_ops?
JSONB stores binary JSON with efficient operators.
-- Containment query
SELECT * FROM products
WHERE attrs @> '{"color": "red"}';
CREATE INDEX idx_products_attrs ON products
USING gin (attrs jsonb_path_ops);| Operator class | Supports | Size |
|---|---|---|
jsonb_ops |
More operators | Larger |
jsonb_path_ops |
Mainly @> |
Smaller, faster for containment |
Tested containment on PostgreSQL 17:
SELECT '{"tags": ["a","b"]}'::jsonb @> '{"tags": ["a"]}'::jsonb;
-- returns truePrefer real columns for stable typed fields; JSONB for dynamic attributes.
A strong answer is:
jsonb_path_ops GIN when I only need @> containment; typed columns when the field is stable and filter-heavy.
Index trade-offs — when not to index?
Indexes speed reads and slow writes (every INSERT/UPDATE maintains index pages).
Skip or delay indexes when:
- Table is tiny (seq scan cheaper)
- Write-heavy with rare reads
- Low selectivity column alone (e.g. boolean flag)
- Wrong type for query pattern
Monitor unused indexes:
SELECT * FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public';A strong answer is:
I index for proven query patterns from pg_stat_statements, drop unused indexes, and accept seq scans on small tables.
CREATE INDEX CONCURRENTLY — why use it?
Normal CREATE INDEX takes SHARE lock blocking writes. CONCURRENTLY builds without blocking writes—slower, two-pass build.
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);Failures leave invalid index—check pg_index.indisvalid and REINDEX if needed.
Production migrations on large tables almost always use CONCURRENTLY.
A strong answer is:
CONCURRENTLY for online index adds on live tables—I monitor for invalid indexes and reindex if the build failed mid-flight.
Query optimization and EXPLAIN
EXPLAIN and EXPLAIN ANALYZE — how do you read plans?
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42 AND created_at > now() - interval '30 days';| Node | Concern |
|---|---|
| Seq Scan | Full table read—OK if small or most rows match |
| Index Scan / Bitmap | Index used—check rows removed by filter |
| Nested Loop | Good for small outer sets |
| Hash Join | Large equi-joins |
| Sort | May spill to disk if work_mem exceeded |
Compare estimated vs actual rows—large gaps mean stale statistics → ANALYZE.
A strong answer is:
EXPLAIN ANALYZE shows real timings; I look for seq scans on big tables, bad row estimates, and buffer hits versus reads.
Why might Postgres choose a sequential scan?
Reasons:
| Cause | Detail |
|---|---|
| Small table | Cheaper to read whole heap |
| Low selectivity | Reading most rows—index + heap fetch costs more |
| Missing stats | Run ANALYZE |
| Function on column | WHERE lower(email) = blocks plain index |
| Wide result set | Index scan + heap visits lose |
Fix patterns: expression index, partial index, increase stats target, rewrite predicate.
A strong answer is:
Seq scan is not always wrong—I check row counts and selectivity; fix stats or add a targeted index when the scan is hot and large.
pg_stat_statements and slow query triage?
Extension pg_stat_statements aggregates query stats (calls, total time, mean time, rows).
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;Triage flow:
- Find top total time queries
EXPLAIN (ANALYZE)the normalized query- Add index or rewrite SQL
- Verify with load test
Pair with log_min_duration_statement for raw slow logs.
A strong answer is:
I rank by total_exec_time in pg_stat_statements, EXPLAIN the winner, fix plan or index, and re-measure—not guess from ORM logs alone.
Join strategies and statistics?
Planner picks join order using table statistics (histograms,ndistinct).
Tips:
ANALYZEafter large data changes- Join on indexed columns
- Avoid cross joins accidentally in ORMs
JOINsmaller filtered set first when rewriting manually
-- Filter early in CTE
WITH recent AS (
SELECT id FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT u.email, COUNT(*)
FROM recent r
JOIN orders o ON o.id = r.id
JOIN users u ON u.id = o.user_id
GROUP BY u.email;A strong answer is:
Accurate stats drive join choices—I analyze after bulk loads and ensure join keys are indexed.
Pagination — OFFSET vs keyset?
OFFSET pagination degrades on large offsets (scans skipped rows):
-- Slow at high page numbers
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;Keyset (seek) pagination:
SELECT * FROM orders
WHERE id > 100000
ORDER BY id
LIMIT 20;Requires stable sort key; handle tie-breaker column.
A strong answer is:
Keyset pagination on indexed columns for infinite scroll; OFFSET only for small admin pages.
JSONB, advanced SQL, and extensions
JSON vs JSONB in PostgreSQL?
| JSON | JSONB | |
|---|---|---|
| Storage | Text, preserves whitespace | Binary decomposed |
| Queries | Re-parsed each time | Faster operators |
| Indexing | Limited | GIN indexes |
| Writes | Faster insert | Slight overhead |
Use JSONB for querying; JSON only if you need exact text preservation.
SELECT attrs->>'sku' AS sku,
attrs->'dimensions'->>'width' AS width
FROM products;A strong answer is:
JSONB for application payloads I filter on; extract hot keys to typed columns when queries stabilize.
CTEs and window functions — Postgres strengths?
PostgreSQL optimizes CTEs (inline or materialize depending on version/planner)—use for readable pipelines.
WITH monthly AS (
SELECT date_trunc('month', created_at) AS m, SUM(amount) AS revenue
FROM orders
GROUP BY 1
)
SELECT m, revenue,
revenue - LAG(revenue) OVER (ORDER BY m) AS mom_delta
FROM monthly;Window functions: ROW_NUMBER, RANK, LEAD/LAG, SUM() OVER (PARTITION BY …)—see SQL technical guide for more patterns.
A strong answer is:
CTEs clarify multi-step analytics; windows for ranking and period-over-period without self-join explosion.
LATERAL joins — practical use?
LATERAL lets subquery reference columns from preceding FROM item—great for top-N per group:
SELECT c.name, r.*
FROM customers c
CROSS JOIN LATERAL (
SELECT *
FROM orders o
WHERE o.customer_id = c.id
ORDER BY o.created_at DESC
LIMIT 3
) r;Alternative to window + filter patterns; planner can use indexes per customer.
A strong answer is:
LATERAL for correlated top-N per group—I reach for it when a subquery needs outer row values.
Common PostgreSQL extensions in interviews?
| Extension | Use |
|---|---|
| pg_stat_statements | Query performance stats |
| citext | Case-insensitive text |
| pg_trgm | Fuzzy text search (LIKE acceleration) |
| PostGIS | Geospatial |
| pgvector | Vector embeddings / similarity search |
| uuid-ossp / pgcrypto | UUID and crypto helpers |
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);A strong answer is:
I enable extensions deliberately—pg_stat_statements in every prod cluster, pg_trgm or pgvector when product needs demand it.
UPSERT — ON CONFLICT?
INSERT INTO inventory (sku, qty)
VALUES ('ABC', 10)
ON CONFLICT (sku) DO UPDATE
SET qty = inventory.qty + EXCLUDED.qty;Requires UNIQUE or PRIMARY KEY on conflict target.
EXCLUDED refers to proposed insert row. Watch lock contention on hot keys.
A strong answer is:
ON CONFLICT for idempotent writes—I define unique constraints explicitly and handle conflict updates atomically.
DBA — vacuum, bloat, and maintenance
VACUUM and autovacuum — why are they critical?
UPDATE/DELETE leave dead tuples. VACUUM reclaims space for reuse, updates visibility map, prevents transaction ID wraparound.
| Command | Effect |
|---|---|
VACUUM |
Reclaim dead tuple space (non-blocking) |
VACUUM ANALYZE |
Vacuum + refresh planner stats |
VACUUM FULL |
Rewrites table—locks exclusively (rare) |
Autovacuum runs automatically—tune, do not disable.
Long transactions block vacuum cleanup → bloat.
A strong answer is:
Vacuum reclaims MVCC dead tuples and protects against xid wraparound—I tune autovacuum on high-churn tables and kill long idle transactions.
Table bloat — causes and fixes?
Bloat = heap pages holding dead tuples or sparse space.
| Cause | Fix |
|---|---|
| Heavy updates | Tune autovacuum (scale_factor, threshold) |
| Long open transactions | End idle sessions |
| Bulk delete | VACUUM or pg_repack online |
Check bloat estimates with pg_stat_user_tables (n_dead_tup) and tools like pgstattuple.
pg_repack rewrites online without VACUUM FULL exclusive lock.
A strong answer is:
Bloat from dead tuples and blocked vacuum—I lower autovacuum thresholds on hot tables and use pg_repack before VACUUM FULL in production.
ANALYZE and statistics?
ANALYZE samples table data to update pg_statistic for the planner.
When to run:
- After large COPY/INSERT
- When EXPLAIN estimates diverge from actual
- Autovacuum analyze handles routine cases
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;Higher statistics target improves estimates on skewed columns (costs more analyze time).
A strong answer is:
Bad plans often mean stale stats—I ANALYZE after bulk changes and raise statistics target on skewed filter columns.
Table partitioning — range, list, hash?
Native declarative partitioning (Postgres 10+):
CREATE TABLE measurements (
logdate date NOT NULL,
value numeric
) PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2026_06
PARTITION OF measurements
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');| Strategy | Use |
|---|---|
| RANGE | Dates, monotonic IDs |
| LIST | Region, tenant bucket |
| HASH | Even spread when no natural key |
Partition pruning skips irrelevant child tables—critical for performance.
A strong answer is:
Range partition time-series by month, attach new partitions ahead of time, and verify pruning in EXPLAIN.
Connection pooling — why PgBouncer?
Postgres backends are memory-heavy (~MB each). Apps opening thousands of connections exhaust RAM.
PgBouncer pools connections:
| Mode | Behavior |
|---|---|
| Session | Client holds server for session |
| Transaction | Server returned after each transaction |
| Statement | Aggressive—breaks some session features |
ORMs + serverless need pooling (PgBouncer, RDS Proxy, Supavisor).
A strong answer is:
I pool at PgBouncer transaction mode for web apps—raw Postgres connections do not scale to thousands of lambdas or pods.
Replication, backup, and high availability
Streaming replication — how does it work?
Primary streams WAL records to standby; standby replays WAL—physical replication, byte-for-byte copy.
| Mode | Trade-off |
|---|---|
| Asynchronous | Lower latency, possible small loss on failover |
| Synchronous | Commit waits for replica ack—stronger durability |
Standbys are hot (read-only) with hot_standby = on.
Monitor replication lag via pg_stat_replication.
A strong answer is:
WAL streaming keeps replica identical to primary—I monitor lag bytes and choose sync vs async based on RPO requirements.
Logical vs physical replication?
| Physical | Logical |
|---|---|
| WAL byte replay | Decoded row changes |
| Same schema required | Selective tables, version upgrades |
| Whole cluster image | CDC, migrations, multi-tenant fan-out |
CREATE PUBLICATION orders_pub FOR TABLE orders;
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=primary dbname=app'
PUBLICATION orders_pub;Logical slots retain WAL until consumed—lagging subscriber causes WAL bloat on primary (CDC interview topic).
A strong answer is:
Physical for HA failover; logical for selective sync and upgrades—I watch replication slot lag to prevent WAL disk fill on primary.
Backups and point-in-time recovery (PITR)?
| Method | Role |
|---|---|
| pg_dump / pg_dumpall | Logical backup—portable, slower restore |
| Base backup + WAL archive | Physical; enables PITR |
PITR flow: restore base backup to time T0, replay WAL to target timestamp.
Tools: WAL-G, pgBackRest, cloud automated backups.
Test restores regularly—untested backups are folklore.
A strong answer is:
pg_dump for logical portability; base backup plus WAL archiving for PITR—I schedule restore drills, not just backups.
Failover and high availability patterns?
Options:
| Pattern | Notes |
|---|---|
| Manual promote | pg_ctl promote on standby |
| Patroni / repmgr | Automated leader election |
| Cloud managed | RDS Multi-AZ, Cloud SQL HA |
Application needs connection targeting to new primary and handling brief errors during failover.
Read replicas offload read scaling; writes still go to primary (unless sharded).
A strong answer is:
HA needs automated failover plus app retry—I use managed HA or Patroni and test promote without losing replication slots blindly.
Roles, grants, and Row Level Security?
Postgres roles can be users or groups:
CREATE ROLE app_read NOLOGIN;
GRANT CONNECT ON DATABASE shop TO app_read;
GRANT USAGE ON SCHEMA public TO app_read;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;Row Level Security (RLS) filters rows per session:
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::bigint);Least privilege: app role without SUPERUSER, CREATEDB, or broad DDL.
A strong answer is:
Separate roles per app with minimal grants; RLS for multi-tenant row isolation when defense in depth matters.
Scenarios and final prep
Scenario: Query suddenly slow after deploy — debug steps?
Checklist:
- pg_stat_statements — new query shape or regressed mean time?
- EXPLAIN (ANALYZE, BUFFERS) — plan change? seq scan?
- Statistics — run
ANALYZEafter migration bulk load - Indexes — migration drop index accidentally?
- Data volume — crossed threshold where plan flips
- Locks — blocking DDL or long transaction?
- Connection storm — pool saturation?
Narrate before diving into \d and catalogs.
A strong answer is:
I compare EXPLAIN before and after, check stats and indexes first, then locks—deploys that bulk-load without ANALYZE are a frequent culprit.
Scenario: Replica lag growing — what do you check?
| Check | Action |
|---|---|
pg_stat_replication |
Lag bytes, replay pause |
| Primary load | Spike in writes or WAL generation |
| Long queries on replica | Cancel heavy reporting queries |
| Missing hot_standby_feedback | Conflicts on replica |
| Network / disk IO | Infrastructure bottleneck |
| Logical slot consumer down | WAL retention on primary |
For CDC slots (Debezium), lag blocks vacuum on primary tables.
A strong answer is:
I read lag in bytes and replay paused state—kill runaway reporting on replica and fix stalled logical consumers before primary WAL disk fills.
Scenario: Zero-downtime schema migration?
Patterns:
| Technique | Example |
|---|---|
| Expand-contract | Add nullable column → backfill → enforce NOT NULL |
| CONCURRENTLY indexes | Avoid write locks |
| Dual write / read | App writes both schemas during transition |
| Triggers | Sync old/new tables temporarily |
Dangerous: ALTER TABLE … ADD COLUMN DEFAULT on huge table (rewrite history in older versions—know version behavior), blocking ACCESS EXCLUSIVE DDL.
Use Flyway/Liquibase with review for lock modes.
A strong answer is:
Expand-contract with concurrent index builds—I avoid blocking DDL on hot tables and backfill in batches with throttling.
PostgreSQL vs MongoDB — when which?
| Choose Postgres | Choose MongoDB |
|---|---|
| Strong relations, ACID transactions | Flexible schema, document model |
| Complex joins, reporting SQL | Rapid nested document iteration |
| JSONB when hybrid needed | Horizontal shard patterns native |
| Mature constraints, RLS | Workload already document-centric |
See MongoDB interview questions for document-store depth.
JSONB does not make Postgres a document DB—it lacks Mongo's sharding story out of the box.
A strong answer is:
Postgres for relational integrity and SQL analytics with optional JSONB; Mongo when document model and horizontal scale patterns are central—I do not store everything in jsonb by default.
What should you rehearse before PostgreSQL interviews?
Checklist:
- MVCC and why vacuum exists
- Isolation levels and deadlock retry
- Index types — B-tree, GIN, BRIN, partial
- EXPLAIN ANALYZE reading
- JSONB operators and GIN ops class
- Autovacuum and bloat story
- Streaming vs logical replication
- Backup / PITR basics
- Connection pooling rationale
- RLS and role grants
- One slow query debug narrative
- SQL technical interviews for live coding
- Spring Boot / Django for ORM layer
- Kafka for CDC slots
- Pandas for analytics downstream
A strong answer is:
I rehearse EXPLAIN on a real query weekly, explain MVCC to a rubber duck, and tie one production story to vacuum, indexes, and replication lag.
Pattern cheat sheet (quick reference)
| Task | PostgreSQL approach |
|---|---|
| Primary keys | GENERATED ALWAYS AS IDENTITY or uuid |
| Money | numeric, not float |
| Timestamps | timestamptz |
| Flexible attributes | jsonb + GIN (jsonb_path_ops) |
| Hot filtered queries | Partial index |
| Online index add | CREATE INDEX CONCURRENTLY |
| Slow query triage | pg_stat_statements → EXPLAIN ANALYZE |
| Dead tuple cleanup | autovacuum (tuned, not disabled) |
| HA reads | Streaming replica |
| Selective sync / CDC | Logical replication + slot monitoring |
| Multi-tenant rows | RLS policies |
| App connection storm | PgBouncer transaction pooling |
| Top-N per group | LATERAL or window functions |
References
On-site prep
- SQL technical interview questions
- MongoDB interview questions
- Pandas interview questions
- Data science interview questions
- Django interview questions for experienced developers
- Spring Boot interview questions for experienced developers
- Kafka interview questions
- Python developer interviews
- Full stack developer interviews
- Interview Questions category
Summary
PostgreSQL interviews separate candidates who only write SELECTs from those who understand MVCC, vacuum, index types, and EXPLAIN plans under real load. Use this guide as a self-test: run EXPLAIN on a slow query, describe autovacuum tuning, and practice explaining answers aloud. Pair Postgres depth with SQL technical interviews for live coding, Django or Spring Boot for ORM integration, and Kafka when logical replication feeds event pipelines.

