PostgreSQL Interview Questions and Answers

PostgreSQL interview questions, postgresql developer interview questions, and postgresql dba interview questions for 2026: MVCC, indexes, EXPLAIN, JSONB, vacuum, replication, and tuning with elaborate answers.

Published

Updated

Tech reviewed byDeepak Prasad

PostgreSQL Interview Questions and Answers

PostgreSQL interview questions, postgresql developer interview questions, and postgresql dba interview questions appear in backend, full-stack, data engineering, and platform roles where Postgres is the system of record. Interviewers go beyond generic SQL—they probe MVCC and vacuum, index type choice (B-tree vs GIN vs BRIN), EXPLAIN ANALYZE literacy, JSONB trade-offs, replication topology, and how you debug bloat or slow queries in production.

Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with DBMS interview questions for ER modeling, normalization, and ACID fundamentals, SQL technical interview questions for joins, window functions, and query-writing drills, MongoDB interview questions when interviewers compare document vs relational stores, pandas interview questions for analytics pipelines, Django interview questions for experienced developers and Spring Boot interview questions for ORM integration, Kafka interview questions for CDC and event sourcing, and data science interview questions for analytics workloads.

NOTE
Prep target: Run EXPLAIN (ANALYZE, BUFFERS) on a slow query, add the right index type for a JSONB filter, and explain what autovacuum does when updates are heavy—out loud, as if in a senior loop.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; PostgreSQL 17.7.


Interview context and how to prepare

What do PostgreSQL interviews actually test?

PostgreSQL interviews test whether you understand how the database behaves under concurrency and scale—not only SELECT syntax.

Layer What interviewers probe
SQL fluency Joins, CTEs, window functions, aggregates
Postgres internals MVCC, vacuum, transaction IDs
Indexing B-tree, GIN, partial/expression indexes
Performance EXPLAIN plans, stats, work_mem
Types JSONB, arrays, enums, ranges
Operations (DBA) Replication, backups, PITR, partitioning
Security Roles, RLS, least privilege
Role Emphasis
Developer ORM-safe queries, migrations, JSONB when appropriate
Backend Connection pooling, transactions, locking behavior
Data engineer Logical replication, CDC slots, bulk load
DBA Vacuum tuning, HA, backup/restore, capacity

Beyond generic SQL, senior loops lean on MVCC and vacuum, index type selection, and reading EXPLAIN plans—the topics that separate application CRUD from database engineering.

PostgreSQL developer vs DBA interview focus?
Topic Developer loop DBA loop
Queries ORM pitfalls, N+1, pagination Slow query triage, pg_stat_statements
Schema Migrations, constraints, indexes Partitioning, tablespaces, extensions
Concurrency Transaction boundaries, isolation Lock monitoring, long transactions
Storage JSONB vs columns Bloat, autovacuum, fillfactor
HA Connection strings, read replicas Streaming replication, failover, PITR
Security Parameterized queries Roles, RLS, pg_hba.conf

Many postgresql developer interview questions still touch vacuum and indexes because app code causes bloat and bad plans.

What is a typical PostgreSQL interview loop?
Round Duration Focus
Screening 30 min Stack, Postgres version, scale
SQL live coding 45–60 min Joins, windows, CTEs—see SQL prep
Postgres depth 45 min MVCC, indexes, EXPLAIN
System design 45–60 min Read replicas, caching, sharding vs partitioning
DBA scenario (senior) 45 min Replication lag, backup restore, vacuum crisis

System-design rounds often ask how Postgres fits multi-tenant SaaS or event-sourced architectures—read replicas, connection pooling, and logical replication slots.

What is a realistic 4–6 week PostgreSQL prep plan?
Week Focus Output
1 SQL joins, windows, CTEs 20 problems from SQL guide
2 MVCC, isolation, locks Explain phantom vs repeatable read on Postgres
3 Indexes + EXPLAIN Fix a seq scan with partial or composite index
4 JSONB, arrays, types Model flexible attributes with GIN index
5 Replication, backup concepts Draw primary/replica + PITR flow
6 Mock scenarios Slow query + bloat + failover narrative

Run a local PostgreSQL 16+ instance and practice \d+, \di, EXPLAIN ANALYZE.

Why PostgreSQL over MySQL in interviews?

Interviewers want trade-off awareness, not fanboy answers.

Area PostgreSQL strength
SQL standard Richer SQL, window functions, CTEs (historically stronger)
Concurrency MVCC model, fewer reader/writer surprises
Types JSONB, arrays, ranges, custom types
Extensions PostGIS, pgvector, citext
Replication Physical + logical replication flexible

MySQL/InnoDB also uses MVCC—Postgres interviews go deeper on vacuum, index types, and planner behavior.


Architecture and PostgreSQL fundamentals

PostgreSQL architecture — process model basics?

PostgreSQL uses a multi-process model (not multi-threaded like some databases):

Process Role
postmaster Parent; listens for connections
backend One per client connection (runs queries)
background workers autovacuum, checkpointer, WAL writer, stats collector
WAL Write-Ahead Log for durability and replication

Data files live under PGDATA (often /var/lib/postgresql/<version>/main on Ubuntu). Each database is a collection of schemas and tables stored in heap files.

Connection cost motivates PgBouncer pooling in production.

A strong answer is:

One backend process per connection—so pooling matters at scale—and WAL makes commits durable and feeds replication.

Database, schema, table — how does naming work?
Level Example
Cluster One PostgreSQL server instance
Database Isolated namespace of schemas (app_prod)
Schema Logical grouping (public, billing)
Table billing.invoices

search_path controls unqualified name resolution—migrations should set explicit schemas to avoid public surprises.

sql
CREATE SCHEMA billing;
CREATE TABLE billing.invoices (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  amount numeric(12,2) NOT NULL
);

A strong answer is:

I use schemas to separate domains in one database and qualify names in migrations so search_path never hides the wrong table.

Important PostgreSQL data types for interviews?
Type Use
bigint / int IDs, counters
numeric(p,s) Money—avoid float
text vs varchar text has no perf penalty in Postgres
timestamptz Always prefer for instants (timezone-aware)
uuid Distributed IDs (gen_random_uuid())
jsonb Semi-structured documents
array Native arrays with operators
enum Fixed label sets (migration caution)

Using timestamp without time zone for global apps is a common interview trap.

A strong answer is:

timestamptz for events, numeric for money, bigint or uuid for keys—I avoid float for currency and document why jsonb is not a default for every column.

Constraints and referential integrity?
Constraint Purpose
PRIMARY KEY Unique + not null row identity
FOREIGN KEY Referential integrity
UNIQUE Alternate keys
NOT NULL Required columns
CHECK Row-level rules
EXCLUDE Prevent overlapping ranges (scheduling)
sql
CREATE TABLE order_items (
  order_id bigint REFERENCES orders(id) ON DELETE CASCADE,
  sku text NOT NULL,
  qty int CHECK (qty > 0),
  PRIMARY KEY (order_id, sku)
);

ON DELETE actions: CASCADE, SET NULL, RESTRICT, NO ACTION—know ORM defaults.

A strong answer is:

I enforce business rules in CHECK and FK constraints close to data—migrations add indexes on FK columns to avoid slow cascades and joins.

SERIAL vs IDENTITY vs UUID?
Approach Notes
SERIAL Legacy shorthand for sequence + default
GENERATED … AS IDENTITY SQL standard; preferred in new schemas
uuid No coordination across services; larger indexes
sql
CREATE TABLE users (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email citext UNIQUE NOT NULL
);

Sequences can have gaps (rollback, crash)—interviewers check you do not assume gapless IDs.

A strong answer is:

IDENTITY for monotonic numeric PKs in single-database apps; UUID when services generate IDs independently—I never assume sequences are gap-free.


MVCC, transactions, and concurrency

What is MVCC in PostgreSQL?

Multi-Version Concurrency Control keeps multiple row versions so readers do not block writers and writers do not block readers.

Each row version has system columns:

Column Meaning
xmin Inserting transaction ID
xmax Deleting/updating transaction ID (if set)

A transaction sees a snapshot of committed rows visible at snapshot start. UPDATE creates a new row version; old version becomes a dead tuple until vacuum reclaims space.

This differs from in-place overwrite databases and explains why heavy UPDATE causes bloat.

A strong answer is:

MVCC gives snapshot isolation behavior—reads see a consistent view while writes create new tuple versions; dead versions need vacuum, which is core Postgres DBA knowledge.

Transaction isolation levels in PostgreSQL?

Postgres implements:

Level Behavior
Read committed (default) Each statement sees latest committed data
Repeatable read Snapshot for whole transaction
Serializable Serializable snapshot isolation (SSI)
sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- consistent reads within this transaction
COMMIT;

Anomalies:

  • Read committed allows non-repeatable reads between statements
  • Repeatable read blocks phantom inserts in Postgres (stronger than standard RR)
  • Serializable may raise serialization failures—apps must retry

A strong answer is:

Default read committed is fine for most OLTP; I use repeatable read or serializable when invariants need stronger guarantees and handle retry on 40001.

Locks and blocking — what should developers know?

Postgres uses row-level locks for UPDATE/DELETE and table-level locks for DDL.

Useful views:

sql
SELECT * FROM pg_locks WHERE NOT granted;
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';
Pattern Risk
Long transaction holding rows Blocks vacuum, causes bloat
SELECT … FOR UPDATE Explicit row locks for workflows
DDL on hot tables ACCESS EXCLUSIVE blocks reads/writes

ORM sessions left open across HTTP requests are a common production blocker.

A strong answer is:

I keep transactions short, index FK columns, and check pg_stat_activity for lock chains—long open transactions hurt vacuum and throughput.

How does PostgreSQL handle deadlocks?

Deadlock detector picks a victim transaction and aborts it with SQLSTATE 40P01. Application must retry.

Prevention:

  • Lock rows in consistent order
  • Keep transactions small
  • Use advisory locks for application-level ordering
sql
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- updates
COMMIT;

Interviewers want you to mention retry with backoff in app code.

A strong answer is:

Postgres detects deadlocks and kills one transaction—I retry idempotent operations and design lock order to prevent cycles.

WAL and durability — what happens on COMMIT?

Write-Ahead Logging: changes go to WAL on disk before heap pages. COMMIT waits for WAL flush (depending on synchronous_commit).

Setting Trade-off
synchronous_commit = on Durability default
off / local Faster commits, risk on crash

WAL enables PITR, streaming replication, and crash recovery.

A strong answer is:

COMMIT is durable when WAL is flushed—I do not disable synchronous_commit without understanding data-loss risk; WAL is also the replication stream.


Indexing strategies

PostgreSQL index types — when to use each?
Index Best for
B-tree (default) =, <, >, BETWEEN, ORDER BY
Hash Equality only (niche)
GIN JSONB, arrays, full-text
GiST Geometry, ranges, full-text
BRIN Very large, naturally ordered data (time-series)
SP-GiST Partitioned/non-balanced structures
sql
CREATE INDEX idx_orders_created ON orders USING brin (created_at);
CREATE INDEX idx_docs_body ON docs USING gin (body jsonb_path_ops);

Wrong index type wastes space and slows writes—interviewers ask you to match operator class to query.

A strong answer is:

B-tree by default; GIN for jsonb @> and full-text; BRIN for append-only timestamps on huge tables—I justify type from the WHERE clause operators.

Composite, partial, and covering indexes?

Composite — multiple columns; leftmost prefix rule applies.

sql
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);

Partial — indexes subset of rows:

sql
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

Smaller index, faster writes, perfect for hot filtered queries.

Covering index (INCLUDE):

sql
CREATE INDEX idx_orders_user_inc ON orders (user_id) INCLUDE (total_cents);

Enables index-only scans when visibility map allows.

A strong answer is:

Partial indexes for skewed filters like status='pending'; composite column order matches equality filters first, then range/sort.

Indexing JSONB — jsonb_ops vs jsonb_path_ops?

JSONB stores binary JSON with efficient operators.

sql
-- Containment query
SELECT * FROM products
WHERE attrs @> '{"color": "red"}';

CREATE INDEX idx_products_attrs ON products
USING gin (attrs jsonb_path_ops);
Operator class Supports Size
jsonb_ops More operators Larger
jsonb_path_ops Mainly @> Smaller, faster for containment

Tested containment on PostgreSQL 17:

sql
SELECT '{"tags": ["a","b"]}'::jsonb @> '{"tags": ["a"]}'::jsonb;
-- returns true

Prefer real columns for stable typed fields; JSONB for dynamic attributes.

A strong answer is:

jsonb_path_ops GIN when I only need @> containment; typed columns when the field is stable and filter-heavy.

Index trade-offs — when not to index?

Indexes speed reads and slow writes (every INSERT/UPDATE maintains index pages).

Skip or delay indexes when:

  • Table is tiny (seq scan cheaper)
  • Write-heavy with rare reads
  • Low selectivity column alone (e.g. boolean flag)
  • Wrong type for query pattern

Monitor unused indexes:

sql
SELECT * FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public';

A strong answer is:

I index for proven query patterns from pg_stat_statements, drop unused indexes, and accept seq scans on small tables.

CREATE INDEX CONCURRENTLY — why use it?

Normal CREATE INDEX takes SHARE lock blocking writes. CONCURRENTLY builds without blocking writes—slower, two-pass build.

sql
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

Failures leave invalid index—check pg_index.indisvalid and REINDEX if needed.

Production migrations on large tables almost always use CONCURRENTLY.

A strong answer is:

CONCURRENTLY for online index adds on live tables—I monitor for invalid indexes and reindex if the build failed mid-flight.


Query optimization and EXPLAIN

EXPLAIN and EXPLAIN ANALYZE — how do you read plans?
sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42 AND created_at > now() - interval '30 days';
Node Concern
Seq Scan Full table read—OK if small or most rows match
Index Scan / Bitmap Index used—check rows removed by filter
Nested Loop Good for small outer sets
Hash Join Large equi-joins
Sort May spill to disk if work_mem exceeded

Compare estimated vs actual rows—large gaps mean stale statistics → ANALYZE.

A strong answer is:

EXPLAIN ANALYZE shows real timings; I look for seq scans on big tables, bad row estimates, and buffer hits versus reads.

Why might Postgres choose a sequential scan?

Reasons:

Cause Detail
Small table Cheaper to read whole heap
Low selectivity Reading most rows—index + heap fetch costs more
Missing stats Run ANALYZE
Function on column WHERE lower(email) = blocks plain index
Wide result set Index scan + heap visits lose

Fix patterns: expression index, partial index, increase stats target, rewrite predicate.

A strong answer is:

Seq scan is not always wrong—I check row counts and selectivity; fix stats or add a targeted index when the scan is hot and large.

pg_stat_statements and slow query triage?

Extension pg_stat_statements aggregates query stats (calls, total time, mean time, rows).

sql
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Triage flow:

  1. Find top total time queries
  2. EXPLAIN (ANALYZE) the normalized query
  3. Add index or rewrite SQL
  4. Verify with load test

Pair with log_min_duration_statement for raw slow logs.

A strong answer is:

I rank by total_exec_time in pg_stat_statements, EXPLAIN the winner, fix plan or index, and re-measure—not guess from ORM logs alone.

Join strategies and statistics?

Planner picks join order using table statistics (histograms,ndistinct).

Tips:

  • ANALYZE after large data changes
  • Join on indexed columns
  • Avoid cross joins accidentally in ORMs
  • JOIN smaller filtered set first when rewriting manually
sql
-- Filter early in CTE
WITH recent AS (
  SELECT id FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT u.email, COUNT(*)
FROM recent r
JOIN orders o ON o.id = r.id
JOIN users u ON u.id = o.user_id
GROUP BY u.email;

A strong answer is:

Accurate stats drive join choices—I analyze after bulk loads and ensure join keys are indexed.

Pagination — OFFSET vs keyset?

OFFSET pagination degrades on large offsets (scans skipped rows):

sql
-- Slow at high page numbers
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

Keyset (seek) pagination:

sql
SELECT * FROM orders
WHERE id > 100000
ORDER BY id
LIMIT 20;

Requires stable sort key; handle tie-breaker column.

A strong answer is:

Keyset pagination on indexed columns for infinite scroll; OFFSET only for small admin pages.


JSONB, advanced SQL, and extensions

JSON vs JSONB in PostgreSQL?
JSON JSONB
Storage Text, preserves whitespace Binary decomposed
Queries Re-parsed each time Faster operators
Indexing Limited GIN indexes
Writes Faster insert Slight overhead

Use JSONB for querying; JSON only if you need exact text preservation.

sql
SELECT attrs->>'sku' AS sku,
       attrs->'dimensions'->>'width' AS width
FROM products;

A strong answer is:

JSONB for application payloads I filter on; extract hot keys to typed columns when queries stabilize.

CTEs and window functions — Postgres strengths?

PostgreSQL optimizes CTEs (inline or materialize depending on version/planner)—use for readable pipelines.

sql
WITH monthly AS (
  SELECT date_trunc('month', created_at) AS m, SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT m, revenue,
       revenue - LAG(revenue) OVER (ORDER BY m) AS mom_delta
FROM monthly;

Window functions: ROW_NUMBER, RANK, LEAD/LAG, SUM() OVER (PARTITION BY …)—see SQL technical guide for more patterns.

A strong answer is:

CTEs clarify multi-step analytics; windows for ranking and period-over-period without self-join explosion.

LATERAL joins — practical use?

LATERAL lets subquery reference columns from preceding FROM item—great for top-N per group:

sql
SELECT c.name, r.*
FROM customers c
CROSS JOIN LATERAL (
  SELECT *
  FROM orders o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 3
) r;

Alternative to window + filter patterns; planner can use indexes per customer.

A strong answer is:

LATERAL for correlated top-N per group—I reach for it when a subquery needs outer row values.

Common PostgreSQL extensions in interviews?
Extension Use
pg_stat_statements Query performance stats
citext Case-insensitive text
pg_trgm Fuzzy text search (LIKE acceleration)
PostGIS Geospatial
pgvector Vector embeddings / similarity search
uuid-ossp / pgcrypto UUID and crypto helpers
sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

A strong answer is:

I enable extensions deliberately—pg_stat_statements in every prod cluster, pg_trgm or pgvector when product needs demand it.

UPSERT — ON CONFLICT?
sql
INSERT INTO inventory (sku, qty)
VALUES ('ABC', 10)
ON CONFLICT (sku) DO UPDATE
SET qty = inventory.qty + EXCLUDED.qty;

Requires UNIQUE or PRIMARY KEY on conflict target.

EXCLUDED refers to proposed insert row. Watch lock contention on hot keys.

A strong answer is:

ON CONFLICT for idempotent writes—I define unique constraints explicitly and handle conflict updates atomically.


DBA — vacuum, bloat, and maintenance

VACUUM and autovacuum — why are they critical?

UPDATE/DELETE leave dead tuples. VACUUM reclaims space for reuse, updates visibility map, prevents transaction ID wraparound.

Command Effect
VACUUM Reclaim dead tuple space (non-blocking)
VACUUM ANALYZE Vacuum + refresh planner stats
VACUUM FULL Rewrites table—locks exclusively (rare)

Autovacuum runs automatically—tune, do not disable.

Long transactions block vacuum cleanup → bloat.

A strong answer is:

Vacuum reclaims MVCC dead tuples and protects against xid wraparound—I tune autovacuum on high-churn tables and kill long idle transactions.

Table bloat — causes and fixes?

Bloat = heap pages holding dead tuples or sparse space.

Cause Fix
Heavy updates Tune autovacuum (scale_factor, threshold)
Long open transactions End idle sessions
Bulk delete VACUUM or pg_repack online

Check bloat estimates with pg_stat_user_tables (n_dead_tup) and tools like pgstattuple.

pg_repack rewrites online without VACUUM FULL exclusive lock.

A strong answer is:

Bloat from dead tuples and blocked vacuum—I lower autovacuum thresholds on hot tables and use pg_repack before VACUUM FULL in production.

ANALYZE and statistics?

ANALYZE samples table data to update pg_statistic for the planner.

When to run:

  • After large COPY/INSERT
  • When EXPLAIN estimates diverge from actual
  • Autovacuum analyze handles routine cases
sql
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Higher statistics target improves estimates on skewed columns (costs more analyze time).

A strong answer is:

Bad plans often mean stale stats—I ANALYZE after bulk changes and raise statistics target on skewed filter columns.

Table partitioning — range, list, hash?

Native declarative partitioning (Postgres 10+):

sql
CREATE TABLE measurements (
  logdate date NOT NULL,
  value numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2026_06
  PARTITION OF measurements
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
Strategy Use
RANGE Dates, monotonic IDs
LIST Region, tenant bucket
HASH Even spread when no natural key

Partition pruning skips irrelevant child tables—critical for performance.

A strong answer is:

Range partition time-series by month, attach new partitions ahead of time, and verify pruning in EXPLAIN.

Connection pooling — why PgBouncer?

Postgres backends are memory-heavy (~MB each). Apps opening thousands of connections exhaust RAM.

PgBouncer pools connections:

Mode Behavior
Session Client holds server for session
Transaction Server returned after each transaction
Statement Aggressive—breaks some session features

ORMs + serverless need pooling (PgBouncer, RDS Proxy, Supavisor).

A strong answer is:

I pool at PgBouncer transaction mode for web apps—raw Postgres connections do not scale to thousands of lambdas or pods.


Replication, backup, and high availability

Streaming replication — how does it work?

Primary streams WAL records to standby; standby replays WAL—physical replication, byte-for-byte copy.

Mode Trade-off
Asynchronous Lower latency, possible small loss on failover
Synchronous Commit waits for replica ack—stronger durability

Standbys are hot (read-only) with hot_standby = on.

Monitor replication lag via pg_stat_replication.

A strong answer is:

WAL streaming keeps replica identical to primary—I monitor lag bytes and choose sync vs async based on RPO requirements.

Logical vs physical replication?
Physical Logical
WAL byte replay Decoded row changes
Same schema required Selective tables, version upgrades
Whole cluster image CDC, migrations, multi-tenant fan-out
sql
CREATE PUBLICATION orders_pub FOR TABLE orders;
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary dbname=app'
  PUBLICATION orders_pub;

Logical slots retain WAL until consumed—lagging subscriber causes WAL bloat on primary (CDC interview topic).

A strong answer is:

Physical for HA failover; logical for selective sync and upgrades—I watch replication slot lag to prevent WAL disk fill on primary.

Backups and point-in-time recovery (PITR)?
Method Role
pg_dump / pg_dumpall Logical backup—portable, slower restore
Base backup + WAL archive Physical; enables PITR

PITR flow: restore base backup to time T0, replay WAL to target timestamp.

Tools: WAL-G, pgBackRest, cloud automated backups.

Test restores regularly—untested backups are folklore.

A strong answer is:

pg_dump for logical portability; base backup plus WAL archiving for PITR—I schedule restore drills, not just backups.

Failover and high availability patterns?

Options:

Pattern Notes
Manual promote pg_ctl promote on standby
Patroni / repmgr Automated leader election
Cloud managed RDS Multi-AZ, Cloud SQL HA

Application needs connection targeting to new primary and handling brief errors during failover.

Read replicas offload read scaling; writes still go to primary (unless sharded).

A strong answer is:

HA needs automated failover plus app retry—I use managed HA or Patroni and test promote without losing replication slots blindly.

Roles, grants, and Row Level Security?

Postgres roles can be users or groups:

sql
CREATE ROLE app_read NOLOGIN;
GRANT CONNECT ON DATABASE shop TO app_read;
GRANT USAGE ON SCHEMA public TO app_read;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;

Row Level Security (RLS) filters rows per session:

sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.tenant_id')::bigint);

Least privilege: app role without SUPERUSER, CREATEDB, or broad DDL.

A strong answer is:

Separate roles per app with minimal grants; RLS for multi-tenant row isolation when defense in depth matters.


Scenarios and final prep

Scenario: Query suddenly slow after deploy — debug steps?

Checklist:

  1. pg_stat_statements — new query shape or regressed mean time?
  2. EXPLAIN (ANALYZE, BUFFERS) — plan change? seq scan?
  3. Statistics — run ANALYZE after migration bulk load
  4. Indexes — migration drop index accidentally?
  5. Data volume — crossed threshold where plan flips
  6. Locks — blocking DDL or long transaction?
  7. Connection storm — pool saturation?

Narrate before diving into \d and catalogs.

A strong answer is:

I compare EXPLAIN before and after, check stats and indexes first, then locks—deploys that bulk-load without ANALYZE are a frequent culprit.

Scenario: Replica lag growing — what do you check?
Check Action
pg_stat_replication Lag bytes, replay pause
Primary load Spike in writes or WAL generation
Long queries on replica Cancel heavy reporting queries
Missing hot_standby_feedback Conflicts on replica
Network / disk IO Infrastructure bottleneck
Logical slot consumer down WAL retention on primary

For CDC slots (Debezium), lag blocks vacuum on primary tables.

A strong answer is:

I read lag in bytes and replay paused state—kill runaway reporting on replica and fix stalled logical consumers before primary WAL disk fills.

Scenario: Zero-downtime schema migration?

Patterns:

Technique Example
Expand-contract Add nullable column → backfill → enforce NOT NULL
CONCURRENTLY indexes Avoid write locks
Dual write / read App writes both schemas during transition
Triggers Sync old/new tables temporarily

Dangerous: ALTER TABLE … ADD COLUMN DEFAULT on huge table (rewrite history in older versions—know version behavior), blocking ACCESS EXCLUSIVE DDL.

Use Flyway/Liquibase with review for lock modes.

A strong answer is:

Expand-contract with concurrent index builds—I avoid blocking DDL on hot tables and backfill in batches with throttling.

PostgreSQL vs MongoDB — when which?
Choose Postgres Choose MongoDB
Strong relations, ACID transactions Flexible schema, document model
Complex joins, reporting SQL Rapid nested document iteration
JSONB when hybrid needed Horizontal shard patterns native
Mature constraints, RLS Workload already document-centric

See MongoDB interview questions for document-store depth.

JSONB does not make Postgres a document DB—it lacks Mongo's sharding story out of the box.

A strong answer is:

Postgres for relational integrity and SQL analytics with optional JSONB; Mongo when document model and horizontal scale patterns are central—I do not store everything in jsonb by default.

What should you rehearse before PostgreSQL interviews?

Checklist:

  • MVCC and why vacuum exists
  • Isolation levels and deadlock retry
  • Index types — B-tree, GIN, BRIN, partial
  • EXPLAIN ANALYZE reading
  • JSONB operators and GIN ops class
  • Autovacuum and bloat story
  • Streaming vs logical replication
  • Backup / PITR basics
  • Connection pooling rationale
  • RLS and role grants
  • One slow query debug narrative
  • SQL technical interviews for live coding
  • Spring Boot / Django for ORM layer
  • Kafka for CDC slots
  • Pandas for analytics downstream

A strong answer is:

I rehearse EXPLAIN on a real query weekly, explain MVCC to a rubber duck, and tie one production story to vacuum, indexes, and replication lag.


Pattern cheat sheet (quick reference)

Task PostgreSQL approach
Primary keys GENERATED ALWAYS AS IDENTITY or uuid
Money numeric, not float
Timestamps timestamptz
Flexible attributes jsonb + GIN (jsonb_path_ops)
Hot filtered queries Partial index
Online index add CREATE INDEX CONCURRENTLY
Slow query triage pg_stat_statementsEXPLAIN ANALYZE
Dead tuple cleanup autovacuum (tuned, not disabled)
HA reads Streaming replica
Selective sync / CDC Logical replication + slot monitoring
Multi-tenant rows RLS policies
App connection storm PgBouncer transaction pooling
Top-N per group LATERAL or window functions

References

On-site prep


Summary

PostgreSQL interviews separate candidates who only write SELECTs from those who understand MVCC, vacuum, index types, and EXPLAIN plans under real load. Use this guide as a self-test: run EXPLAIN on a slow query, describe autovacuum tuning, and practice explaining answers aloud. Pair Postgres depth with SQL technical interviews for live coding, Django or Spring Boot for ORM integration, and Kafka when logical replication feeds event pipelines.

Deepak Prasad

R&D Engineer

Founder of GoLinuxCloud with more than 15 years of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive …