45+ PostgreSQL Interview Questions and Answers 2026

PostgreSQL interview questions, postgresql developer interview questions, and postgresql dba interview questions appear in backend, full-stack, data engineering, and platform roles where Postgres is the system of record. Interviewers go beyond generic SQL—they probe MVCC and vacuum, index type choice (B-tree vs GIN vs BRIN), EXPLAIN ANALYZE literacy, JSONB trade-offs, replication topology, and how you debug bloat or slow queries in production.

Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with DBMS interview questions for ER modeling, normalization, and ACID fundamentals, SQL technical interview questions for joins, window functions, and query-writing drills, MongoDB interview questions when interviewers compare document vs relational stores, pandas interview questions for analytics pipelines, Django interview questions for experienced developers and Spring Boot interview questions for ORM integration, Kafka interview questions for CDC and event sourcing, and data science interview questions for analytics workloads.

NOTE

Prep target: Run EXPLAIN (ANALYZE, BUFFERS) on a slow query, add the right index type for a JSONB filter, and explain what autovacuum does when updates are heavy—out loud, as if in a senior loop.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; PostgreSQL 17.7.

Interview context and how to prepare

What do PostgreSQL interviews actually test?

PostgreSQL interviews test whether you understand how the database behaves under concurrency and scale—not only SELECT syntax.

Layer	What interviewers probe
SQL fluency	Joins, CTEs, window functions, aggregates
Postgres internals	MVCC, vacuum, transaction IDs
Indexing	B-tree, GIN, partial/expression indexes
Performance	EXPLAIN plans, stats, work_mem
Types	JSONB, arrays, enums, ranges
Operations (DBA)	Replication, backups, PITR, partitioning
Security	Roles, RLS, least privilege

Role	Emphasis
Developer	ORM-safe queries, migrations, JSONB when appropriate
Backend	Connection pooling, transactions, locking behavior
Data engineer	Logical replication, CDC slots, bulk load
DBA	Vacuum tuning, HA, backup/restore, capacity

Beyond generic SQL, senior loops lean on MVCC and vacuum, index type selection, and reading EXPLAIN plans—the topics that separate application CRUD from database engineering.

PostgreSQL developer vs DBA interview focus?

Topic	Developer loop	DBA loop
Queries	ORM pitfalls, N+1, pagination	Slow query triage, pg_stat_statements
Schema	Migrations, constraints, indexes	Partitioning, tablespaces, extensions
Concurrency	Transaction boundaries, isolation	Lock monitoring, long transactions
Storage	JSONB vs columns	Bloat, autovacuum, fillfactor
HA	Connection strings, read replicas	Streaming replication, failover, PITR
Security	Parameterized queries	Roles, RLS, pg_hba.conf

Many postgresql developer interview questions still touch vacuum and indexes because app code causes bloat and bad plans.

What is a typical PostgreSQL interview loop?

Round	Duration	Focus
Screening	30 min	Stack, Postgres version, scale
SQL live coding	45–60 min	Joins, windows, CTEs—see SQL prep
Postgres depth	45 min	MVCC, indexes, EXPLAIN
System design	45–60 min	Read replicas, caching, sharding vs partitioning
DBA scenario (senior)	45 min	Replication lag, backup restore, vacuum crisis

System-design rounds often ask how Postgres fits multi-tenant SaaS or event-sourced architectures—read replicas, connection pooling, and logical replication slots.

What is a realistic 4–6 week PostgreSQL prep plan?

Week	Focus	Output
1	SQL joins, windows, CTEs	20 problems from SQL guide
2	MVCC, isolation, locks	Explain phantom vs repeatable read on Postgres
3	Indexes + EXPLAIN	Fix a seq scan with partial or composite index
4	JSONB, arrays, types	Model flexible attributes with GIN index
5	Replication, backup concepts	Draw primary/replica + PITR flow
6	Mock scenarios	Slow query + bloat + failover narrative

Run a local PostgreSQL 16+ instance and practice \d+, \di, EXPLAIN ANALYZE.

Why PostgreSQL over MySQL in interviews?

Interviewers want trade-off awareness, not fanboy answers.

Area	PostgreSQL strength
SQL standard	Richer SQL, window functions, CTEs (historically stronger)
Concurrency	MVCC model, fewer reader/writer surprises
Types	JSONB, arrays, ranges, custom types
Extensions	PostGIS, pgvector, citext
Replication	Physical + logical replication flexible

MySQL/InnoDB also uses MVCC—Postgres interviews go deeper on vacuum, index types, and planner behavior.

Architecture and PostgreSQL fundamentals

PostgreSQL architecture — process model basics?

PostgreSQL uses a multi-process model (not multi-threaded like some databases):

Process	Role
postmaster	Parent; listens for connections
backend	One per client connection (runs queries)
background workers	autovacuum, checkpointer, WAL writer, stats collector
WAL	Write-Ahead Log for durability and replication

Data files live under PGDATA (often /var/lib/postgresql/<version>/main on Ubuntu). Each database is a collection of schemas and tables stored in heap files.

Connection cost motivates PgBouncer pooling in production.

A strong answer is:

One backend process per connection—so pooling matters at scale—and WAL makes commits durable and feeds replication.

Database, schema, table — how does naming work?

Level	Example
Cluster	One PostgreSQL server instance
Database	Isolated namespace of schemas (`app_prod`)
Schema	Logical grouping (`public`, `billing`)
Table	`billing.invoices`

search_path controls unqualified name resolution—migrations should set explicit schemas to avoid public surprises.

sql


CREATE SCHEMA billing;
CREATE TABLE billing.invoices (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  amount numeric(12,2) NOT NULL
);

A strong answer is:

I use schemas to separate domains in one database and qualify names in migrations so search_path never hides the wrong table.

Important PostgreSQL data types for interviews?

Type	Use
`bigint` / `int`	IDs, counters
`numeric(p,s)`	Money—avoid float
`text` vs `varchar`	`text` has no perf penalty in Postgres
`timestamptz`	Always prefer for instants (timezone-aware)
`uuid`	Distributed IDs (`gen_random_uuid()`)
`jsonb`	Semi-structured documents
`array`	Native arrays with operators
`enum`	Fixed label sets (migration caution)

Using timestamp without time zone for global apps is a common interview trap.

A strong answer is:

timestamptz for events, numeric for money, bigint or uuid for keys—I avoid float for currency and document why jsonb is not a default for every column.

Constraints and referential integrity?

Constraint	Purpose
`PRIMARY KEY`	Unique + not null row identity
`FOREIGN KEY`	Referential integrity
`UNIQUE`	Alternate keys
`NOT NULL`	Required columns
`CHECK`	Row-level rules
`EXCLUDE`	Prevent overlapping ranges (scheduling)

sql


CREATE TABLE order_items (
  order_id bigint REFERENCES orders(id) ON DELETE CASCADE,
  sku text NOT NULL,
  qty int CHECK (qty > 0),
  PRIMARY KEY (order_id, sku)
);

ON DELETE actions: CASCADE, SET NULL, RESTRICT, NO ACTION—know ORM defaults.

A strong answer is:

I enforce business rules in CHECK and FK constraints close to data—migrations add indexes on FK columns to avoid slow cascades and joins.

SERIAL vs IDENTITY vs UUID?

Approach	Notes
`SERIAL`	Legacy shorthand for sequence + default
`GENERATED … AS IDENTITY`	SQL standard; preferred in new schemas
`uuid`	No coordination across services; larger indexes

sql


CREATE TABLE users (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email citext UNIQUE NOT NULL
);

Sequences can have gaps (rollback, crash)—interviewers check you do not assume gapless IDs.

A strong answer is:

IDENTITY for monotonic numeric PKs in single-database apps; UUID when services generate IDs independently—I never assume sequences are gap-free.

MVCC, transactions, and concurrency

What is MVCC in PostgreSQL?

Multi-Version Concurrency Control keeps multiple row versions so readers do not block writers and writers do not block readers.

Each row version has system columns:

Column	Meaning
`xmin`	Inserting transaction ID
`xmax`	Deleting/updating transaction ID (if set)

A transaction sees a snapshot of committed rows visible at snapshot start. UPDATE creates a new row version; old version becomes a dead tuple until vacuum reclaims space.

This differs from in-place overwrite databases and explains why heavy UPDATE causes bloat.

A strong answer is:

MVCC gives snapshot isolation behavior—reads see a consistent view while writes create new tuple versions; dead versions need vacuum, which is core Postgres DBA knowledge.

Transaction isolation levels in PostgreSQL?

Postgres implements:

Level	Behavior
Read committed (default)	Each statement sees latest committed data
Repeatable read	Snapshot for whole transaction
Serializable	Serializable snapshot isolation (SSI)

sql


BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- consistent reads within this transaction
COMMIT;

Anomalies:

Read committed allows non-repeatable reads between statements
Repeatable read blocks phantom inserts in Postgres (stronger than standard RR)
Serializable may raise serialization failures—apps must retry

A strong answer is:

Default read committed is fine for most OLTP; I use repeatable read or serializable when invariants need stronger guarantees and handle retry on 40001.

Locks and blocking — what should developers know?

Postgres uses row-level locks for UPDATE/DELETE and table-level locks for DDL.

Useful views:

sql


SELECT * FROM pg_locks WHERE NOT granted;
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';

Pattern	Risk
Long transaction holding rows	Blocks vacuum, causes bloat
`SELECT … FOR UPDATE`	Explicit row locks for workflows
DDL on hot tables	`ACCESS EXCLUSIVE` blocks reads/writes

ORM sessions left open across HTTP requests are a common production blocker.

A strong answer is:

I keep transactions short, index FK columns, and check pg_stat_activity for lock chains—long open transactions hurt vacuum and throughput.

How does PostgreSQL handle deadlocks?

Deadlock detector picks a victim transaction and aborts it with SQLSTATE 40P01. Application must retry.

Prevention:

Lock rows in consistent order
Keep transactions small
Use advisory locks for application-level ordering

sql


BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- updates
COMMIT;

Interviewers want you to mention retry with backoff in app code.

A strong answer is:

Postgres detects deadlocks and kills one transaction—I retry idempotent operations and design lock order to prevent cycles.

WAL and durability — what happens on COMMIT?

Write-Ahead Logging: changes go to WAL on disk before heap pages. COMMIT waits for WAL flush (depending on synchronous_commit).

Setting	Trade-off
`synchronous_commit = on`	Durability default
`off` / `local`	Faster commits, risk on crash

WAL enables PITR, streaming replication, and crash recovery.

A strong answer is:

COMMIT is durable when WAL is flushed—I do not disable synchronous_commit without understanding data-loss risk; WAL is also the replication stream.

Indexing strategies

PostgreSQL index types — when to use each?

Index	Best for
B-tree (default)	`=`, `<`, `>`, `BETWEEN`, `ORDER BY`
Hash	Equality only (niche)
GIN	JSONB, arrays, full-text
GiST	Geometry, ranges, full-text
BRIN	Very large, naturally ordered data (time-series)
SP-GiST	Partitioned/non-balanced structures

sql


CREATE INDEX idx_orders_created ON orders USING brin (created_at);
CREATE INDEX idx_docs_body ON docs USING gin (body jsonb_path_ops);

Wrong index type wastes space and slows writes—interviewers ask you to match operator class to query.

A strong answer is:

B-tree by default; GIN for jsonb @> and full-text; BRIN for append-only timestamps on huge tables—I justify type from the WHERE clause operators.

Composite, partial, and covering indexes?

Composite — multiple columns; leftmost prefix rule applies.

sql

CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);

Partial — indexes subset of rows:

sql


CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

Smaller index, faster writes, perfect for hot filtered queries.

Covering index (INCLUDE):

sql

CREATE INDEX idx_orders_user_inc ON orders (user_id) INCLUDE (total_cents);

Enables index-only scans when visibility map allows.

A strong answer is:

Partial indexes for skewed filters like status='pending'; composite column order matches equality filters first, then range/sort.

Indexing JSONB — jsonb_ops vs jsonb_path_ops?

JSONB stores binary JSON with efficient operators.

sql


-- Containment query
SELECT * FROM products
WHERE attrs @> '{"color": "red"}';

CREATE INDEX idx_products_attrs ON products
USING gin (attrs jsonb_path_ops);

Operator class	Supports	Size
`jsonb_ops`	More operators	Larger
`jsonb_path_ops`	Mainly `@>`	Smaller, faster for containment

Tested containment on PostgreSQL 17:

sql


SELECT '{"tags": ["a","b"]}'::jsonb @> '{"tags": ["a"]}'::jsonb;
-- returns true

Prefer real columns for stable typed fields; JSONB for dynamic attributes.

A strong answer is:

jsonb_path_ops GIN when I only need @> containment; typed columns when the field is stable and filter-heavy.

Index trade-offs — when not to index?

Indexes speed reads and slow writes (every INSERT/UPDATE maintains index pages).

Skip or delay indexes when:

Table is tiny (seq scan cheaper)
Write-heavy with rare reads
Low selectivity column alone (e.g. boolean flag)
Wrong type for query pattern

Monitor unused indexes:

sql


SELECT * FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public';

A strong answer is:

I index for proven query patterns from pg_stat_statements, drop unused indexes, and accept seq scans on small tables.

CREATE INDEX CONCURRENTLY — why use it?

Normal CREATE INDEX takes SHARE lock blocking writes. CONCURRENTLY builds without blocking writes—slower, two-pass build.

sql

CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

Failures leave invalid index—check pg_index.indisvalid and REINDEX if needed.

Production migrations on large tables almost always use CONCURRENTLY.

A strong answer is:

CONCURRENTLY for online index adds on live tables—I monitor for invalid indexes and reindex if the build failed mid-flight.

Query optimization and EXPLAIN

EXPLAIN and EXPLAIN ANALYZE — how do you read plans?

sql


EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42 AND created_at > now() - interval '30 days';

Node	Concern
Seq Scan	Full table read—OK if small or most rows match
Index Scan / Bitmap	Index used—check rows removed by filter
Nested Loop	Good for small outer sets
Hash Join	Large equi-joins
Sort	May spill to disk if work_mem exceeded

Compare estimated vs actual rows—large gaps mean stale statistics → ANALYZE.

A strong answer is:

EXPLAIN ANALYZE shows real timings; I look for seq scans on big tables, bad row estimates, and buffer hits versus reads.

Why might Postgres choose a sequential scan?

Reasons:

Cause	Detail
Small table	Cheaper to read whole heap
Low selectivity	Reading most rows—index + heap fetch costs more
Missing stats	Run `ANALYZE`
Function on column	`WHERE lower(email) =` blocks plain index
Wide result set	Index scan + heap visits lose

Fix patterns: expression index, partial index, increase stats target, rewrite predicate.

A strong answer is:

Seq scan is not always wrong—I check row counts and selectivity; fix stats or add a targeted index when the scan is hot and large.

pg_stat_statements and slow query triage?

Extension pg_stat_statements aggregates query stats (calls, total time, mean time, rows).

sql


SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Triage flow:

Find top total time queries
EXPLAIN (ANALYZE) the normalized query
Add index or rewrite SQL
Verify with load test

Pair with log_min_duration_statement for raw slow logs.

A strong answer is:

I rank by total_exec_time in pg_stat_statements, EXPLAIN the winner, fix plan or index, and re-measure—not guess from ORM logs alone.

Join strategies and statistics?

Planner picks join order using table statistics (histograms,ndistinct).

Tips:

ANALYZE after large data changes
Join on indexed columns
Avoid cross joins accidentally in ORMs
JOIN smaller filtered set first when rewriting manually

sql


-- Filter early in CTE
WITH recent AS (
  SELECT id FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT u.email, COUNT(*)
FROM recent r
JOIN orders o ON o.id = r.id
JOIN users u ON u.id = o.user_id
GROUP BY u.email;

A strong answer is:

Accurate stats drive join choices—I analyze after bulk loads and ensure join keys are indexed.

Pagination — OFFSET vs keyset?

OFFSET pagination degrades on large offsets (scans skipped rows):

sql


-- Slow at high page numbers
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

Keyset (seek) pagination:

sql


SELECT * FROM orders
WHERE id > 100000
ORDER BY id
LIMIT 20;

Requires stable sort key; handle tie-breaker column.

A strong answer is:

Keyset pagination on indexed columns for infinite scroll; OFFSET only for small admin pages.

JSONB, advanced SQL, and extensions

JSON vs JSONB in PostgreSQL?

	JSON	JSONB
Storage	Text, preserves whitespace	Binary decomposed
Queries	Re-parsed each time	Faster operators
Indexing	Limited	GIN indexes
Writes	Faster insert	Slight overhead

Use JSONB for querying; JSON only if you need exact text preservation.

sql


SELECT attrs->>'sku' AS sku,
       attrs->'dimensions'->>'width' AS width
FROM products;

A strong answer is:

JSONB for application payloads I filter on; extract hot keys to typed columns when queries stabilize.

CTEs and window functions — Postgres strengths?

PostgreSQL optimizes CTEs (inline or materialize depending on version/planner)—use for readable pipelines.

sql


WITH monthly AS (
  SELECT date_trunc('month', created_at) AS m, SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT m, revenue,
       revenue - LAG(revenue) OVER (ORDER BY m) AS mom_delta
FROM monthly;

Window functions: ROW_NUMBER, RANK, LEAD/LAG, SUM() OVER (PARTITION BY …)—see SQL technical guide for more patterns.

A strong answer is:

CTEs clarify multi-step analytics; windows for ranking and period-over-period without self-join explosion.

LATERAL joins — practical use?

LATERAL lets subquery reference columns from preceding FROM item—great for top-N per group:

sql


SELECT c.name, r.*
FROM customers c
CROSS JOIN LATERAL (
  SELECT *
  FROM orders o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 3
) r;

Alternative to window + filter patterns; planner can use indexes per customer.

A strong answer is:

LATERAL for correlated top-N per group—I reach for it when a subquery needs outer row values.

Common PostgreSQL extensions in interviews?

Extension	Use
pg_stat_statements	Query performance stats
citext	Case-insensitive text
pg_trgm	Fuzzy text search (`LIKE` acceleration)
PostGIS	Geospatial
pgvector	Vector embeddings / similarity search
uuid-ossp / pgcrypto	UUID and crypto helpers

sql


CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

A strong answer is:

I enable extensions deliberately—pg_stat_statements in every prod cluster, pg_trgm or pgvector when product needs demand it.

UPSERT — ON CONFLICT?

sql


INSERT INTO inventory (sku, qty)
VALUES ('ABC', 10)
ON CONFLICT (sku) DO UPDATE
SET qty = inventory.qty + EXCLUDED.qty;

Requires UNIQUE or PRIMARY KEY on conflict target.

EXCLUDED refers to proposed insert row. Watch lock contention on hot keys.

A strong answer is:

ON CONFLICT for idempotent writes—I define unique constraints explicitly and handle conflict updates atomically.

DBA — vacuum, bloat, and maintenance

VACUUM and autovacuum — why are they critical?

UPDATE/DELETE leave dead tuples. VACUUM reclaims space for reuse, updates visibility map, prevents transaction ID wraparound.

Command	Effect
`VACUUM`	Reclaim dead tuple space (non-blocking)
`VACUUM ANALYZE`	Vacuum + refresh planner stats
`VACUUM FULL`	Rewrites table—locks exclusively (rare)

Autovacuum runs automatically—tune, do not disable.

Long transactions block vacuum cleanup → bloat.

A strong answer is:

Vacuum reclaims MVCC dead tuples and protects against xid wraparound—I tune autovacuum on high-churn tables and kill long idle transactions.

Table bloat — causes and fixes?

Bloat = heap pages holding dead tuples or sparse space.

Cause	Fix
Heavy updates	Tune autovacuum (`scale_factor`, `threshold`)
Long open transactions	End idle sessions
Bulk delete	`VACUUM` or `pg_repack` online

Check bloat estimates with pg_stat_user_tables (n_dead_tup) and tools like pgstattuple.

pg_repack rewrites online without VACUUM FULL exclusive lock.

A strong answer is:

Bloat from dead tuples and blocked vacuum—I lower autovacuum thresholds on hot tables and use pg_repack before VACUUM FULL in production.

ANALYZE and statistics?

ANALYZE samples table data to update pg_statistic for the planner.

When to run:

After large COPY/INSERT
When EXPLAIN estimates diverge from actual
Autovacuum analyze handles routine cases

sql


ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Higher statistics target improves estimates on skewed columns (costs more analyze time).

A strong answer is:

Bad plans often mean stale stats—I ANALYZE after bulk changes and raise statistics target on skewed filter columns.

Table partitioning — range, list, hash?

Native declarative partitioning (Postgres 10+):

sql


CREATE TABLE measurements (
  logdate date NOT NULL,
  value numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2026_06
  PARTITION OF measurements
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

Strategy	Use
RANGE	Dates, monotonic IDs
LIST	Region, tenant bucket
HASH	Even spread when no natural key

Partition pruning skips irrelevant child tables—critical for performance.

A strong answer is:

Range partition time-series by month, attach new partitions ahead of time, and verify pruning in EXPLAIN.

Connection pooling — why PgBouncer?

Postgres backends are memory-heavy (~MB each). Apps opening thousands of connections exhaust RAM.

PgBouncer pools connections:

Mode	Behavior
Session	Client holds server for session
Transaction	Server returned after each transaction
Statement	Aggressive—breaks some session features

ORMs + serverless need pooling (PgBouncer, RDS Proxy, Supavisor).

A strong answer is:

I pool at PgBouncer transaction mode for web apps—raw Postgres connections do not scale to thousands of lambdas or pods.

Replication, backup, and high availability

Streaming replication — how does it work?

Primary streams WAL records to standby; standby replays WAL—physical replication, byte-for-byte copy.

Mode	Trade-off
Asynchronous	Lower latency, possible small loss on failover
Synchronous	Commit waits for replica ack—stronger durability

Standbys are hot (read-only) with hot_standby = on.

Monitor replication lag via pg_stat_replication.

A strong answer is:

WAL streaming keeps replica identical to primary—I monitor lag bytes and choose sync vs async based on RPO requirements.

Logical vs physical replication?

Physical	Logical
WAL byte replay	Decoded row changes
Same schema required	Selective tables, version upgrades
Whole cluster image	CDC, migrations, multi-tenant fan-out

sql


CREATE PUBLICATION orders_pub FOR TABLE orders;
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary dbname=app'
  PUBLICATION orders_pub;

Logical slots retain WAL until consumed—lagging subscriber causes WAL bloat on primary (CDC interview topic).

A strong answer is:

Physical for HA failover; logical for selective sync and upgrades—I watch replication slot lag to prevent WAL disk fill on primary.

Backups and point-in-time recovery (PITR)?

Method	Role
pg_dump / pg_dumpall	Logical backup—portable, slower restore
Base backup + WAL archive	Physical; enables PITR

PITR flow: restore base backup to time T0, replay WAL to target timestamp.

Tools: WAL-G, pgBackRest, cloud automated backups.

Test restores regularly—untested backups are folklore.

A strong answer is:

pg_dump for logical portability; base backup plus WAL archiving for PITR—I schedule restore drills, not just backups.

Failover and high availability patterns?

Options:

Pattern	Notes
Manual promote	`pg_ctl promote` on standby
Patroni / repmgr	Automated leader election
Cloud managed	RDS Multi-AZ, Cloud SQL HA

Application needs connection targeting to new primary and handling brief errors during failover.

Read replicas offload read scaling; writes still go to primary (unless sharded).

A strong answer is:

HA needs automated failover plus app retry—I use managed HA or Patroni and test promote without losing replication slots blindly.

Roles, grants, and Row Level Security?

Postgres roles can be users or groups:

sql


CREATE ROLE app_read NOLOGIN;
GRANT CONNECT ON DATABASE shop TO app_read;
GRANT USAGE ON SCHEMA public TO app_read;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;

Row Level Security (RLS) filters rows per session:

sql


ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.tenant_id')::bigint);

Least privilege: app role without SUPERUSER, CREATEDB, or broad DDL.

A strong answer is:

Separate roles per app with minimal grants; RLS for multi-tenant row isolation when defense in depth matters.

Scenarios and final prep

Scenario: Query suddenly slow after deploy — debug steps?

Checklist:

pg_stat_statements — new query shape or regressed mean time?
EXPLAIN (ANALYZE, BUFFERS) — plan change? seq scan?
Statistics — run ANALYZE after migration bulk load
Indexes — migration drop index accidentally?
Data volume — crossed threshold where plan flips
Locks — blocking DDL or long transaction?
Connection storm — pool saturation?

Narrate before diving into \d and catalogs.

A strong answer is:

I compare EXPLAIN before and after, check stats and indexes first, then locks—deploys that bulk-load without ANALYZE are a frequent culprit.

Scenario: Replica lag growing — what do you check?

Check	Action
`pg_stat_replication`	Lag bytes, replay pause
Primary load	Spike in writes or WAL generation
Long queries on replica	Cancel heavy reporting queries
Missing hot_standby_feedback	Conflicts on replica
Network / disk IO	Infrastructure bottleneck
Logical slot consumer down	WAL retention on primary

For CDC slots (Debezium), lag blocks vacuum on primary tables.

A strong answer is:

I read lag in bytes and replay paused state—kill runaway reporting on replica and fix stalled logical consumers before primary WAL disk fills.

Scenario: Zero-downtime schema migration?

Patterns:

Technique	Example
Expand-contract	Add nullable column → backfill → enforce NOT NULL
CONCURRENTLY indexes	Avoid write locks
Dual write / read	App writes both schemas during transition
Triggers	Sync old/new tables temporarily

Dangerous: ALTER TABLE … ADD COLUMN DEFAULT on huge table (rewrite history in older versions—know version behavior), blocking ACCESS EXCLUSIVE DDL.

Use Flyway/Liquibase with review for lock modes.

A strong answer is:

Expand-contract with concurrent index builds—I avoid blocking DDL on hot tables and backfill in batches with throttling.

PostgreSQL vs MongoDB — when which?

Choose Postgres	Choose MongoDB
Strong relations, ACID transactions	Flexible schema, document model
Complex joins, reporting SQL	Rapid nested document iteration
JSONB when hybrid needed	Horizontal shard patterns native
Mature constraints, RLS	Workload already document-centric

See MongoDB interview questions for document-store depth.

JSONB does not make Postgres a document DB—it lacks Mongo's sharding story out of the box.

A strong answer is:

Postgres for relational integrity and SQL analytics with optional JSONB; Mongo when document model and horizontal scale patterns are central—I do not store everything in jsonb by default.

What should you rehearse before PostgreSQL interviews?

Checklist:

MVCC and why vacuum exists
Isolation levels and deadlock retry
Index types — B-tree, GIN, BRIN, partial
EXPLAIN ANALYZE reading
JSONB operators and GIN ops class
Autovacuum and bloat story
Streaming vs logical replication
Backup / PITR basics
Connection pooling rationale
RLS and role grants
One slow query debug narrative
SQL technical interviews for live coding
Spring Boot / Django for ORM layer
Kafka for CDC slots
Pandas for analytics downstream

A strong answer is:

I rehearse EXPLAIN on a real query weekly, explain MVCC to a rubber duck, and tie one production story to vacuum, indexes, and replication lag.

Pattern cheat sheet (quick reference)

Task	PostgreSQL approach
Primary keys	`GENERATED ALWAYS AS IDENTITY` or `uuid`
Money	`numeric`, not `float`
Timestamps	`timestamptz`
Flexible attributes	`jsonb` + GIN (`jsonb_path_ops`)
Hot filtered queries	Partial index
Online index add	`CREATE INDEX CONCURRENTLY`
Slow query triage	`pg_stat_statements` → `EXPLAIN ANALYZE`
Dead tuple cleanup	autovacuum (tuned, not disabled)
HA reads	Streaming replica
Selective sync / CDC	Logical replication + slot monitoring
Multi-tenant rows	RLS policies
App connection storm	PgBouncer transaction pooling
Top-N per group	`LATERAL` or window functions

References

On-site prep

Summary

PostgreSQL interviews separate candidates who only write SELECTs from those who understand MVCC, vacuum, index types, and EXPLAIN plans under real load. Use this guide as a self-test: run EXPLAIN on a slow query, describe autovacuum tuning, and practice explaining answers aloud. Pair Postgres depth with SQL technical interviews for live coding, Django or Spring Boot for ORM integration, and Kafka when logical replication feeds event pipelines.

Interview context and how to prepare

Architecture and PostgreSQL fundamentals

MVCC, transactions, and concurrency

Indexing strategies

Query optimization and EXPLAIN

JSONB, advanced SQL, and extensions

DBA — vacuum, bloat, and maintenance

Replication, backup, and high availability

Scenarios and final prep

Pattern cheat sheet (quick reference)

References

Summary

Related Articles

DBMS Interview Questions and Answers

SQL Technical Interview Questions and Answers

SQL Tutorial for Beginners (Hands-On with Real Examples)

Search GoLinuxCloud