DBMS interview questions, basic dbms interview questions, and common dbms interview questions show up in campus placements, backend screening rounds, and data roles before vendors like PostgreSQL or Oracle dominate the conversation. Interviewers expect you to explain DBMS vs RDBMS, walk through ER modeling and keys, justify normal forms, describe ACID and isolation levels, compare index types, and reason about SQL vs NoSQL trade-offs—not only write SELECT statements.
Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with SQL technical interview questions for live query writing, PostgreSQL interview questions for engine-specific depth, MongoDB interview questions for document-store contrast, data science interview questions for analytics pipelines, Python developer interview questions for ORM context, and full stack developer interviews when system design ties app tiers to persistence.
Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; PostgreSQL 17.7 for transaction and constraint examples (
{run=false}SQL).
Interview context and how to prepare
What do DBMS interviews actually test?
DBMS interviews test foundational database theory that every application engineer should know—even if daily work hides it behind an ORM.
| Layer | What interviewers probe |
|---|---|
| Concepts | DBMS vs RDBMS, schema vs instance, three-schema architecture |
| Modeling | ER diagrams, keys, mapping to tables |
| Design | Normalization (1NF–BCNF), when to denormalize |
| Transactions | ACID, isolation levels, deadlocks |
| Physical layer | Indexes, query plans at a high level |
| Architecture | Replication, backup types, SQL vs NoSQL |
| Role | Emphasis |
|---|---|
| Campus / junior | ER model, normal forms, SQL basics |
| Backend | Transactions, indexing rationale, ORM pitfalls |
| Data / analytics | Star schema, denormalization, ETL idempotency |
| DBA track | Concurrency, recovery, backup/restore |
Campus and experienced loops both expect you to explain normalization, keys, and ACID with examples—not only memorize definitions.
DBMS vs RDBMS — what is the difference?
A DBMS (Database Management System) is software that lets users define, store, retrieve, and manage data with controlled access. The term is broad—it can include hierarchical, network, object-oriented, or document systems.
An RDBMS (Relational DBMS) is a DBMS that stores data in tables (relations) with rows and columns, enforces relational integrity via keys and constraints, and uses SQL (or SQL-like languages) for access.
| Aspect | DBMS (general) | RDBMS |
|---|---|---|
| Data model | Various (files, graphs, documents) | Tables / relations |
| Relationships | Model-dependent | Foreign keys, joins |
| Query language | Varies | SQL standard (mostly) |
| Examples | MongoDB, Redis, Neo4j | PostgreSQL, MySQL, Oracle |
Every RDBMS is a DBMS; not every DBMS is relational. In interviews, say RDBMS when discussing joins, normalization, and ACID on tabular schemas; say DBMS when comparing storage paradigms broadly.
DBMS theory for developers vs DBAs?
| Topic | Developer loop | DBA loop |
|---|---|---|
| Schema | ER → tables, migrations | Capacity, partitioning |
| Normalization | Avoid update anomalies in app schema | Rarely redesigns live OLTP casually |
| Transactions | Boundaries in code, retry on deadlock | Lock monitoring, long transactions |
| Indexes | "We need one for this WHERE clause" | Index bloat, maintenance windows |
| Backup | Assumes restore works | RPO/RTO, PITR, drill restores |
| HA | Connection strings, read replicas | Failover, replication lag |
Basic dbms interview questions skew toward theory (normal forms, keys). Senior backend loops add isolation anomalies and index trade-offs. DBA tracks go deeper on recovery and replication—see PostgreSQL interview questions for engine-specific depth.
What is a typical DBMS interview loop?
| Round | Duration | Focus |
|---|---|---|
| Aptitude / screening | 30 min | SQL basics, DBMS definitions |
| Theory | 45 min | ER, normalization, ACID |
| SQL coding | 45–60 min | Joins, aggregates—see SQL prep |
| System design (senior) | 45 min | SQL vs NoSQL, caching, sharding |
| Scenario | 30 min | "Design a library / e-commerce schema" |
Campus rounds often ask draw ER diagram for university registration; product companies add transaction isolation and index choice for backend roles.
What is a realistic 3–5 week DBMS prep plan?
| Week | Focus | Output |
|---|---|---|
| 1 | DBMS vs RDBMS, ER model, keys | Draw ER for library or hospital system |
| 2 | 1NF–BCNF with examples | Normalize a denormalized order table |
| 3 | ACID, isolation, deadlocks | Explain transfer + phantom read |
| 4 | Indexes, query processing basics | B-tree vs hash; when index hurts INSERT |
| 5 | SQL vs NoSQL, CAP, mock scenarios | Defend Postgres vs Mongo for one use case |
Practice 10 SQL problems weekly from SQL technical interview questions. Run transactions locally on PostgreSQL to see COMMIT / ROLLBACK behavior.
Database fundamentals
Data vs information?
Data are raw facts—numbers, strings, timestamps—without context. Information is data processed into a meaningful form for decisions.
| Example | Data | Information |
|---|---|---|
| Retail | 42, SKU-9, 2026-06-27 |
"42 units of SKU-9 sold today—reorder threshold breached" |
| Logs | HTTP 500 at 14:02 |
"Error rate spiked 3× after deploy at 14:00" |
A DBMS stores data; applications and reports turn it into information via constraints, relationships, and queries. Metadata (schema, statistics) describes the data itself.
A strong answer is:
Data are raw values; information is data with context and meaning—a DBMS stores data; queries and business rules produce information for decisions.
Why use a database instead of files?
Flat files work for simple scripts; production systems need structured concurrent access.
| Concern | Files | DBMS |
|---|---|---|
| Concurrency | Race conditions, manual locking | Transaction manager, locks |
| Integrity | App must enforce rules | Constraints, foreign keys |
| Query | Parse entire file | Indexed lookup, optimizer |
| Security | OS file permissions | Roles, grants, RLS (engine-dependent) |
| Recovery | Manual backups | WAL, point-in-time recovery |
| Schema evolution | Breaking changes fragile | Migrations, versioning |
Files remain fine for logs, CSV exports, and static assets; OLTP and shared mutable state belong in a DBMS.
A strong answer is:
Files lack concurrent safe updates, declarative integrity, and efficient indexed queries—a DBMS gives transactions, constraints, and recovery that files force every app to reinvent.
Schema vs instance vs database?
| Term | Meaning |
|---|---|
| Database | Organized collection of related data managed by DBMS |
| Schema | Logical structure—tables, columns, constraints, views (the design) |
| Instance | Actual data stored at a moment in time (the content) |
Analogy: schema is the blueprint of a building; instance is the furniture and occupants inside.
One DBMS server can host multiple databases (e.g. postgres, myapp_prod). Each database has its own schema namespace. Catalog / system tables store metadata about schemas.
A strong answer is:
Schema is the logical design—tables and constraints; instance is the live data at a point in time; I do not confuse CREATE TABLE (schema) with INSERT (instance).
Three-schema architecture?
ANSI/SPARC three-level architecture separates concerns so storage changes do not break applications.
| Level | Name | What it describes | Audience |
|---|---|---|---|
| External | View level | User-specific views | App developers, analysts |
| Conceptual | Logical level | Entities, relationships, constraints | Data modelers |
| Internal | Physical level | Files, indexes, allocation | DBAs, engine |
Mapping layers translate between levels. Logical data independence: change conceptual schema with minimal external view changes. Physical data independence: move indexes or storage without rewriting apps.
A strong answer is:
Three-schema architecture separates user views, logical ER model, and physical storage—so I can add an index or split a table without rewriting every application query.
Major DBMS components?
| Component | Role |
|---|---|
| DDL processor | Parses CREATE/ALTER/DROP; updates catalog |
| DML processor | Runs INSERT/UPDATE/DELETE/SELECT |
| Query processor | Parser → optimizer → execution plan |
| Storage manager | Pages, buffers, disk I/O |
| Transaction manager | ACID, commit/abort, isolation |
| Recovery manager | WAL, checkpoint, crash recovery |
| Concurrency control | Locks, timestamps, MVCC (engine-specific) |
Understanding the pipeline helps in interviews: "Why is this query slow?" → parse OK, bad plan or missing index or lock wait.
A strong answer is:
I name the query processor, storage manager, transaction manager, and recovery manager—then tie slow queries to optimizer plans and lock waits, not mysticism.
Common data models in DBMS?
| Model | Structure | Example systems |
|---|---|---|
| Relational | Tables, keys | PostgreSQL, MySQL |
| Document | JSON/BSON documents | MongoDB |
| Key-value | Opaque key → value | Redis, DynamoDB |
| Column-family | Wide columns per row key | Cassandra, HBase |
| Graph | Nodes and edges | Neo4j |
| Hierarchical / Network | Legacy models | IMS, IDMS (historical) |
Relational dominates OLTP with strong consistency needs. Document fits flexible schemas; graph fits relationship-heavy traversals. Choice depends on access patterns, not hype.
A strong answer is:
Relational for tabular integrity and joins; document for flexible nested records; graph for deep relationship traversals—I pick by query pattern, not buzzwords.
DDL, DML, DCL, and TCL?
SQL command categories:
| Category | Purpose | Examples |
|---|---|---|
| DDL (Data Definition) | Schema structure | CREATE, ALTER, DROP, TRUNCATE |
| DML (Data Manipulation) | Data changes | INSERT, UPDATE, DELETE, SELECT |
| DCL (Data Control) | Permissions | GRANT, REVOKE |
| TCL (Transaction Control) | Transaction boundaries | BEGIN, COMMIT, ROLLBACK, SAVEPOINT |
TRUNCATE is DDL-like—fast, often cannot roll back, resets identity. DELETE is DML—row-level, logged, transactional.
BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);
ROLLBACK; -- row not visible after rollbackA strong answer is:
DDL defines schema; DML changes rows; DCL grants access; TCL wraps units of work—I use COMMIT/ROLLBACK so partial updates never leave inconsistent balances.
ER model and relational mapping
What is the ER model?
The Entity-Relationship (ER) model is a conceptual diagram notation for database design before tables exist.
| Element | Meaning |
|---|---|
| Entity | Object type (Student, Course, Order) |
| Attribute | Property (name, date, amount) |
| Relationship | Association between entities (enrolls, contains) |
| Cardinality | 1:1, 1:N, M:N |
Weak entity depends on another entity for identity (OrderLine depends on Order). Derived attribute computed from others (age from birthdate)—often not stored.
ER diagrams communicate with stakeholders before migration scripts. Mapping rules convert ER → relational tables.
A strong answer is:
ER model is conceptual—entities, attributes, relationships, cardinality—I draw it first with stakeholders, then map to tables with keys before writing DDL.
Cardinality: 1:1, 1:N, M:N?
| Cardinality | Meaning | Relational mapping |
|---|---|---|
| 1:1 | One A ↔ one B | Foreign key on either side (or merge if always together) |
| 1:N | One A → many B | FK on the many side |
| M:N | Many A ↔ many B | Junction table with two FKs |
Example M:N: Students enroll in Courses → enrollments(student_id, course_id, grade).
Participation: total (every entity must participate) vs partial. Optional relationship → nullable FK.
A strong answer is:
1:N puts the foreign key on the many side; M:N needs a junction table with both keys—I state cardinality before writing CREATE TABLE.
How do you map ER to relational tables?
| ER construct | Relational table |
|---|---|
| Strong entity | Table; PK on identifier |
| Weak entity | Table; PK includes owner FK |
| 1:N relationship | FK on N-side |
| M:N relationship | Junction table + composite PK |
| Multivalued attribute | Separate table (entity_id, value) |
| 1:1 relationship | FK on one side; unique constraint |
Example — university:
students(id, name)courses(id, title)enrollments(student_id, course_id, semester)PK(student_id, course_id)
Avoid NULL-heavy designs when relationship attributes exist—junction table carries enrollment date, grade, etc.
A strong answer is:
Each entity becomes a table; 1:N is a foreign key on the many side; M:N is a junction table—I move relationship attributes onto the table that represents the association.
Entity set vs relationship set?
An entity set is a collection of entities of the same type—all Employee rows share attributes like emp_id, name.
A relationship set is a collection of relationship instances linking entities—Works_In(emp, dept) pairs employees with departments.
In relational terms:
- Entity set → table
- Relationship set → FK columns or junction table
Relationship attributes (e.g. since_date on Works_In) belong on the relationship representation—not duplicated on both entity tables.
A strong answer is:
Entity set is a type of object; relationship set is associations between them—in SQL, relationships become foreign keys or a bridge table with their own attributes.
Weak entity and identifying relationship?
A weak entity cannot be uniquely identified by its own attributes alone—it depends on a owner entity.
Example: InvoiceLine identified by (invoice_id, line_no)—line_no is unique only per invoice.
| Concept | Role |
|---|---|
| Owner entity | Strong entity (Invoice) |
| Identifying relationship | Links weak to owner |
| Partial key | Discriminator within owner scope (line_no) |
DDL: composite primary key (invoice_id, line_no) with invoice_id FK → invoices(id) ON DELETE CASCADE often appropriate.
A strong answer is:
Weak entities need the owner's key in their primary key—InvoiceLine uses (invoice_id, line_no) because line numbers repeat across invoices.
Keys, constraints, and integrity
Super key, candidate key, primary key, alternate key?
| Key type | Definition |
|---|---|
| Super key | Any set of attributes that uniquely identifies a row |
| Candidate key | Minimal super key—no proper subset is also a super key |
| Primary key | Chosen candidate key for the table |
| Alternate key | Candidate key not selected as PK (enforced UNIQUE) |
| Foreign key | References PK (or unique) in another table |
Example employees:
- Super keys:
{emp_id},{emp_id, name},{email}(if unique) - Candidate keys:
{emp_id},{email} - Primary key:
emp_id - Alternate key:
emailUNIQUE
A strong answer is:
Candidate keys are minimal unique identifiers; primary key is the main one; alternate keys stay unique—foreign keys reference parent primary or unique keys.
Composite key vs surrogate key?
Composite key: primary key spans multiple columns—natural for junction tables (order_id, product_id).
Surrogate key: artificial identifier (SERIAL, UUID) with no business meaning.
| Approach | Pros | Cons |
|---|---|---|
| Natural / composite | Matches business rules | Wide FKs; key changes are painful |
| Surrogate | Stable, narrow FKs | Extra column; not business-meaningful |
Use surrogate when natural keys are long, composite, or unstable (username changes). Keep natural unique constraints where business requires (email UNIQUE).
A strong answer is:
Composite keys fit junction tables; surrogate keys simplify joins when natural keys are wide or unstable—I still enforce business uniqueness with alternate constraints.
Foreign key constraints and referential actions?
A foreign key enforces referential integrity: child values must exist in parent (or be NULL if allowed).
CREATE TABLE orders (
id int PRIMARY KEY,
customer_id int NOT NULL REFERENCES customers(id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);| Action | ON DELETE behavior |
|---|---|
| RESTRICT / NO ACTION | Block delete if children exist |
| CASCADE | Delete children too |
| SET NULL | Null out FK in children |
| SET DEFAULT | Set default value |
RESTRICT protects accidental orphan deletes in OLTP. CASCADE suits strict ownership (line items die with order).
A strong answer is:
Foreign keys keep child rows honest—I default to ON DELETE RESTRICT for safety and CASCADE only when child rows have no meaning without the parent.
Types of integrity constraints?
| Type | Enforces |
|---|---|
| Entity integrity | Primary key unique and NOT NULL |
| Referential integrity | FK values match parent or NULL |
| Domain integrity | Valid types, CHECK, ENUM |
| User-defined / business | Custom rules (triggers, CHECK) |
CREATE TABLE products (
id int PRIMARY KEY,
price numeric CHECK (price >= 0),
sku text UNIQUE NOT NULL
);ORMs map these to validators—but database constraints are the last line of defense against buggy deploys.
A strong answer is:
Entity, referential, and domain integrity—PKs, foreign keys, and CHECK constraints—I put non-negotiable rules in the database, not only in application code.
NULL in relational databases?
NULL means unknown or not applicable—not zero, not empty string.
| Expression | Result |
|---|---|
NULL = NULL |
Unknown (not TRUE) |
NULL AND TRUE |
NULL |
COUNT(*) |
Counts all rows including NULLs |
COUNT(column) |
Ignores NULLs |
Use IS NULL / IS NOT NULL. Three-valued logic (TRUE, FALSE, UNKNOWN) affects WHERE filters—rows with UNKNOWN do not match.
Design tip: use NULL for optional fields; avoid NULL in PKs; consider NOT NULL DEFAULT when a sentinel is clearer.
A strong answer is:
NULL is unknown, not zero—comparisons use IS NULL; I avoid NULL primary keys and explain three-valued logic when debugging surprising WHERE results.
Normalization
Why normalize?
Normalization decomposes tables to reduce redundancy and update anomalies.
| Anomaly | Problem |
|---|---|
| Insert | Cannot add data without unrelated data |
| Update | Must change many rows for one fact |
| Delete | Removing one fact loses unrelated data |
Example denormalized order_lines(customer_name, customer_city, product, qty):
- Update customer city in every row they ordered
- Delete last order line loses customer address
Normalization splits into customers, products, orders, order_lines.
Trade-off: more joins at read time—sometimes denormalize deliberately for reporting (see Q28).
A strong answer is:
Normalization removes redundancy so updates happen in one place—I explain insert/update/delete anomalies on a messy table, then decompose it.
First normal form (1NF)?
A table is in 1NF when:
- Each column holds atomic (indivisible) values
- Each row is unique (has a key)
- No repeating groups in one column
Violations:
phone_numbers: "555-1, 555-2"→ split tocustomer_phonestable- Multiple
course1,course2columns → row per enrollment
1NF is the baseline—almost all RDBMS tables satisfy it by design.
A strong answer is:
1NF means atomic values and no repeating groups—I split multi-valued columns into separate rows or tables instead of comma-separated lists.
Second normal form (2NF)?
A table is in 2NF when:
- It is in 1NF
- Every non-key attribute depends on the whole primary key—not a proper subset (no partial dependency)
Applies when PK is composite. Example violation:
enrollments(student_id, course_id, student_name, course_title, grade)
student_namedepends only onstudent_id(partial)course_titledepends only oncourse_id(partial)
Fix: move student_name → students; course_title → courses; keep grade on enrollments.
A strong answer is:
2NF removes partial dependencies on composite keys—I move attributes that depend on only part of the key into their own tables.
Third normal form (3NF)?
A table is in 3NF when:
- It is in 2NF
- No non-key attribute depends on another non-key attribute (transitive dependency)
Example violation:
employees(emp_id, dept_id, dept_name, dept_location)
dept_nameanddept_locationdepend ondept_id, not directly onemp_id
Fix: departments(dept_id, dept_name, dept_location) + employees(emp_id, dept_id, ...).
Most OLTP schemas target 3NF unless reporting needs dictate otherwise.
A strong answer is:
3NF eliminates transitive dependencies—I pull attributes that depend on non-key columns into separate tables like departments separate from employees.
BCNF vs 3NF?
BCNF (Boyce-Codd Normal Form) strengthens 3NF:
Every determinant (attribute that determines others) must be a candidate key.
| Form | Rule |
|---|---|
| 3NF | Non-key attrs depend only on keys; exceptions for overlapping keys |
| BCNF | Stricter—every functional dependency left side is a super key |
3NF but not BCNF example (classic):
student_advisor(student, subject, advisor) where each student has one advisor per subject, and each advisor teaches one subject → decompose further.
In practice 3NF ≈ BCNF for most business schemas. BCNF matters in academic questions and odd dependency patterns.
A strong answer is:
BCNF says every determinant is a candidate key—stricter than 3NF; I know the classic student-advisor example but most production schemas stop at 3NF.
When is denormalization justified?
Denormalization intentionally adds redundancy for read performance or simpler queries.
| Use case | Technique |
|---|---|
| Reporting / OLAP | Star schema, pre-aggregated facts |
| Read-heavy dashboards | Cached counters (order_count on customer) |
| Reduce join cost | Duplicate display names on fact tables |
| Document stores | Embed related records (MongoDB) |
Costs: update anomalies return—you must sync denormalized copies (triggers, batch jobs, CDC).
Rule: normalize first, measure slow queries, denormalize with a documented consistency strategy.
A strong answer is:
I normalize OLTP to 3NF first, then denormalize only with metrics—star schemas for analytics or cached counts with a clear update path, not upfront duplication.
Functional dependency (FD)?
A functional dependency X → Y means: if two rows agree on X, they must agree on Y.
Notation: student_id → student_name
| FD type | Example |
|---|---|
| Trivial | (emp_id, name) → emp_id |
| Non-trivial | emp_id → name |
| Full | Depends on whole composite key (2NF) |
| Partial | Depends on subset of composite key (2NF violation) |
| Transitive | A → B → C where B not a key (3NF violation) |
Normalization algorithms decompose tables using FDs. Interviewers may ask you to list FDs from a table definition.
A strong answer is:
Functional dependencies describe what determines what—I list them from a schema, then use them to justify 2NF and 3NF decomposition steps.
Scenario: Normalize this order table?
Given (denormalized):
orders(order_id, customer_name, customer_email, product_name, qty, unit_price)
Problems:
- Customer data repeats per line → update anomaly on email change
- Product name repeats → rename product touches many rows
- Cannot list products without an order (insert anomaly)
3NF decomposition:
customers(customer_id PK, name, email)
products(product_id PK, name, unit_price)
orders(order_id PK, customer_id FK, order_date)
order_lines(order_id FK, product_id FK, qty, line_total)line_total may be derived (qty * unit_price) or stored for historical pricing.
A strong answer is:
I split customers, products, orders, and order_lines—removing partial and transitive dependencies so customer email updates happen once in customers.
Transactions, ACID, and concurrency
Explain ACID properties?
ACID guarantees reliable transactions:
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All or nothing | Transfer debit + credit both commit or both roll back |
| Consistency | Valid state → valid state | Constraints hold after transaction |
| Isolation | Concurrent txs appear serial | No dirty reads (at proper level) |
| Durability | Committed data survives crash | WAL flushed to disk |
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;If second UPDATE fails, ROLLBACK undoes the debit—atomicity. Consistency is partly app + constraint responsibility; engine provides mechanisms.
A strong answer is:
ACID: transfer example for atomicity, constraints for consistency, isolation levels for concurrent reads, WAL for durability—I tie each letter to a concrete failure mode.
Transaction states?
| State | Description |
|---|---|
| Active | Executing statements |
| Partially committed | Final statement done; commit not yet durable |
| Committed | Changes permanent |
| Failed | Normal execution cannot proceed |
| Aborted | Rolled back; DB unchanged by this tx |
| Terminated | Left the system |
BEGIN starts active state. COMMIT → committed → terminated. Error or ROLLBACK → aborted → terminated.
Savepoints allow partial rollback inside a transaction: SAVEPOINT sp1; ... ROLLBACK TO sp1;
A strong answer is:
Active until COMMIT or ROLLBACK—failed transactions abort; I mention savepoints when a long transaction needs partial undo without full rollback.
Isolation levels and anomalies?
SQL standard isolation levels (lowest to highest):
| Level | Dirty read | Non-repeatable read | Phantom read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | No | Possible | Possible |
| REPEATABLE READ | No | No | Possible* |
| SERIALIZABLE | No | No | No |
*PostgreSQL REPEATABLE READ also prevents phantoms via MVCC snapshot; MySQL InnoDB behavior varies—know your engine.
| Anomaly | What happens |
|---|---|
| Dirty read | Read uncommitted data from another tx |
| Non-repeatable read | Same row read twice, different values |
| Phantom read | Same query returns different row sets |
Default on PostgreSQL is READ COMMITTED—good balance for most OLTP.
A strong answer is:
I compare isolation levels with the dirty/non-repeatable/phantom table—READ COMMITTED for most apps, SERIALIZABLE when financial correctness demands it despite retries.
Concurrency control techniques?
| Technique | Idea |
|---|---|
| Lock-based | Shared (read) vs exclusive (write) locks |
| Two-phase locking (2PL) | Grow locks then shrink—guarantees serializability |
| MVCC | Readers see snapshot; writers create new versions (PostgreSQL) |
| Optimistic | Read freely; validate on commit (version column) |
| Pessimistic | SELECT FOR UPDATE locks early |
Lost update without control: two txs read balance 100, both write 90—should be 80.
Fix: row-level lock, atomic UPDATE balance = balance - 10, or optimistic retry.
A strong answer is:
Lock-based 2PL, MVCC snapshots, or optimistic versioning—I pick pessimistic locks for contested balances and MVCC for read-heavy Postgres workloads.
What is deadlock and how is it handled?
Deadlock: cycle of transactions each waiting for a lock held by another.
Tx A: lock row 1 → wait row 2
Tx B: lock row 2 → wait row 1Detection: DBMS builds wait-for graph; cycle → abort victim (youngest tx, least work).
Prevention:
- Lock rows in consistent order (always
idascending) - Keep transactions short
- Use lower isolation when safe
- Retry on deadlock error (
40001in PostgreSQL)
-- App pattern: retry up to N times on deadlock
BEGIN;
SELECT * FROM accounts WHERE id IN (1,2) ORDER BY id FOR UPDATE;
-- transfer logic
COMMIT;A strong answer is:
Deadlock is a lock cycle—the DBMS kills a victim transaction; I prevent it by locking in consistent order, keeping txs short, and retrying on deadlock errors in app code.
Scenario: Safe money transfer transaction?
Requirements: debit and credit must be atomic; balance never negative; concurrent transfers safe.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1 AND balance >= 100;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- verify row counts / balances; ROLLBACK if insufficient funds
COMMIT;Steps to mention:
- BEGIN explicit transaction
- Lock both rows (
FOR UPDATE) in deterministic order - Check balance before debit
- COMMIT or ROLLBACK on any failure
- Idempotent client token for retries (avoid double transfer)
Isolation: READ COMMITTED + row locks usually enough; SERIALIZABLE for strict audit systems with retry logic.
A strong answer is:
BEGIN, lock both accounts in id order with FOR UPDATE, check balance, debit and credit in one transaction, COMMIT or ROLLBACK—I add idempotency keys for client retries.
Indexing and query processing
Index types in DBMS?
| Index | Use case |
|---|---|
| B-tree (default) | Range queries, sorting, equality |
| Hash | Equality only; no range |
| Bitmap | Low-cardinality columns (warehouse) |
| Full-text | Text search |
| Composite | Multi-column WHERE / ORDER BY |
| Covering | INCLUDE columns avoid table lookup |
| Partial | Subset of rows (active = true) |
Clustered (InnoDB PK): data rows stored in index order. Non-clustered: separate structure pointing to rows.
Indexes speed SELECT; slow INSERT/UPDATE/DELETE (maintain index).
A strong answer is:
B-tree for most OLTP equality and range; composite indexes match leftmost WHERE columns—I mention write overhead and avoid indexing every column.
Clustered vs non-clustered index?
| Clustered | Non-clustered | |
|---|---|---|
| Data layout | Table rows in index order | Index separate from heap |
| Per table | Usually one (PK) | Many allowed |
| Lookup | Range scan on PK is fast | Extra hop to heap |
| Examples | InnoDB PK, SQL Server clustered | PostgreSQL (all indexes non-clustered) |
PostgreSQL: table is heap; even PK index is non-clustered—CLUSTER command reorders physically (one-time).
Choose clustered key carefully—it is the physical order (often narrow, monotonic id).
A strong answer is:
Clustered index is the table order—one per table on SQL Server/InnoDB; PostgreSQL uses heap storage with separate indexes—I do not confuse logical PK with physical clustering.
Query processing pipeline?
Stages when you run SELECT:
- Parser — syntax check, build parse tree
- Validator — tables/columns exist; type check
- Query rewriter — views expanded, rule-based transforms
- Optimizer — cost-based plan (seq scan vs index scan, join order)
- Execution engine — runs plan, returns rows
Statistics (row counts, histograms) feed the optimizer—stale stats → bad plans.
Interview link: PostgreSQL EXPLAIN for engine-specific plan reading.
A strong answer is:
Parse, validate, optimize, execute—I say the optimizer picks plans from statistics, so stale stats cause slow queries even with indexes present.
Scenario: Which index for this query?
Query:
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'open'
ORDER BY created_at DESC
LIMIT 20;Reasoning:
| Option | Verdict |
|---|---|
Index on (customer_id) |
Good filter; may sort large set |
Index on (customer_id, status, created_at DESC) |
Best—filter + order without sort |
Index on (status) alone |
Weak—low selectivity if few statuses |
Partial index if most queries want status = 'open':
CREATE INDEX idx_open_customer ON orders (customer_id, created_at DESC)
WHERE status = 'open';Mention write amplification—only add indexes queries prove they need.
A strong answer is:
Composite index on (customer_id, status, created_at DESC) matches filter and ORDER BY—a partial index on open orders if that is the hot path; I verify with EXPLAIN, not guessing.
SQL vs NoSQL, distributed DB, and final prep
SQL vs NoSQL — when which?
| Factor | SQL (RDBMS) | NoSQL |
|---|---|---|
| Schema | Fixed, enforced | Flexible, evolving |
| Relationships | Joins, FKs | Embed or application joins |
| Transactions | Strong ACID (default) | Varies (eventual common) |
| Scale-out | Vertical + read replicas; sharding harder | Partitioning built-in |
| Query | Ad hoc SQL | Model-specific APIs |
Choose SQL when: complex relationships, strong consistency, reporting joins, financial correctness.
Choose NoSQL when: flexible schema, massive horizontal scale, simple access patterns (key lookup), document shape matches app (see MongoDB interview questions).
Many systems use both—Postgres for source of truth, Redis for cache, Elasticsearch for search.
A strong answer is:
SQL for relational integrity and ad hoc joins; NoSQL for flexible schema and horizontal scale—I pick by access pattern and consistency needs, not religion.
CAP theorem — practical meaning?
CAP: in a network partition, you choose between Consistency and Availability (Partition tolerance is mandatory in distributed systems).
| Choice | Behavior during partition |
|---|---|
| CP | Refuse writes/reads to stay consistent (some nodes unavailable) |
| AP | Accept requests; replicas may diverge (eventual consistency) |
Not "pick two of three" in normal operation—only during partition.
Examples:
- Traditional RDBMS primary: often CP-ish for writes
- Cassandra: tunable; AP leaning
- Redis Cluster: configuration-dependent
PACELC extension: else (no partition), choose Latency vs Consistency.
A strong answer is:
During a partition you trade consistency for availability or vice versa—I explain CP vs AP with a concrete system, not as a buzzword triangle.
Replication and backup types?
Replication — copies data for read scale or HA:
| Type | Mechanism | Use |
|---|---|---|
| Primary-replica | Async/sync copy | Read scaling, failover |
| Multi-primary | Bidirectional (conflict risk) | Geo write |
| Logical | Row/change events | CDC, partial sync |
Backup:
| Type | Contents | Recovery |
|---|---|---|
| Full | Entire database | Baseline restore |
| Incremental | Changes since last backup | Faster backup; chain restore |
| PITR | WAL/archive logs | Restore to timestamp |
RPO (how much data loss) and RTO (downtime) drive strategy. Test restore drills—backups untested are wishes.
A strong answer is:
Primary-replica for HA and read scale; full plus WAL for PITR—I define RPO/RTO and insist on tested restore, not just backup jobs running.
Views, stored procedures, and triggers?
| Object | Purpose | Caveat |
|---|---|---|
| View | Saved query; virtual table | Updatable views have restrictions |
| Stored procedure | Logic in DB | Vendor-specific; harder to test in CI |
| Trigger | Auto-run on INSERT/UPDATE/DELETE | Hidden side effects; debug pain |
Views simplify security (GRANT on view not base table) and encapsulate joins.
Triggers for audit logs work; avoid heavy business logic that belongs in application services.
Materialized view stores results—refresh on schedule for dashboards.
A strong answer is:
Views encapsulate queries and permissions; procedures bundle DB-side logic; triggers for audit—I avoid trigger spaghetti that duplicates app business rules.
Final DBMS interview checklist?
- DBMS vs RDBMS crisp definition
- Draw ER diagram with cardinality
- 1NF → 3NF with anomalies explained
- Keys: PK, FK, candidate, surrogate
- ACID with transfer example
- Isolation levels anomaly table
- Deadlock prevention + retry
- B-tree index rationale
- SQL vs NoSQL trade-offs
- CAP during partition
- One normalization scenario on whiteboard
- SQL technical interviews for live coding
- PostgreSQL if engine-specific loop
- MongoDB for document contrast
A strong answer is:
I rehearse ER + normalization weekly, explain ACID on a transfer, and pair theory here with SQL practice from the SQL guide—not campus notes without queries.
Pattern cheat sheet (quick reference)
| Task | DBMS approach |
|---|---|
| Unique row identity | Primary key; surrogate if natural key unstable |
| Parent-child integrity | Foreign key + deliberate ON DELETE action |
| Remove redundancy | Normalize to 3NF; denormalize only with sync plan |
| Money movement | Transaction + row locks + balance check |
| Hot filtered query | Composite or partial index matching WHERE/ORDER BY |
| Concurrent reads | MVCC (Postgres) or isolation level choice |
| Deadlock | Lock ordering + short txs + app retry |
| Flexible schema | Document store or JSONB column (engine-dependent) |
| Analytics | Star schema / materialized views |
| HA | Replication + tested backup restore |
References
DBMS interview prep
On-site prep
- SQL technical interview questions
- PostgreSQL interview questions
- MongoDB interview questions
- Data science interview questions
- Python developer interview questions
- Full stack developer interviews
- Computer networks interview questions
- Interview Questions category
Summary
DBMS interviews test ER modeling, normal forms, ACID, and indexing trade-offs with concrete examples—not definition memorization alone. Normalize a messy table on a whiteboard, walk through a safe transfer transaction, and defend SQL vs NoSQL for a real product. Pair theory here with SQL technical interviews, PostgreSQL interview questions, and MongoDB interview questions when the loop goes deeper.

