DBMS Interview Questions and Answers

DBMS interview questions, basic dbms interview questions, and common dbms interview questions for 2026: ER model, normalization, ACID, transactions, indexing, SQL vs NoSQL, and scenarios with elaborate answers.

Published

Updated

Tech reviewed byDeepak Prasad

DBMS Interview Questions and Answers

DBMS interview questions, basic dbms interview questions, and common dbms interview questions show up in campus placements, backend screening rounds, and data roles before vendors like PostgreSQL or Oracle dominate the conversation. Interviewers expect you to explain DBMS vs RDBMS, walk through ER modeling and keys, justify normal forms, describe ACID and isolation levels, compare index types, and reason about SQL vs NoSQL trade-offs—not only write SELECT statements.

Below are 45 questions with elaborate answers; technical sections include a strong answer sample you can say aloud. Pair with SQL technical interview questions for live query writing, PostgreSQL interview questions for engine-specific depth, MongoDB interview questions for document-store contrast, data science interview questions for analytics pipelines, Python developer interview questions for ORM context, and full stack developer interviews when system design ties app tiers to persistence.

NOTE
Prep target: Normalize a messy denormalized table on a whiteboard, explain ACID with a transfer example, and defend when you would break 3NF for performance—out loud, as if in a mid-level loop.

Tested on: Ubuntu 25.04 (Plucky Puffin); kernel 6.14.0-37-generic; PostgreSQL 17.7 for transaction and constraint examples ({run=false} SQL).


Interview context and how to prepare

What do DBMS interviews actually test?

DBMS interviews test foundational database theory that every application engineer should know—even if daily work hides it behind an ORM.

Layer What interviewers probe
Concepts DBMS vs RDBMS, schema vs instance, three-schema architecture
Modeling ER diagrams, keys, mapping to tables
Design Normalization (1NF–BCNF), when to denormalize
Transactions ACID, isolation levels, deadlocks
Physical layer Indexes, query plans at a high level
Architecture Replication, backup types, SQL vs NoSQL
Role Emphasis
Campus / junior ER model, normal forms, SQL basics
Backend Transactions, indexing rationale, ORM pitfalls
Data / analytics Star schema, denormalization, ETL idempotency
DBA track Concurrency, recovery, backup/restore

Campus and experienced loops both expect you to explain normalization, keys, and ACID with examples—not only memorize definitions.

DBMS vs RDBMS — what is the difference?

A DBMS (Database Management System) is software that lets users define, store, retrieve, and manage data with controlled access. The term is broad—it can include hierarchical, network, object-oriented, or document systems.

An RDBMS (Relational DBMS) is a DBMS that stores data in tables (relations) with rows and columns, enforces relational integrity via keys and constraints, and uses SQL (or SQL-like languages) for access.

Aspect DBMS (general) RDBMS
Data model Various (files, graphs, documents) Tables / relations
Relationships Model-dependent Foreign keys, joins
Query language Varies SQL standard (mostly)
Examples MongoDB, Redis, Neo4j PostgreSQL, MySQL, Oracle

Every RDBMS is a DBMS; not every DBMS is relational. In interviews, say RDBMS when discussing joins, normalization, and ACID on tabular schemas; say DBMS when comparing storage paradigms broadly.

DBMS theory for developers vs DBAs?
Topic Developer loop DBA loop
Schema ER → tables, migrations Capacity, partitioning
Normalization Avoid update anomalies in app schema Rarely redesigns live OLTP casually
Transactions Boundaries in code, retry on deadlock Lock monitoring, long transactions
Indexes "We need one for this WHERE clause" Index bloat, maintenance windows
Backup Assumes restore works RPO/RTO, PITR, drill restores
HA Connection strings, read replicas Failover, replication lag

Basic dbms interview questions skew toward theory (normal forms, keys). Senior backend loops add isolation anomalies and index trade-offs. DBA tracks go deeper on recovery and replication—see PostgreSQL interview questions for engine-specific depth.

What is a typical DBMS interview loop?
Round Duration Focus
Aptitude / screening 30 min SQL basics, DBMS definitions
Theory 45 min ER, normalization, ACID
SQL coding 45–60 min Joins, aggregates—see SQL prep
System design (senior) 45 min SQL vs NoSQL, caching, sharding
Scenario 30 min "Design a library / e-commerce schema"

Campus rounds often ask draw ER diagram for university registration; product companies add transaction isolation and index choice for backend roles.

What is a realistic 3–5 week DBMS prep plan?
Week Focus Output
1 DBMS vs RDBMS, ER model, keys Draw ER for library or hospital system
2 1NF–BCNF with examples Normalize a denormalized order table
3 ACID, isolation, deadlocks Explain transfer + phantom read
4 Indexes, query processing basics B-tree vs hash; when index hurts INSERT
5 SQL vs NoSQL, CAP, mock scenarios Defend Postgres vs Mongo for one use case

Practice 10 SQL problems weekly from SQL technical interview questions. Run transactions locally on PostgreSQL to see COMMIT / ROLLBACK behavior.


Database fundamentals

Data vs information?

Data are raw facts—numbers, strings, timestamps—without context. Information is data processed into a meaningful form for decisions.

Example Data Information
Retail 42, SKU-9, 2026-06-27 "42 units of SKU-9 sold today—reorder threshold breached"
Logs HTTP 500 at 14:02 "Error rate spiked 3× after deploy at 14:00"

A DBMS stores data; applications and reports turn it into information via constraints, relationships, and queries. Metadata (schema, statistics) describes the data itself.

A strong answer is:

Data are raw values; information is data with context and meaning—a DBMS stores data; queries and business rules produce information for decisions.

Why use a database instead of files?

Flat files work for simple scripts; production systems need structured concurrent access.

Concern Files DBMS
Concurrency Race conditions, manual locking Transaction manager, locks
Integrity App must enforce rules Constraints, foreign keys
Query Parse entire file Indexed lookup, optimizer
Security OS file permissions Roles, grants, RLS (engine-dependent)
Recovery Manual backups WAL, point-in-time recovery
Schema evolution Breaking changes fragile Migrations, versioning

Files remain fine for logs, CSV exports, and static assets; OLTP and shared mutable state belong in a DBMS.

A strong answer is:

Files lack concurrent safe updates, declarative integrity, and efficient indexed queries—a DBMS gives transactions, constraints, and recovery that files force every app to reinvent.

Schema vs instance vs database?
Term Meaning
Database Organized collection of related data managed by DBMS
Schema Logical structure—tables, columns, constraints, views (the design)
Instance Actual data stored at a moment in time (the content)

Analogy: schema is the blueprint of a building; instance is the furniture and occupants inside.

One DBMS server can host multiple databases (e.g. postgres, myapp_prod). Each database has its own schema namespace. Catalog / system tables store metadata about schemas.

A strong answer is:

Schema is the logical design—tables and constraints; instance is the live data at a point in time; I do not confuse CREATE TABLE (schema) with INSERT (instance).

Three-schema architecture?

ANSI/SPARC three-level architecture separates concerns so storage changes do not break applications.

Level Name What it describes Audience
External View level User-specific views App developers, analysts
Conceptual Logical level Entities, relationships, constraints Data modelers
Internal Physical level Files, indexes, allocation DBAs, engine

Mapping layers translate between levels. Logical data independence: change conceptual schema with minimal external view changes. Physical data independence: move indexes or storage without rewriting apps.

A strong answer is:

Three-schema architecture separates user views, logical ER model, and physical storage—so I can add an index or split a table without rewriting every application query.

Major DBMS components?
Component Role
DDL processor Parses CREATE/ALTER/DROP; updates catalog
DML processor Runs INSERT/UPDATE/DELETE/SELECT
Query processor Parser → optimizer → execution plan
Storage manager Pages, buffers, disk I/O
Transaction manager ACID, commit/abort, isolation
Recovery manager WAL, checkpoint, crash recovery
Concurrency control Locks, timestamps, MVCC (engine-specific)

Understanding the pipeline helps in interviews: "Why is this query slow?" → parse OK, bad plan or missing index or lock wait.

A strong answer is:

I name the query processor, storage manager, transaction manager, and recovery manager—then tie slow queries to optimizer plans and lock waits, not mysticism.

Common data models in DBMS?
Model Structure Example systems
Relational Tables, keys PostgreSQL, MySQL
Document JSON/BSON documents MongoDB
Key-value Opaque key → value Redis, DynamoDB
Column-family Wide columns per row key Cassandra, HBase
Graph Nodes and edges Neo4j
Hierarchical / Network Legacy models IMS, IDMS (historical)

Relational dominates OLTP with strong consistency needs. Document fits flexible schemas; graph fits relationship-heavy traversals. Choice depends on access patterns, not hype.

A strong answer is:

Relational for tabular integrity and joins; document for flexible nested records; graph for deep relationship traversals—I pick by query pattern, not buzzwords.

DDL, DML, DCL, and TCL?

SQL command categories:

Category Purpose Examples
DDL (Data Definition) Schema structure CREATE, ALTER, DROP, TRUNCATE
DML (Data Manipulation) Data changes INSERT, UPDATE, DELETE, SELECT
DCL (Data Control) Permissions GRANT, REVOKE
TCL (Transaction Control) Transaction boundaries BEGIN, COMMIT, ROLLBACK, SAVEPOINT

TRUNCATE is DDL-like—fast, often cannot roll back, resets identity. DELETE is DML—row-level, logged, transactional.

sql
BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);
ROLLBACK;  -- row not visible after rollback

A strong answer is:

DDL defines schema; DML changes rows; DCL grants access; TCL wraps units of work—I use COMMIT/ROLLBACK so partial updates never leave inconsistent balances.


ER model and relational mapping

What is the ER model?

The Entity-Relationship (ER) model is a conceptual diagram notation for database design before tables exist.

Element Meaning
Entity Object type (Student, Course, Order)
Attribute Property (name, date, amount)
Relationship Association between entities (enrolls, contains)
Cardinality 1:1, 1:N, M:N

Weak entity depends on another entity for identity (OrderLine depends on Order). Derived attribute computed from others (age from birthdate)—often not stored.

ER diagrams communicate with stakeholders before migration scripts. Mapping rules convert ER → relational tables.

A strong answer is:

ER model is conceptual—entities, attributes, relationships, cardinality—I draw it first with stakeholders, then map to tables with keys before writing DDL.

Cardinality: 1:1, 1:N, M:N?
Cardinality Meaning Relational mapping
1:1 One A ↔ one B Foreign key on either side (or merge if always together)
1:N One A → many B FK on the many side
M:N Many A ↔ many B Junction table with two FKs

Example M:N: Students enroll in Courses → enrollments(student_id, course_id, grade).

Participation: total (every entity must participate) vs partial. Optional relationship → nullable FK.

A strong answer is:

1:N puts the foreign key on the many side; M:N needs a junction table with both keys—I state cardinality before writing CREATE TABLE.

How do you map ER to relational tables?
ER construct Relational table
Strong entity Table; PK on identifier
Weak entity Table; PK includes owner FK
1:N relationship FK on N-side
M:N relationship Junction table + composite PK
Multivalued attribute Separate table (entity_id, value)
1:1 relationship FK on one side; unique constraint

Example — university:

  • students(id, name)
  • courses(id, title)
  • enrollments(student_id, course_id, semester) PK (student_id, course_id)

Avoid NULL-heavy designs when relationship attributes exist—junction table carries enrollment date, grade, etc.

A strong answer is:

Each entity becomes a table; 1:N is a foreign key on the many side; M:N is a junction table—I move relationship attributes onto the table that represents the association.

Entity set vs relationship set?

An entity set is a collection of entities of the same type—all Employee rows share attributes like emp_id, name.

A relationship set is a collection of relationship instances linking entities—Works_In(emp, dept) pairs employees with departments.

In relational terms:

  • Entity set → table
  • Relationship set → FK columns or junction table

Relationship attributes (e.g. since_date on Works_In) belong on the relationship representation—not duplicated on both entity tables.

A strong answer is:

Entity set is a type of object; relationship set is associations between them—in SQL, relationships become foreign keys or a bridge table with their own attributes.

Weak entity and identifying relationship?

A weak entity cannot be uniquely identified by its own attributes alone—it depends on a owner entity.

Example: InvoiceLine identified by (invoice_id, line_no)line_no is unique only per invoice.

Concept Role
Owner entity Strong entity (Invoice)
Identifying relationship Links weak to owner
Partial key Discriminator within owner scope (line_no)

DDL: composite primary key (invoice_id, line_no) with invoice_id FK → invoices(id) ON DELETE CASCADE often appropriate.

A strong answer is:

Weak entities need the owner's key in their primary key—InvoiceLine uses (invoice_id, line_no) because line numbers repeat across invoices.


Keys, constraints, and integrity

Super key, candidate key, primary key, alternate key?
Key type Definition
Super key Any set of attributes that uniquely identifies a row
Candidate key Minimal super key—no proper subset is also a super key
Primary key Chosen candidate key for the table
Alternate key Candidate key not selected as PK (enforced UNIQUE)
Foreign key References PK (or unique) in another table

Example employees:

  • Super keys: {emp_id}, {emp_id, name}, {email} (if unique)
  • Candidate keys: {emp_id}, {email}
  • Primary key: emp_id
  • Alternate key: email UNIQUE

A strong answer is:

Candidate keys are minimal unique identifiers; primary key is the main one; alternate keys stay unique—foreign keys reference parent primary or unique keys.

Composite key vs surrogate key?

Composite key: primary key spans multiple columns—natural for junction tables (order_id, product_id).

Surrogate key: artificial identifier (SERIAL, UUID) with no business meaning.

Approach Pros Cons
Natural / composite Matches business rules Wide FKs; key changes are painful
Surrogate Stable, narrow FKs Extra column; not business-meaningful

Use surrogate when natural keys are long, composite, or unstable (username changes). Keep natural unique constraints where business requires (email UNIQUE).

A strong answer is:

Composite keys fit junction tables; surrogate keys simplify joins when natural keys are wide or unstable—I still enforce business uniqueness with alternate constraints.

Foreign key constraints and referential actions?

A foreign key enforces referential integrity: child values must exist in parent (or be NULL if allowed).

sql
CREATE TABLE orders (
  id         int PRIMARY KEY,
  customer_id int NOT NULL REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
Action ON DELETE behavior
RESTRICT / NO ACTION Block delete if children exist
CASCADE Delete children too
SET NULL Null out FK in children
SET DEFAULT Set default value

RESTRICT protects accidental orphan deletes in OLTP. CASCADE suits strict ownership (line items die with order).

A strong answer is:

Foreign keys keep child rows honest—I default to ON DELETE RESTRICT for safety and CASCADE only when child rows have no meaning without the parent.

Types of integrity constraints?
Type Enforces
Entity integrity Primary key unique and NOT NULL
Referential integrity FK values match parent or NULL
Domain integrity Valid types, CHECK, ENUM
User-defined / business Custom rules (triggers, CHECK)
sql
CREATE TABLE products (
  id    int PRIMARY KEY,
  price numeric CHECK (price >= 0),
  sku   text UNIQUE NOT NULL
);

ORMs map these to validators—but database constraints are the last line of defense against buggy deploys.

A strong answer is:

Entity, referential, and domain integrity—PKs, foreign keys, and CHECK constraints—I put non-negotiable rules in the database, not only in application code.

NULL in relational databases?

NULL means unknown or not applicable—not zero, not empty string.

Expression Result
NULL = NULL Unknown (not TRUE)
NULL AND TRUE NULL
COUNT(*) Counts all rows including NULLs
COUNT(column) Ignores NULLs

Use IS NULL / IS NOT NULL. Three-valued logic (TRUE, FALSE, UNKNOWN) affects WHERE filters—rows with UNKNOWN do not match.

Design tip: use NULL for optional fields; avoid NULL in PKs; consider NOT NULL DEFAULT when a sentinel is clearer.

A strong answer is:

NULL is unknown, not zero—comparisons use IS NULL; I avoid NULL primary keys and explain three-valued logic when debugging surprising WHERE results.


Normalization

Why normalize?

Normalization decomposes tables to reduce redundancy and update anomalies.

Anomaly Problem
Insert Cannot add data without unrelated data
Update Must change many rows for one fact
Delete Removing one fact loses unrelated data

Example denormalized order_lines(customer_name, customer_city, product, qty):

  • Update customer city in every row they ordered
  • Delete last order line loses customer address

Normalization splits into customers, products, orders, order_lines.

Trade-off: more joins at read time—sometimes denormalize deliberately for reporting (see Q28).

A strong answer is:

Normalization removes redundancy so updates happen in one place—I explain insert/update/delete anomalies on a messy table, then decompose it.

First normal form (1NF)?

A table is in 1NF when:

  1. Each column holds atomic (indivisible) values
  2. Each row is unique (has a key)
  3. No repeating groups in one column

Violations:

  • phone_numbers: "555-1, 555-2" → split to customer_phones table
  • Multiple course1, course2 columns → row per enrollment

1NF is the baseline—almost all RDBMS tables satisfy it by design.

A strong answer is:

1NF means atomic values and no repeating groups—I split multi-valued columns into separate rows or tables instead of comma-separated lists.

Second normal form (2NF)?

A table is in 2NF when:

  1. It is in 1NF
  2. Every non-key attribute depends on the whole primary key—not a proper subset (no partial dependency)

Applies when PK is composite. Example violation:

enrollments(student_id, course_id, student_name, course_title, grade)

  • student_name depends only on student_id (partial)
  • course_title depends only on course_id (partial)

Fix: move student_namestudents; course_titlecourses; keep grade on enrollments.

A strong answer is:

2NF removes partial dependencies on composite keys—I move attributes that depend on only part of the key into their own tables.

Third normal form (3NF)?

A table is in 3NF when:

  1. It is in 2NF
  2. No non-key attribute depends on another non-key attribute (transitive dependency)

Example violation:

employees(emp_id, dept_id, dept_name, dept_location)

  • dept_name and dept_location depend on dept_id, not directly on emp_id

Fix: departments(dept_id, dept_name, dept_location) + employees(emp_id, dept_id, ...).

Most OLTP schemas target 3NF unless reporting needs dictate otherwise.

A strong answer is:

3NF eliminates transitive dependencies—I pull attributes that depend on non-key columns into separate tables like departments separate from employees.

BCNF vs 3NF?

BCNF (Boyce-Codd Normal Form) strengthens 3NF:

Every determinant (attribute that determines others) must be a candidate key.

Form Rule
3NF Non-key attrs depend only on keys; exceptions for overlapping keys
BCNF Stricter—every functional dependency left side is a super key

3NF but not BCNF example (classic):

student_advisor(student, subject, advisor) where each student has one advisor per subject, and each advisor teaches one subject → decompose further.

In practice 3NF ≈ BCNF for most business schemas. BCNF matters in academic questions and odd dependency patterns.

A strong answer is:

BCNF says every determinant is a candidate key—stricter than 3NF; I know the classic student-advisor example but most production schemas stop at 3NF.

When is denormalization justified?

Denormalization intentionally adds redundancy for read performance or simpler queries.

Use case Technique
Reporting / OLAP Star schema, pre-aggregated facts
Read-heavy dashboards Cached counters (order_count on customer)
Reduce join cost Duplicate display names on fact tables
Document stores Embed related records (MongoDB)

Costs: update anomalies return—you must sync denormalized copies (triggers, batch jobs, CDC).

Rule: normalize first, measure slow queries, denormalize with a documented consistency strategy.

A strong answer is:

I normalize OLTP to 3NF first, then denormalize only with metrics—star schemas for analytics or cached counts with a clear update path, not upfront duplication.

Functional dependency (FD)?

A functional dependency X → Y means: if two rows agree on X, they must agree on Y.

Notation: student_id → student_name

FD type Example
Trivial (emp_id, name) → emp_id
Non-trivial emp_id → name
Full Depends on whole composite key (2NF)
Partial Depends on subset of composite key (2NF violation)
Transitive A → B → C where B not a key (3NF violation)

Normalization algorithms decompose tables using FDs. Interviewers may ask you to list FDs from a table definition.

A strong answer is:

Functional dependencies describe what determines what—I list them from a schema, then use them to justify 2NF and 3NF decomposition steps.

Scenario: Normalize this order table?

Given (denormalized):

orders(order_id, customer_name, customer_email, product_name, qty, unit_price)

Problems:

  • Customer data repeats per line → update anomaly on email change
  • Product name repeats → rename product touches many rows
  • Cannot list products without an order (insert anomaly)

3NF decomposition:

text
customers(customer_id PK, name, email)
products(product_id PK, name, unit_price)
orders(order_id PK, customer_id FK, order_date)
order_lines(order_id FK, product_id FK, qty, line_total)

line_total may be derived (qty * unit_price) or stored for historical pricing.

A strong answer is:

I split customers, products, orders, and order_lines—removing partial and transitive dependencies so customer email updates happen once in customers.


Transactions, ACID, and concurrency

Explain ACID properties?

ACID guarantees reliable transactions:

Property Meaning Example
Atomicity All or nothing Transfer debit + credit both commit or both roll back
Consistency Valid state → valid state Constraints hold after transaction
Isolation Concurrent txs appear serial No dirty reads (at proper level)
Durability Committed data survives crash WAL flushed to disk
sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

If second UPDATE fails, ROLLBACK undoes the debit—atomicity. Consistency is partly app + constraint responsibility; engine provides mechanisms.

A strong answer is:

ACID: transfer example for atomicity, constraints for consistency, isolation levels for concurrent reads, WAL for durability—I tie each letter to a concrete failure mode.

Transaction states?
State Description
Active Executing statements
Partially committed Final statement done; commit not yet durable
Committed Changes permanent
Failed Normal execution cannot proceed
Aborted Rolled back; DB unchanged by this tx
Terminated Left the system

BEGIN starts active state. COMMIT → committed → terminated. Error or ROLLBACK → aborted → terminated.

Savepoints allow partial rollback inside a transaction: SAVEPOINT sp1; ... ROLLBACK TO sp1;

A strong answer is:

Active until COMMIT or ROLLBACK—failed transactions abort; I mention savepoints when a long transaction needs partial undo without full rollback.

Isolation levels and anomalies?

SQL standard isolation levels (lowest to highest):

Level Dirty read Non-repeatable read Phantom read
READ UNCOMMITTED Possible Possible Possible
READ COMMITTED No Possible Possible
REPEATABLE READ No No Possible*
SERIALIZABLE No No No

*PostgreSQL REPEATABLE READ also prevents phantoms via MVCC snapshot; MySQL InnoDB behavior varies—know your engine.

Anomaly What happens
Dirty read Read uncommitted data from another tx
Non-repeatable read Same row read twice, different values
Phantom read Same query returns different row sets

Default on PostgreSQL is READ COMMITTED—good balance for most OLTP.

A strong answer is:

I compare isolation levels with the dirty/non-repeatable/phantom table—READ COMMITTED for most apps, SERIALIZABLE when financial correctness demands it despite retries.

Concurrency control techniques?
Technique Idea
Lock-based Shared (read) vs exclusive (write) locks
Two-phase locking (2PL) Grow locks then shrink—guarantees serializability
MVCC Readers see snapshot; writers create new versions (PostgreSQL)
Optimistic Read freely; validate on commit (version column)
Pessimistic SELECT FOR UPDATE locks early

Lost update without control: two txs read balance 100, both write 90—should be 80.

Fix: row-level lock, atomic UPDATE balance = balance - 10, or optimistic retry.

A strong answer is:

Lock-based 2PL, MVCC snapshots, or optimistic versioning—I pick pessimistic locks for contested balances and MVCC for read-heavy Postgres workloads.

What is deadlock and how is it handled?

Deadlock: cycle of transactions each waiting for a lock held by another.

text
Tx A: lock row 1 → wait row 2
Tx B: lock row 2 → wait row 1

Detection: DBMS builds wait-for graph; cycle → abort victim (youngest tx, least work).

Prevention:

  • Lock rows in consistent order (always id ascending)
  • Keep transactions short
  • Use lower isolation when safe
  • Retry on deadlock error (40001 in PostgreSQL)
sql
-- App pattern: retry up to N times on deadlock
BEGIN;
SELECT * FROM accounts WHERE id IN (1,2) ORDER BY id FOR UPDATE;
-- transfer logic
COMMIT;

A strong answer is:

Deadlock is a lock cycle—the DBMS kills a victim transaction; I prevent it by locking in consistent order, keeping txs short, and retrying on deadlock errors in app code.

Scenario: Safe money transfer transaction?

Requirements: debit and credit must be atomic; balance never negative; concurrent transfers safe.

sql
BEGIN;

SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;

UPDATE accounts SET balance = balance - 100 WHERE id = 1 AND balance >= 100;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- verify row counts / balances; ROLLBACK if insufficient funds
COMMIT;

Steps to mention:

  1. BEGIN explicit transaction
  2. Lock both rows (FOR UPDATE) in deterministic order
  3. Check balance before debit
  4. COMMIT or ROLLBACK on any failure
  5. Idempotent client token for retries (avoid double transfer)

Isolation: READ COMMITTED + row locks usually enough; SERIALIZABLE for strict audit systems with retry logic.

A strong answer is:

BEGIN, lock both accounts in id order with FOR UPDATE, check balance, debit and credit in one transaction, COMMIT or ROLLBACK—I add idempotency keys for client retries.


Indexing and query processing

Index types in DBMS?
Index Use case
B-tree (default) Range queries, sorting, equality
Hash Equality only; no range
Bitmap Low-cardinality columns (warehouse)
Full-text Text search
Composite Multi-column WHERE / ORDER BY
Covering INCLUDE columns avoid table lookup
Partial Subset of rows (active = true)

Clustered (InnoDB PK): data rows stored in index order. Non-clustered: separate structure pointing to rows.

Indexes speed SELECT; slow INSERT/UPDATE/DELETE (maintain index).

A strong answer is:

B-tree for most OLTP equality and range; composite indexes match leftmost WHERE columns—I mention write overhead and avoid indexing every column.

Clustered vs non-clustered index?
Clustered Non-clustered
Data layout Table rows in index order Index separate from heap
Per table Usually one (PK) Many allowed
Lookup Range scan on PK is fast Extra hop to heap
Examples InnoDB PK, SQL Server clustered PostgreSQL (all indexes non-clustered)

PostgreSQL: table is heap; even PK index is non-clustered—CLUSTER command reorders physically (one-time).

Choose clustered key carefully—it is the physical order (often narrow, monotonic id).

A strong answer is:

Clustered index is the table order—one per table on SQL Server/InnoDB; PostgreSQL uses heap storage with separate indexes—I do not confuse logical PK with physical clustering.

Query processing pipeline?

Stages when you run SELECT:

  1. Parser — syntax check, build parse tree
  2. Validator — tables/columns exist; type check
  3. Query rewriter — views expanded, rule-based transforms
  4. Optimizer — cost-based plan (seq scan vs index scan, join order)
  5. Execution engine — runs plan, returns rows

Statistics (row counts, histograms) feed the optimizer—stale stats → bad plans.

Interview link: PostgreSQL EXPLAIN for engine-specific plan reading.

A strong answer is:

Parse, validate, optimize, execute—I say the optimizer picks plans from statistics, so stale stats cause slow queries even with indexes present.

Scenario: Which index for this query?

Query:

sql
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'open'
ORDER BY created_at DESC
LIMIT 20;

Reasoning:

Option Verdict
Index on (customer_id) Good filter; may sort large set
Index on (customer_id, status, created_at DESC) Best—filter + order without sort
Index on (status) alone Weak—low selectivity if few statuses

Partial index if most queries want status = 'open':

sql
CREATE INDEX idx_open_customer ON orders (customer_id, created_at DESC)
WHERE status = 'open';

Mention write amplification—only add indexes queries prove they need.

A strong answer is:

Composite index on (customer_id, status, created_at DESC) matches filter and ORDER BY—a partial index on open orders if that is the hot path; I verify with EXPLAIN, not guessing.


SQL vs NoSQL, distributed DB, and final prep

SQL vs NoSQL — when which?
Factor SQL (RDBMS) NoSQL
Schema Fixed, enforced Flexible, evolving
Relationships Joins, FKs Embed or application joins
Transactions Strong ACID (default) Varies (eventual common)
Scale-out Vertical + read replicas; sharding harder Partitioning built-in
Query Ad hoc SQL Model-specific APIs

Choose SQL when: complex relationships, strong consistency, reporting joins, financial correctness.

Choose NoSQL when: flexible schema, massive horizontal scale, simple access patterns (key lookup), document shape matches app (see MongoDB interview questions).

Many systems use both—Postgres for source of truth, Redis for cache, Elasticsearch for search.

A strong answer is:

SQL for relational integrity and ad hoc joins; NoSQL for flexible schema and horizontal scale—I pick by access pattern and consistency needs, not religion.

CAP theorem — practical meaning?

CAP: in a network partition, you choose between Consistency and Availability (Partition tolerance is mandatory in distributed systems).

Choice Behavior during partition
CP Refuse writes/reads to stay consistent (some nodes unavailable)
AP Accept requests; replicas may diverge (eventual consistency)

Not "pick two of three" in normal operation—only during partition.

Examples:

  • Traditional RDBMS primary: often CP-ish for writes
  • Cassandra: tunable; AP leaning
  • Redis Cluster: configuration-dependent

PACELC extension: else (no partition), choose Latency vs Consistency.

A strong answer is:

During a partition you trade consistency for availability or vice versa—I explain CP vs AP with a concrete system, not as a buzzword triangle.

Replication and backup types?

Replication — copies data for read scale or HA:

Type Mechanism Use
Primary-replica Async/sync copy Read scaling, failover
Multi-primary Bidirectional (conflict risk) Geo write
Logical Row/change events CDC, partial sync

Backup:

Type Contents Recovery
Full Entire database Baseline restore
Incremental Changes since last backup Faster backup; chain restore
PITR WAL/archive logs Restore to timestamp

RPO (how much data loss) and RTO (downtime) drive strategy. Test restore drills—backups untested are wishes.

A strong answer is:

Primary-replica for HA and read scale; full plus WAL for PITR—I define RPO/RTO and insist on tested restore, not just backup jobs running.

Views, stored procedures, and triggers?
Object Purpose Caveat
View Saved query; virtual table Updatable views have restrictions
Stored procedure Logic in DB Vendor-specific; harder to test in CI
Trigger Auto-run on INSERT/UPDATE/DELETE Hidden side effects; debug pain

Views simplify security (GRANT on view not base table) and encapsulate joins.

Triggers for audit logs work; avoid heavy business logic that belongs in application services.

Materialized view stores results—refresh on schedule for dashboards.

A strong answer is:

Views encapsulate queries and permissions; procedures bundle DB-side logic; triggers for audit—I avoid trigger spaghetti that duplicates app business rules.

Final DBMS interview checklist?
  • DBMS vs RDBMS crisp definition
  • Draw ER diagram with cardinality
  • 1NF → 3NF with anomalies explained
  • Keys: PK, FK, candidate, surrogate
  • ACID with transfer example
  • Isolation levels anomaly table
  • Deadlock prevention + retry
  • B-tree index rationale
  • SQL vs NoSQL trade-offs
  • CAP during partition
  • One normalization scenario on whiteboard
  • SQL technical interviews for live coding
  • PostgreSQL if engine-specific loop
  • MongoDB for document contrast

A strong answer is:

I rehearse ER + normalization weekly, explain ACID on a transfer, and pair theory here with SQL practice from the SQL guide—not campus notes without queries.


Pattern cheat sheet (quick reference)

Task DBMS approach
Unique row identity Primary key; surrogate if natural key unstable
Parent-child integrity Foreign key + deliberate ON DELETE action
Remove redundancy Normalize to 3NF; denormalize only with sync plan
Money movement Transaction + row locks + balance check
Hot filtered query Composite or partial index matching WHERE/ORDER BY
Concurrent reads MVCC (Postgres) or isolation level choice
Deadlock Lock ordering + short txs + app retry
Flexible schema Document store or JSONB column (engine-dependent)
Analytics Star schema / materialized views
HA Replication + tested backup restore

References

DBMS interview prep

On-site prep


Summary

DBMS interviews test ER modeling, normal forms, ACID, and indexing trade-offs with concrete examples—not definition memorization alone. Normalize a messy table on a whiteboard, walk through a safe transfer transaction, and defend SQL vs NoSQL for a real product. Pair theory here with SQL technical interviews, PostgreSQL interview questions, and MongoDB interview questions when the loop goes deeper.

Deepak Prasad

R&D Engineer

Founder of GoLinuxCloud with more than 15 years of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive …