Table of Contents
Overview of SQL Delete duplicate rows
Technically in SQL to prevent the insertion of duplicate values in a column of a database table we need to apply the SQL constraints on a column like SQL UNIQUE constraint to enforce the uniqueness of records, However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or undefined data from external sources
One or more rows that have identical data values are considered to be duplicate rows
How to find duplicate rows in SQL?
We first need to define criteria to find duplicate rows. Is it a combination of two or more columns or is it simply searching for duplicates within a single column?
Second, writing a query to search duplicated values in SQL comprises two key steps:
- Using the GROUP BY clause to group all rows by the target column(s) – i.e., the column(s) you want to check for duplicate values on.
- Using the COUNT function in the HAVING clause to check if any of the groups have more than 1 entry; those would be the duplicate values.
Methods to delete duplicate rows from a table in SQL
- SQL delete duplicate rows using GROUP BY and HAVING clause
- SQL delete duplicate rows using ROW_NUMBER () function
- SQL delete duplicate rows using RANK () function
Examples of SQL delete duplicate rows
Consider a student result management database with three tables’ tblstudent, tblresult, and tblsubject for practical examples
Student Table
student_id | studentname | admissionno | admissiondate | enrollmentno | date_of_birth | city | class_id | |
---|---|---|---|---|---|---|---|---|
101 | reema | 10001 | 02-02-2000 | e15200002 | 02-02-1990 | reema@gmail.com | surat | 2 |
102 | kriya | 10002 | 04-05-2001 | e16200003 | 04-08-1991 | kriya@gmail.com | surat | 1 |
103 | meena | 10003 | 06-05-1999 | e15200004 | 02-09-1989 | meena@gmail.com | vadodara | 3 |
104 | carlin | 2001 | 04-01-1998 | e14200001 | 04-04-1989 | carli@gmail.com | vapi | 1 |
105 | dhiren | 2002 | 02-02-1997 | e13400002 | 02-02-1987 | dhiru@gmail.com | vapi | 2 |
106 | hiren | 2003 | 01-01-1997 | e13400001 | 03-03-1887 | hiren@gmail.com | surat | 2 |
107 | mahir | 10004 | 06-09-2000 | e15200003 | 07-09-1990 | mahi@gmail.com | vapi | 3 |
108 | nishi | 2004 | 02-04-2001 | e16200001 | 03-02-1991 | nishi@gmail.com | vadodara | 1 |
Result Table
result_id | student_id | examname | examdate | subject | obtainmark | totalmarks | percentage | grade | status |
---|---|---|---|---|---|---|---|---|---|
3001 | 101 | sem1 | 07-08-2001 | 1 | 80 | 100 | 80 | A+ | pass |
3002 | 101 | sem1 | 08-08-2001 | 2 | 76 | 100 | 76 | A+ | pass |
3003 | 102 | sem3 | 05-05-2000 | 3 | 67 | 100 | 67 | A | pass |
3004 | 102 | sem3 | 06-05-2000 | 4 | 89 | 100 | 89 | A+ | pass |
3005 | 102 | sem3 | 07-05-2000 | 5 | 90 | 100 | 90 | A+ | pass |
3006 | 103 | sem5 | 08-09-1998 | 6 | 55 | 100 | 55 | B | pass |
3007 | 103 | sem5 | 09-09-1998 | 7 | 30 | 100 | 30 | D | fail |
3008 | 103 | sem5 | 10-09-1998 | 8 | 34 | 100 | 34 | D | fail |
Subject Table
subjectid | facultyname | subjectname | subjectcode |
---|---|---|---|
1 | krishna | c | 1003 |
2 | rahul | cpp | 1004 |
3 | radha | asp | 1005 |
4 | meera | sql | 1006 |
5 | yasoda | cloud | 1007 |
6 | nadan | cg | 1008 |
7 | nadan | asp | 1005 |
8 | yasoda | cpp | 1004 |
SQL delete duplicate rows based on one column values
Example 1: Write SQL query to delete duplicate values from subject name column, one subject can be taken by one faculty only
DELETE FROM tblsubject
WHERE (subjectname IN
(SELECT subjectname FROM tblsubject AS tblsubject_1 GROUP BY subjectname HAVING (COUNT(subjectname) > 1)))
- In the above query, SQL delete statement is used with inner sub-query to retrieve and remove duplicate records from a subject table based on the subject name
- In the sub-query, we have used SQL GROUP BY to make a group of the same subject name AND COUNT() function to count each subject name group records with having a condition to count value is greater than 1
OUTPUT:
To view the result of the above-execute query, we need to use the SQL select statement to retrieve all records of the subject table
SQL delete duplicate rows based on multiple column values
Example 2: Write SQL query to delete duplicate rows having duplicate values in student_id and semester columns of the student result table
DELETE FROM student_result
WHERE (student_id IN (SELECT student_id FROM student_result AS student_result_2 GROUP BY student_id, semester
HAVING (COUNT(student_id) > 1))) AND (semester IN (SELECT semester FROM student_result AS student_result_1 GROUP BY student_id, semester
HAVING (COUNT(semester) > 1)))
- In the above query, SQL delete statement is used to delete rows with duplicate values in student_id and semester columns
- SQL sub-query is embedded within the where clause to retrieve duplicate rows of both columns student_id and semester
OUTPUT:
To view the result of the above-execute query, we need to use the SQL select statement to retrieve all records of the student result table
SQL delete duplicate rows using ROW_NUMBER() function
SQL ROW_number() function assigns a unique sequential row number to each row, using ROW_number() function with common table expression (CTE) will assign a count number to each duplicate row, so the row with a count number greater than one is identified as duplicate row
Example 3: Write SQL query to delete duplicate rows having duplicate values in subjectname and subjectcode columns of subject table
WITH cte AS (select subjectid,subjectname,subjectcode, ROW_NUMBER() OVER (PARTITION BY subjectname,subjectcode
ORDER BY subjectname,subjectcode) row_num FROM tblsubject )
DELETE FROM cte WHERE row_num > 1;
- In the above query, SQL common table expression is used with ROW_number() function to find duplicate rows specified by values in the subjectname and subjectcode columns.
- Then, the SQL delete statement is used to delete all the duplicate rows but keeps one occurrence of each duplicate group of rows
OUTPUT:
To view the result of the above-execute query, we need to use the SQL select statement to retrieve all records of the subject table
SQL delete duplicate rows using RANK() function
SQL RANK() function returns the rank, or row number, of each row within the partitioned grouping of a result set. Using the SQL RANK() function, we can create sets of duplicate records and assigns a number to each row in the duplicate set
Example 4: Write SQL query to delete duplicate rows having duplicate values in subjectname and subjectcode columns of subject table
DELETE s FROM tblsubject s
INNER JOIN (SELECT *, RANK() OVER(PARTITION BY [subjectname], [subjectcode] ORDER BY [subjectcode]) AS [rank] FROM tblsubject) t ON t.[subjectid] = s.[subjectid]
WHERE [rank] > 1;
- In the above query, the SQL rank() function is used to assign a rank to all records retrieve by SQL select sub-query
- If the rank value is greater than 1 it indicates duplicate records, so to delete the records having a rank value greater than 1 SQL delete statement is used with where the condition
OUTPUT:
To view the result of the above-execute query, we need to use the SQL select statement to retrieve all records of the subject table
Summary
In this article on SQL Delete duplicate rows, we have covered an overview of SQL delete duplicate rows, how to find duplicate rows in SQL, list out various methods to delete duplicate rows in SQL, also explained practical examples to remove duplicate rows like SQL delete duplicate rows using GROUP BY and HAVING clause based on one column values and multiple column values, SQL delete duplicate rows using ROW_NUMBER() function and SQL delete duplicate rows using RANK() function.
References
SQL RANK() Function
SQL JOINS
SQL GROUP BY
Read More
Remove duplicate rows in SQL Server
Related Keywords: sql to remove duplicates, sql remove duplicates, deleting duplicate rows in sql, remove duplicates sql, delete duplicate rows in sql, remove duplicate rows in sql, sql remove duplicate rows, sql delete duplicate rows keep one, sql delete duplicate rows based on multiple columns, mysql delete duplicate rows, delete duplicate rows in oracle, delete duplicate records in sql server using row number, how to delete duplicate rows in sql using rowid