Overview of SQL duplicate rows
To prevent duplicate values from being inserted into a column of a database table, SQL constraints need to be applied. The constraint enforces the uniqueness of records, like the SQL UNIQUE constraint.
However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or undefined data from external sources
Duplicate records are those that contain identical data values. To determine duplicate records, we must first define your criteria.
Different methods to select duplicate records in SQL
We first need to define criteria to find duplicate rows. Is it a combination of two or more columns or is it simply searching for duplicates within a single column?
Second, writing a query to search duplicated values in SQL comprises two key steps:
- Using the GROUP BY clause to group all rows by the target column(s) that is the column(s) you want to check for duplicate values on.
- Using the COUNT function in the HAVING clause to check if any of the groups have more than 1 entry; those would be the duplicate values.
What is the use of GROUP BY clause on non-ID column(s)?
In SQL, the GROUP BY clause creates groups of records whose non-ID columns all have the same value. The columns you want to group by are put after the GROUP BY keyword. Because the ID column is our table's primary key, each row will have a different value under that column. We would not be able to detect duplicates if we included ID column.
Usage of SQL COUNT() Function
The count() function with the GROUP BY clause is used to count the data which were grouped on a particular column of the table. To select duplicate records the count() function return value must be greater than one.
Syntax of using GROUP BY clause, COUNT(), and HAVING condition
SELECT column1,column2,COUNT(column_name),...
FROM table_name
GROUP BY column1,column2,....
HAVING COUNT(column_name) > 1
Here ,
- column1, column2: Specified the name of columns to be retrieved
- COUNT(column_name) : aggregate function COUNT() with column name of which values to be counted
- GROUP BY column1, column2: clause specified with the column name on which grouping will be performed
- HAVING: clause to a specified condition
Practical Examples of SQL select duplicate records
Consider student result management system to perform practical examples to select duplicate records.
Example-1: Select duplicate records using GROUP BY clause on Single column
Write SQL query to retrieve student data having same city name
SELECT city, COUNT(city) AS 'Duplicate City Count'
FROM tblstudent
GROUP BY city
HAVING (COUNT(city) > 1)
- In the above query ,SQ: select statement is used to select duplicate records of city column of student
- SQL GROUP BY clause to create group of each city records.
- SQL HAVING clause is applied on aggregate function to check more than one record having same city name
OUTPUT:
Example-2: Select duplicate records using GROUP BY clause on multiple columns
Write SQL query to retrieve student records with duplicate city and class ID
SELECT city, COUNT(city) AS 'Duplication Count City', class_id, COUNT(class_id) AS 'Duplication Class ID Count'
FROM tblstudent
GROUP BY city, class_id
HAVING (COUNT(city) > 1) OR
(COUNT(class_id) > 1)
- In the above query, SQL select statement is used to retrieve duplicate records count value with GROUP BY and count aggregate function
- SQL GROUP BY clause is used to create group of records having same value in city and class ID columns
- HAVING condition is applied to check for duplication value in city and class_ID
OUTPUT:
Example-3: Select duplicate records using GROUP BY clause on Single column with nested –Select query
Nested-Select query means query within another query. Alternatively, a Nested-Select query is query embedded in another SQL query's WHERE clause.
In the above example, we have retrieved duplicate records count for a specified column but to retrieve duplicate record values with all columns values
Write SQL query to retrieve duplicate city records of student with personal details
SELECT student_id, studentname, city FROM tblstudent
WHERE (city IN (SELECT city FROM tblstudent AS tblstudent_1
GROUP BY city
HAVING (COUNT(city) > 1)))
- In the above query, SQL select statement with a nested –select query is used to retrieve duplicate records count value with GROUP BY and count aggregate function
- SQL GROUP BY clause is used to create a group of records having the same value in city name and HAVING condition is applied to check for duplication value in city
- When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.
OUTPUT:
Example-4: Select duplicate records using GROUP BY clause on multiple columns with nested –Select query
Write SQL query to retrieve student’s personal details having duplicate city and class ID
SELECT student_id, studentname, city, class_id FROM tblstudent
WHERE(city IN (SELECT city FROM tblstudent AS tblstudent_1
GROUP BY city, class_id
HAVING (COUNT(city) > 1) OR (COUNT(class_id) > 1))) OR
(class_id IN (SELECT class_id FROM tblstudent AS tblstudent_2 GROUP BY city, class_id
HAVING (COUNT(city) > 1) OR (COUNT(class_id) > 1)))
- In the above query, SQL select statement with two nested–select queries are used to count and retrieve duplicate records with GROUP BY and count aggregate function
- SQL GROUP BY clause is used to create groups of records having same value in city name and same class ID
- SQL HAVING condition is applied to check for duplication value in the city column and class ID column.
- When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.
OUTPUT:
Summary
In this article of select duplicate records in SQL, we have covered an overview of SQL duplicate records, steps to select duplicate records in SQL , why to use SQL GROUP BY on non-ID column, Usage of SQL COUNT() function, Syntax of select duplicate records using GROUP BY clause, COUNT() and HAVING condition explained with practical examples such as select duplicate records from a single column, select duplicate records using GROUP BY on multiple columns, select duplicate records using nested-select query and GROUP BY clause.
References
Read More