Select duplicate records in SQL table [SOLVED]

Overview of SQL duplicate rows

To prevent duplicate values from being inserted into a column of a database table, SQL constraints need to be applied. The constraint enforces the uniqueness of records, like the SQL UNIQUE constraint.

However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or undefined data from external sources

Duplicate records are those that contain identical data values. To determine duplicate records, we must first define your criteria.

Different methods to select duplicate records in SQL

We first need to define criteria to find duplicate rows. Is it a combination of two or more columns or is it simply searching for duplicates within a single column?

Second, writing a query to search duplicated values in SQL comprises two key steps:

Using theGROUP BYclause to group all rows by the target column(s) that is the column(s) you want to check for duplicate values on.
Using theCOUNTfunction in the HAVING clause to check if any of the groups have more than 1 entry; those would be the duplicate values.

What is the use of GROUP BY clause on non-ID column(s)?

In SQL, the GROUP BY clause creates groups of records whose non-ID columns all have the same value. The columns you want to group by are put after the GROUP BY keyword. Because the ID column is our table's primary key, each row will have a different value under that column. We would not be able to detect duplicates if we included ID column.

Usage of SQL COUNT() Function

The count() function with the GROUP BY clause is used to count the data which were grouped on a particular column of the table.To select duplicate records the count() function return value must be greater than one.

Syntax of using GROUP BY clause, COUNT(), and HAVING condition

sql


SELECT  column1,column2,COUNT(column_name),...
FROM table_name
GROUP BY column1,column2,....
HAVING COUNT(column_name) > 1

Here ,

column1, column2: Specified the name of columns to be retrieved
COUNT(column_name) : aggregate function COUNT() with column name of which values to be counted
GROUP BY column1, column2: clause specified with the column name on which grouping will be performed
HAVING: clause to a specified condition

Practical Examples of SQL select duplicate records

Consider student result management system to perform practical examples to select duplicate records.

Example-1: Select duplicate records using GROUP BY clause on Single column

Write SQL query to retrieve student data having same city name

sql


SELECT        city, COUNT(city) AS 'Duplicate City Count'
FROM            tblstudent
GROUP BY city
HAVING        (COUNT(city) > 1)

In the above query ,SQ: select statement is used to select duplicate records of city column of student
SQL GROUP BY clause to create group of each city records.
SQL HAVING clause is applied on aggregate function to check more than one record having same city name

OUTPUT:

Example-2: Select duplicate records using GROUP BY clause on multiple columns

Write SQL query to retrieve student records with duplicate city and class ID

sql


SELECT        city, COUNT(city) AS 'Duplication Count City', class_id, COUNT(class_id) AS 'Duplication Class ID Count'
FROM            tblstudent
GROUP BY city, class_id
HAVING        (COUNT(city) > 1) OR
                         (COUNT(class_id) > 1)

In the above query, SQL select statement is used to retrieve duplicate records count value with GROUP BY and count aggregate function
SQL GROUP BY clause is used to create group of records having same value in city and class ID columns
HAVING condition is applied to check for duplication value in city and class_ID

OUTPUT:

Example-3: Select duplicate records using GROUP BY clause on Single column with nested –Select query

Nested-Select query means query within another query. Alternatively, a Nested-Select query is query embedded in another SQL query's WHERE clause.

In the above example, we have retrieved duplicate records count for a specified column but to retrieve duplicate record values with all columns values

Write SQL query to retrieve duplicate city records of student with personal details

sql


SELECT student_id, studentname, city FROM tblstudent
WHERE (city IN (SELECT  city FROM  tblstudent AS tblstudent_1 
GROUP BY city
HAVING    (COUNT(city) > 1)))

In the above query, SQL select statement with a nested –select query is used to retrieve duplicate records count value with GROUP BY and count aggregate function
SQL GROUP BY clause is used to create a group of records having the same value in city name and HAVING condition is applied to check for duplication value in city
When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.

OUTPUT:

Example-4: Select duplicate records using GROUP BY clause on multiple columns with nested –Select query

Write SQL query to retrieve student’s personal details having duplicate city and class ID

sql


SELECT student_id, studentname, city, class_id FROM            tblstudent
WHERE(city IN (SELECT   city FROM    tblstudent AS tblstudent_1
GROUP BY city, class_id
HAVING    (COUNT(city) > 1) OR (COUNT(class_id) > 1))) OR
(class_id IN (SELECT        class_id  FROM  tblstudent AS tblstudent_2 GROUP BY city, class_id
HAVING  (COUNT(city) > 1) OR  (COUNT(class_id) > 1)))

In the above query, SQL select statement with two nested–select queries are used to count and retrieve duplicate records with GROUP BY and count aggregate function
SQL GROUP BY clause is used to create groups of records having same value in city name and same class ID
SQL HAVING condition is applied to check for duplication value in the city column and class ID column.
When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.

OUTPUT:

Summary

In this article of select duplicate records in SQL, we have covered an overview of SQL duplicate records, steps to select duplicate records in SQL , why to use SQL GROUP BY on non-ID column, Usage of SQL COUNT() function, Syntax of select duplicate records using GROUP BY clause, COUNT() and HAVING condition explained with practical examples such as select duplicate records from a single column, select duplicate records using GROUP BY on multiple columns, select duplicate records using nested-select query and GROUP BY clause.

References

Microsoft documentation

Overview of SQL duplicate rows

Different methods to select duplicate records in SQL

What is the use of GROUP BY clause on non-ID column(s)?

Usage of SQL COUNT() Function

Syntax of using GROUP BY clause, COUNT(), and HAVING condition

Practical Examples of SQL select duplicate records

Example-1: Select duplicate records using GROUP BY clause on Single column

Example-2: Select duplicate records using GROUP BY clause on multiple columns

Example-3: Select duplicate records using GROUP BY clause on Single column with nested –Select query

Example-4: Select duplicate records using GROUP BY clause on multiple columns with nested –Select query

Summary

References

Read More

Related Articles

DBMS Interview Questions and Answers

PostgreSQL Interview Questions and Answers

SQL Technical Interview Questions and Answers

Search GoLinuxCloud