Select duplicate records in SQL table [SOLVED]


SQL

Reviewer: Deepak Prasad

Overview of SQL duplicate rows

To prevent duplicate values from being inserted into a column of a database table, SQL constraints need to be applied. The constraint enforces the uniqueness of records, like the SQL UNIQUE constraint.

However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or undefined data from external sources

Duplicate records are those that contain identical data values. To determine duplicate records, we must first define your criteria.

 

Different methods to select duplicate records in SQL

We first need to define criteria to find duplicate rows. Is it a combination of two or more columns or is it simply searching for duplicates within a single column?

Second, writing a query to search duplicated values in SQL comprises two key steps:

  1. Using the GROUP BY clause to group all rows by the target column(s) that is the column(s) you want to check for duplicate values on.
  2. Using the COUNT function in the HAVING clause to check if any of the groups have more than 1 entry; those would be the duplicate values.

 

What is the use of GROUP BY clause on non-ID column(s)?

In SQL, the GROUP BY clause creates groups of records whose non-ID columns all have the same value. The columns you want to group by are put after the GROUP BY keyword. Because the ID column is our table's primary key, each row will have a different value under that column. We would not be able to detect duplicates if we included ID column.

 

Usage of SQL COUNT() Function

The count() function with the GROUP BY clause is used to count the data which were grouped on a particular column of the table. To select duplicate records the count() function return value must be greater than one.

 

Syntax of using GROUP BY clause, COUNT(), and HAVING condition

SELECT  column1,column2,COUNT(column_name),...
FROM table_name
GROUP BY column1,column2,....
HAVING COUNT(column_name) > 1

Here ,

  • column1, column2: Specified the name of columns to be retrieved
  • COUNT(column_name) : aggregate function COUNT() with column name of which values to be counted
  • GROUP BY column1, column2: clause specified with the column name on which grouping will be performed
  • HAVING: clause to a specified condition

 

Practical Examples of SQL select duplicate records

Consider student result management system to perform practical examples to select duplicate records.

Select duplicate records in SQL table [SOLVED]

 

Example-1: Select duplicate records using GROUP BY clause on Single column

Write SQL query to retrieve student data having same city name

SELECT        city, COUNT(city) AS 'Duplicate City Count'
FROM            tblstudent
GROUP BY city
HAVING        (COUNT(city) > 1)
  • In the above query ,SQ: select statement is used to select duplicate records of city column of  student
  • SQL GROUP BY clause to create  group of each city records.
  • SQL HAVING clause is applied on aggregate function to check more than one record having same city name

OUTPUT:

Select duplicate records in SQL table [SOLVED]

 

Example-2: Select duplicate records using GROUP BY clause on multiple columns

Write SQL query to retrieve student records with duplicate city and class ID

SELECT        city, COUNT(city) AS 'Duplication Count City', class_id, COUNT(class_id) AS 'Duplication Class ID Count'
FROM            tblstudent
GROUP BY city, class_id
HAVING        (COUNT(city) > 1) OR
                         (COUNT(class_id) > 1)
  • In the above query, SQL select statement is used to retrieve duplicate records count value with GROUP BY and count aggregate function
  • SQL GROUP BY clause is used to create group of records having same value in city and class ID columns
  • HAVING condition is applied to check for duplication value in city and class_ID

OUTPUT:

Select duplicate records in SQL table [SOLVED]

 

Example-3: Select duplicate records using GROUP BY clause on Single column with nested –Select query

Nested-Select query means query within another query. Alternatively, a Nested-Select query is query embedded in another SQL query's WHERE clause.

In the above example, we have retrieved duplicate records count for a specified column but to retrieve duplicate record values with all columns values

Write SQL query to retrieve duplicate city  records of student with personal details

 SELECT student_id, studentname, city FROM tblstudent
WHERE (city IN (SELECT  city FROM  tblstudent AS tblstudent_1 
GROUP BY city
HAVING    (COUNT(city) > 1)))
  • In the above query, SQL select statement with a nested –select query  is used to retrieve duplicate records count value with GROUP BY and count aggregate function
  • SQL GROUP BY clause is used to create a group of records having the same value in city name and HAVING condition is applied to check for duplication value in city
  • When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.

OUTPUT:

Select duplicate records in SQL table [SOLVED]

 

Example-4: Select duplicate records using GROUP BY clause on multiple columns with nested –Select query

Write SQL query to retrieve student’s personal details having duplicate city and class ID

SELECT student_id, studentname, city, class_id FROM            tblstudent
WHERE(city IN (SELECT   city FROM    tblstudent AS tblstudent_1
GROUP BY city, class_id
HAVING    (COUNT(city) > 1) OR (COUNT(class_id) > 1))) OR
(class_id IN (SELECT        class_id  FROM  tblstudent AS tblstudent_2 GROUP BY city, class_id
HAVING  (COUNT(city) > 1) OR  (COUNT(class_id) > 1)))
  • In the above query, SQL select statement with two nested–select queries are used to count and  retrieve duplicate records  with GROUP BY and count aggregate function
  • SQL GROUP BY clause is used to create groups of records having same value in city name and same class ID
  • SQL  HAVING condition is applied to check for duplication value in the city column and class ID column.
  • When we execute the above query nested –select query will be executed first and it returns name of city which occurs more than one time.

OUTPUT:

Select duplicate records in SQL table [SOLVED]

 

Summary

In this article of select duplicate records in SQL, we have covered an overview of SQL duplicate records, steps to select duplicate records in SQL , why to use SQL GROUP BY on non-ID column, Usage of SQL COUNT() function, Syntax of select duplicate records using GROUP BY clause, COUNT() and HAVING condition explained with practical examples such as select duplicate records from a single column, select duplicate records using GROUP BY on multiple columns, select duplicate records using nested-select query and GROUP BY clause.

 

References

SQL GROUP BY
SQL COUNT

 

Read More

https://support.microsoft.com/en-us/office/find-duplicate-records-with-a-query-3cc805a2-2a13-4439-b0d3-6b23c7d60fbb

 

Falguni Thakker

Falguni Thakker

She is a dedicated professional with expertise in SQL, Python, C++, and Linux. Currently serving as a professor at a prestigious university. With a passion for teaching and a strong technical background, she inspires the next generation of computer scientists. You can connect with her on her LinkedIn profile.

Can't find what you're searching for? Let us assist you.

Enter your query below, and we'll provide instant results tailored to your needs.

If my articles on GoLinuxCloud has helped you, kindly consider buying me a coffee as a token of appreciation.

Buy GoLinuxCloud a Coffee

For any other feedbacks or questions you can send mail to admin@golinuxcloud.com

Thank You for your support!!

Leave a Comment