Delete duplicate rows or records in SQL | SQL Interview Question
Deleting duplicate rows or records is a popular SQL interview questions but many find it difficult to solve. Let’s try to break it down in this lesson,
Deleting duplicate records involves the below steps:
Step 1: Check if the table contains duplicate values as per your criteria
Step 2: Write the Select query to find the duplicates records
Step 3: Delete the duplicates records found in Step 2
Now let’s explore these steps in detail,
Step 1: Check if the table contains duplicate values as per your criteria
In our example, we have a course table, that has the below records. As we can see, we have many duplicate records (marked in yellow) in this table.

Now let’s validate the same using a SQL query.
Here GROUP BY is a good strategy to fetches all unique combinations of course_name, price, and instructor_id and counts how many instances (rows) have the same combination.
- The below SQL query is used to identify duplicates in the
coursetable based on specific columns (course_name,price, andinstructor_id). - Then, we add
HAVING COUNT(*) > 1which filters to show only those combinations that appear more than once, which indicates the presence of duplicates in the table.
SELECT course_name, price, instructor_id, count(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1The output confirms that the below courses are repeated 3 times.

Now that we know, that we have duplicate records in the course table, let’s explore how we can remove these duplicate records.
In this article we will discuss 3 commonly used methods,
Method 1: Using GROUP BY and HAVING clause
Method 2: Using ROW_NUMBER() window function
Method 3: Using Self Join
Method 1: Using GROUP BY and HAVING clause
This query retrieves the minimum course_id for each unique combination of course_name, price, and instructor_id where there are duplicates. Here's how it works:
GROUP BY course_name, price, instructor_id: Groups the records by these columns, creating groups of records that have the same combination of these values.HAVING COUNT(*) > 1: Filters the groups to only include those where the count of rows within each group (i.e., the count of duplicates based on the specified columns) is greater than 1.SELECT MIN(course_id): Within each of these groups, it selects the minimumcourse_id. This gives you the minimumcourse_idfor each set of duplicates based oncourse_name,price, andinstructor_id.
SELECT MIN(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1Output:

We can use the first query as a subquery to delete duplicate records:
- The subquery inside
NOT INprovides a list of minimumcourse_ids for each group of duplicates based oncourse_name,price, andinstructor_id. - The outer DELETE statement removes all rows from the
coursetable where thecourse_idis not present in the list of minimumcourse_ids for groups that have duplicates based on the specified columns.
DELETE FROM course
WHERE course_id NOT IN (
SELECT MIN(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1
);Method 2: Using ROW_NUMBER() window function
- Here we Use a window function
ROW_NUMBER()within a subquery to generate row numbers partitioned bycourse_name,price, andinstructor_id. - The inner subquery assigns a sequential number (
row_num) to each row within its respective partition. - The outer query selects
course_ids whererow_numis greater than 1, which indicates the duplicates for the specified combination of columns.
SELECT course_id
FROM (
SELECT course_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id) AS row_num
FROM course
) AS subquery
WHERE row_num > 1Output:

DELETE FROM course WHERE course_id IN ( SELECT course_id FROM ( SELECT course_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id)
AS row_num FROM course ) AS subquery WHERE row_num > 1 );If you want to avoid subquery, we can also use Common Table Expression (CTE) along with window function as shown below,
- First, create a Common Table Expression (CTE)
CTEthat usesROW_NUMBER()to assign sequential numbers partitioned bycourse_name,price, andinstructor_id. - Then, use DELETE statement to delete rows from the CTE where
row_numis greater than 1 to remove the duplicates.
WITH CTE AS (
SELECT course_id, course_name, price, instructor_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id) AS row_num
FROM course
)
DELETE FROM CTE WHERE row_num > 1;Method 3: Using Self Join
Here we use a self-join on the course table (c1 and c2 are aliases for the same table) to identify duplicate records based on certain criteria (course_name, price, and instructor_id).
Here's how it works:
JOIN course c2 ON...: Joins thecoursetable to itself (c2) based on matchingcourse_name,price, andinstructor_id, ensuring thatc1refers to a record with a highercourse_idthanc2.AND c1.course_id > c2.course_id: This condition ensures that only the record with the highercourse_idwithin each pair of duplicates is selected.SELECT c1.course_id: It retrieves thecourse_idfromc1, which represents the duplicates with highercourse_idvalues.
SELECT c1.course_id
FROM course c1
JOIN course c2 ON c1.course_name = c2.course_name
AND c1.price = c2.price
AND c1.instructor_id = c2.instructor_id
AND c1.course_id > c2.course_idOutput:

We then pass the previous course_id list as a subquery to delete duplicate records from the course table,
DELETE FROM course
WHERE course_id IN (
SELECT c1.course_id
FROM course c1
JOIN course c2 ON c1.course_name = c2.course_name
AND c1.price = c2.price
AND c1.instructor_id = c2.instructor_id
AND c1.course_id > c2.course_id
);