How to remove duplicate rows in SQL?

Duplicate data can be a common problem in databases, and removing it is an important step in maintaining the quality of your data. In SQL, there are a few ways to remove duplicate rows from a table, and this article will cover two of the most common methods: using the DISTINCT keyword and using a DELETE statement with a JOIN clause.

Method 1: Using the DISTINCT keyword

The DISTINCT keyword is used in the SELECT statement to return only unique values from the selected columns. This is a simple and straightforward method for removing duplicates from a table.

Here’s an example of how to use the DISTINCT keyword:

SELECT DISTINCT column1, column2, column3
FROM table_name;

This query will return only the unique rows in the column1, column2, and column3 columns of the table_name table.

Method 2: Using a DELETE statement with a JOIN clause

This method is used to completely remove all duplicates from a table, not just return unique values. To do this, you can use a DELETE statement with a JOIN clause.

Here’s an example:

DELETE t1
FROM table_name t1
JOIN (SELECT column1, column2, column3
      FROM table_name
      GROUP BY column1, column2, column3
      HAVING COUNT(*) > 1) t2
ON t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.column3 = t2.column3;

To remove duplicate rows in SQL , The following script shows usage of GROUP BY, HAVING, ORDER BY in one query, and returns the results with a duplicate column and its count.

To Fetch Duplicate Rows:

SELECT YourColumnName,
       COUNT(*) TotalCount
FROM   YourTableName
GROUP  BY YourColumnName
HAVING COUNT(*) > 1
ORDER  BY COUNT(*) DESC

To Delete the Duplicate Rows:

DELETE users 
WHERE rowid NOT IN 
(SELECT MIN(rowid)
FROM users
GROUP BY name, email);

DELETE FROM myTable WHERE RowID IN (
  SELECT MIN(RowID) AS IDNo FROM myTable
  GROUP BY Col1, Col2, Col3
  HAVING COUNT(*) = 2 )

WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 
                                       ORDER BY ( SELECT 0)) RN
         FROM   #MyTable)
DELETE FROM cte
WHERE  RN > 1;

delete t1
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid>t2.rowid

DELETE LU 
FROM   (SELECT *, 
               Row_number() 
                 OVER ( 
                   partition BY col1, col1, col3 
                   ORDER BY rowid DESC) [Row] 
        FROM   mytable) LU 
WHERE  [row] > 1

This will delete duplicate rows, except the first row

DELETE
FROM
    Mytable
WHERE
    RowID NOT IN (
        SELECT
            MIN(RowID)
        FROM
            Mytable
        GROUP BY
            Col1,
            Col2,
            Col3
    )

Hope this is helpfull. I am sure there are lot of ways to remove duplicates. Please do comments which process do you follow.

About Me:-
My name is Om Prakash Singh – welcome to my blog!
I am a data analytics consultant with a specialty in Looker. With years of experience in the field, I have developed a strong understanding of data analysis and visualization, and have a passion for helping organizations make informed decisions through data-driven insights. My expertise lies in utilizing Looker, a leading data platform, to help businesses turn their data into actionable insights, streamline their data processes, and drive growth. I am dedicated to sharing my knowledge and experience with others through this blog, and I hope you will find the content informative and valuable. Whether you’re a seasoned data professional or just starting out, I believe there is something here for everyone. Thank you for stopping by and I look forward to connecting with you!
Reach out to us here if you are interested to evaluate if Looker is right for you or any other BI solution.

How to remove duplicate rows in SQL?