How to remove duplicate rows in SQL?

How to remove duplicate rows in SQL?

  • Post category:SQL
  • Post last modified:February 5, 2023
  • Reading time:6 mins read

Duplicate data can be a common problem in databases, and removing it is an important step in maintaining the quality of your data. In SQL, there are a few ways to remove duplicate rows from a table, and this article will cover two of the most common methods: using the DISTINCT keyword and using a DELETE statement with a JOIN clause.

Method 1: Using the DISTINCT keyword

The DISTINCT keyword is used in the SELECT statement to return only unique values from the selected columns. This is a simple and straightforward method for removing duplicates from a table.

Here’s an example of how to use the DISTINCT keyword:

SELECT DISTINCT column1, column2, column3
FROM table_name;

This query will return only the unique rows in the column1, column2, and column3 columns of the table_name table.

Method 2: Using a DELETE statement with a JOIN clause

This method is used to completely remove all duplicates from a table, not just return unique values. To do this, you can use a DELETE statement with a JOIN clause.

Here’s an example:

DELETE t1
FROM table_name t1
JOIN (SELECT column1, column2, column3
      FROM table_name
      GROUP BY column1, column2, column3
      HAVING COUNT(*) > 1) t2
ON t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.column3 = t2.column3;

To remove duplicate rows in SQL , The following script shows usage of GROUP BYHAVINGORDER BY in one query, and returns the results with a duplicate column and its count.

To Fetch Duplicate Rows:

SELECT YourColumnName,
       COUNT(*) TotalCount
FROM   YourTableName
GROUP  BY YourColumnName
HAVING COUNT(*) > 1
ORDER  BY COUNT(*) DESC 

To Delete the Duplicate Rows:

DELETE users 
WHERE rowid NOT IN 
(SELECT MIN(rowid)
FROM users
GROUP BY name, email); 
DELETE FROM myTable WHERE RowID IN (
  SELECT MIN(RowID) AS IDNo FROM myTable
  GROUP BY Col1, Col2, Col3
  HAVING COUNT(*) = 2 )
WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 
                                       ORDER BY ( SELECT 0)) RN
         FROM   #MyTable)
DELETE FROM cte
WHERE  RN > 1;
delete t1
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid>t2.rowid
DELETE LU 
FROM   (SELECT *, 
               Row_number() 
                 OVER ( 
                   partition BY col1, col1, col3 
                   ORDER BY rowid DESC) [Row] 
        FROM   mytable) LU 
WHERE  [row] > 1 

This will delete duplicate rows, except the first row

DELETE
FROM
    Mytable
WHERE
    RowID NOT IN (
        SELECT
            MIN(RowID)
        FROM
            Mytable
        GROUP BY
            Col1,
            Col2,
            Col3
    )

Hope this is helpfull. I am sure there are lot of ways to remove duplicates. Please do comments which process do you follow.

About Me:-
My name is Om Prakash Singh – welcome to my blog!
I am a data analytics consultant with a specialty in Looker. With years of experience in the field, I have developed a strong understanding of data analysis and visualization, and have a passion for helping organizations make informed decisions through data-driven insights. My expertise lies in utilizing Looker, a leading data platform, to help businesses turn their data into actionable insights, streamline their data processes, and drive growth. I am dedicated to sharing my knowledge and experience with others through this blog, and I hope you will find the content informative and valuable. Whether you’re a seasoned data professional or just starting out, I believe there is something here for everyone. Thank you for stopping by and I look forward to connecting with you!
Reach out to us here if you are interested to evaluate if Looker is right for you or any other BI solution.

Leave a Reply