Duplicate data can be a common problem in databases, and removing it is an important step in maintaining the quality of your data. In SQL, there are a few ways to remove duplicate rows from a table, and this article will cover two of the most common methods: using the DISTINCT
keyword and using a DELETE
statement with a JOIN
clause.
Method 1: Using the DISTINCT
keyword
The DISTINCT
keyword is used in the SELECT
statement to return only unique values from the selected columns. This is a simple and straightforward method for removing duplicates from a table.
Here’s an example of how to use the DISTINCT
keyword:
SELECT DISTINCT column1, column2, column3
FROM table_name;
This query will return only the unique rows in the column1
, column2
, and column3
columns of the table_name
table.
Method 2: Using a DELETE
statement with a JOIN
clause
This method is used to completely remove all duplicates from a table, not just return unique values. To do this, you can use a DELETE
statement with a JOIN
clause.
Here’s an example:
DELETE t1
FROM table_name t1
JOIN (SELECT column1, column2, column3
FROM table_name
GROUP BY column1, column2, column3
HAVING COUNT(*) > 1) t2
ON t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.column3 = t2.column3;
To remove duplicate rows in SQL , The following script shows usage of GROUP BY
, HAVING
, ORDER BY
in one query, and returns the results with a duplicate column and its count.
To Fetch Duplicate Rows:
SELECT YourColumnName,
COUNT(*) TotalCount
FROM YourTableName
GROUP BY YourColumnName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
To Delete the Duplicate Rows:
DELETE users
WHERE rowid NOT IN
(SELECT MIN(rowid)
FROM users
GROUP BY name, email);
DELETE FROM myTable WHERE RowID IN (
SELECT MIN(RowID) AS IDNo FROM myTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) = 2 )
WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
ORDER BY ( SELECT 0)) RN
FROM #MyTable)
DELETE FROM cte
WHERE RN > 1;
delete t1
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid>t2.rowid
DELETE LU
FROM (SELECT *,
Row_number()
OVER (
partition BY col1, col1, col3
ORDER BY rowid DESC) [Row]
FROM mytable) LU
WHERE [row] > 1
This will delete duplicate rows, except the first row
DELETE
FROM
Mytable
WHERE
RowID NOT IN (
SELECT
MIN(RowID)
FROM
Mytable
GROUP BY
Col1,
Col2,
Col3
)
Hope this is helpfull. I am sure there are lot of ways to remove duplicates. Please do comments which process do you follow.
About Me:-
My name is Om Prakash Singh – welcome to my blog!
I am a data analytics consultant with a specialty in Looker. With years of experience in the field, I have developed a strong understanding of data analysis and visualization, and have a passion for helping organizations make informed decisions through data-driven insights. My expertise lies in utilizing Looker, a leading data platform, to help businesses turn their data into actionable insights, streamline their data processes, and drive growth. I am dedicated to sharing my knowledge and experience with others through this blog, and I hope you will find the content informative and valuable. Whether you’re a seasoned data professional or just starting out, I believe there is something here for everyone. Thank you for stopping by and I look forward to connecting with you!
Reach out to us here if you are interested to evaluate if Looker is right for you or any other BI solution.