Removing duplicate rows in PostgreSQL

8 responses to “Removing duplicate rows in PostgreSQL”

F(log) says:

April 14, 2017 at 12:47 pm

Very useful and easy to understand. This will go to my personal documentation 🙂

Vyacheslav says:

April 14, 2017 at 2:08 pm

Hey, thanks for the article. Take a look at this one as well:

https://wiki.postgresql.org/wiki/Deleting_duplicates

- Hans-Jürgen Schönig says:
  
  April 14, 2017 at 2:16 pm
  
  This one is known too. However, we should not assume that we have unique IDs. If there is no primary key, that assumption might be false 🙂
  
  - Vyacheslav says:
    
    April 14, 2017 at 2:42 pm
    
    Thanks for pointing out how to use tuple notation. Before that, I would convert the row to JSON with row_to_json(*). I didn't know that this is unnecessary.
    
Salah Jubeh says:

April 20, 2017 at 12:37 pm

You can delete duplicates without a CTE or a GROUP BY clause by using a DELETE ... USING statement. The following is much faster than using GROUP BY and less complicated.

delete from t_location a using t_location b where (a.country, a.city)=(b.country, b.city) and a.ctid < b.ctid;
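To see what the statement above does, a minimal sketch on a throwaway table may help. The sample rows are made up for illustration; only the table and column names come from the statement itself. Every row that has an identical partner with a larger ctid is deleted, so the row with the highest ctid in each group should be the one that survives. Note that the `=` row comparison is not true when country or city is NULL, so duplicates containing NULLs would be left in place.

```sql
-- Hypothetical sample data; a temp table is used so nothing real is touched.
CREATE TEMP TABLE t_location (country text, city text);
INSERT INTO t_location VALUES
    ('Austria', 'Vienna'),
    ('Austria', 'Vienna'),
    ('Austria', 'Graz'),
    ('Austria', 'Vienna');

-- Self-join: delete row a whenever an identical row b with a larger ctid exists.
DELETE FROM t_location a
USING t_location b
WHERE (a.country, a.city) = (b.country, b.city)
  AND a.ctid < b.ctid;

-- Each (country, city) pair should now appear exactly once.
SELECT country, city, count(*)
FROM t_location
GROUP BY country, city;
```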

Golokesh Patra says:

August 10, 2017 at 6:33 am

**WORKS FOR BOTH NORMAL SQL AND POSTGRESQL (ALSO WORKS IN AWS REDSHIFT)**

DROP TABLE IF EXISTS aNewEmptyTemporaryOrBackupTable;

CREATE TABLE aNewEmptyTemporaryOrBackupTable
AS SELECT DISTINCT * FROM originalTableContainingDuplicates;

TRUNCATE TABLE originalTableContainingDuplicates;

INSERT INTO originalTableContainingDuplicates
SELECT * FROM aNewEmptyTemporaryOrBackupTable;

DROP TABLE aNewEmptyTemporaryOrBackupTable;

**EXPLANATION OF THE ABOVE SQL SCRIPT**
The 1st query drops any backup/temporary table left over from a previous run, if one exists.

The 2nd query creates a new (temporary/backup) table holding only the distinct rows of the original table, so the new table is the same as the original table minus the duplicate entries.

The 3rd query truncates (empties) the original table.

The 4th query copies all the unique rows from the temporary table back into the original table, which is empty after the truncate. After this query has run, the original table contains only the unique data that was in the temporary table.

The 5th query drops the temporary table, which is no longer needed.

The end result is that the original table contains only unique entries and no duplicates.
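As a hedged variation on the script above: in PostgreSQL, TRUNCATE is transaction-safe, so the steps can be wrapped in a single transaction and a failure partway through should not leave the original table empty. The table names `t_demo` and `t_demo_dedup` are hypothetical stand-ins. This sketch assumes PostgreSQL only (Redshift treats TRUNCATE differently), that every column type supports equality (SELECT DISTINCT fails on a `json` column, for example), and that no foreign keys reference the table.

```sql
-- Hypothetical table standing in for the table that contains duplicates.
CREATE TEMP TABLE t_demo (id int, item text);
INSERT INTO t_demo VALUES (1, 'a'), (1, 'a'), (2, 'b');

BEGIN;

-- Copy the distinct rows; the helper table is dropped automatically at COMMIT.
CREATE TEMP TABLE t_demo_dedup ON COMMIT DROP AS
    SELECT DISTINCT * FROM t_demo;

-- Empty the original table; this is rolled back if the transaction fails.
TRUNCATE TABLE t_demo;

-- Put the distinct rows back.
INSERT INTO t_demo SELECT * FROM t_demo_dedup;

COMMIT;
```

Keep in mind that TRUNCATE takes an exclusive lock on the table for the rest of the transaction, so concurrent readers and writers wait until the COMMIT.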

Douglas H. Bradshaw says:

May 3, 2018 at 3:03 pm

That's really helpful -- I didn't know about ctid. Thanks!

Antonio says:

November 29, 2019 at 4:51 pm

Hi Hans-Jurgen, very clever approach, thanks.
But what if, instead of deleting those rows, we would like to identify them by adding a field?
That would allow us to merge duplicates or select the ones that really must be deleted...
Imagine you have a list of contacts in which, among the duplicated ones, some operators have added info to some records but not to every record (records with some data and records without).
I'm struggling to find a way to do this in SQL (Postgres, actually), while it is relatively easy in a spreadsheet.

Thanks
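One possible way to approach the question above, sketched under assumptions: number the rows inside each group of duplicates with the `row_number()` window function and store that number in an extra column. The table `t_contact`, its columns, and the `dup_no` field are all hypothetical; which columns define a "duplicate" (here name and email) is also an assumption to adapt.

```sql
-- Hypothetical contact list; dup_no is the added field used to mark duplicates.
CREATE TEMP TABLE t_contact (name text, email text, phone text, dup_no int);
INSERT INTO t_contact (name, email, phone) VALUES
    ('Ann', 'ann@example.com', NULL),
    ('Ann', 'ann@example.com', '555-0100'),
    ('Bob', 'bob@example.com', NULL);

-- Number the rows within each (name, email) group and store the number.
-- ctid is only used to match rows inside this single statement.
UPDATE t_contact c
SET dup_no = n.rn
FROM (
    SELECT ctid AS row_id,
           row_number() OVER (PARTITION BY name, email ORDER BY ctid) AS rn
    FROM t_contact
) n
WHERE c.ctid = n.row_id;

-- Rows with dup_no > 1 are the extra copies to review, merge, or delete.
SELECT * FROM t_contact ORDER BY name, email, dup_no;
```

Because the duplicates stay in the table, records carrying extra data (such as the phone number here) can be inspected and merged before anything is removed.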

