Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Re: find duplicates
s Lehaire wrote:
> Hi I have to create a primary key on a table but I've got a duplicate data
> in this table
> so i try to find a sql request to find the duplicat rows
> can you help me ?
IMO all the answers thus far is partially right. Partially because they only address the identification issue. They do not adress actually finding the actual duplicate rows.
There's a difference. What do you do after you run that SELECT GROUP BY sql statement? Does the GROUP BY allow you immediate identification of duplicates? Nope. It tells you how bad the dup problem is - nothing more.. unless you the GROUP BY to run another SQL to extract or identify the duplicate row.
What I want is the actual row identification of duplicates from the start, allowing me to inspect these and determine which row you want and which row I want to delete.
I do the following then dealing with dups.
CREATE TABLE dups
NOLOGGING AS
SELECT
t1.rowid "BASE_ROW_ID",
t2.rowid "DUP_ROW_ID
FROM foo t1,
foo t2
WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.rowid < t2.rowid
Identifying the dups is done via joining the columns that are duplicated. In the above example, columns 1 and 2.
To ensure that I do not join row 1 with itself as a duplicate, I add the rowid check.
If there are multiple duplicates, then each combo will be represented. E.g. if rows 1, 3 and 5 are duplicates, the above result will have rows 1 & 3 as a duplicate, row 1 & 5 and row 3 & 5.
If you want the results to only contain row 1 & 3 and row 1 & 5, then you can wrap the above SELECT into a GROUP BY when creating the result table, e.g.
CREATE TABLE dups
NOLOGGING AS
SELECT
MIN(base_row_id) "BASE_ROW_ID",
dup_row_id
FROM (
SELECT
t1.rowid "BASE_ROW_ID",
t2.rowid "DUP_ROW_ID
FROM foo t1,
foo t2
WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.rowid < t2.rowid )
As the results contains the actual row identifier, you have fast and direct access to the source rows for displaying, deleting, updating, etc.
-- BillyReceived on Wed Jan 15 2003 - 03:41:19 CST
![]() |
![]() |