Duplicate Report [message #355648] |
Sun, 26 October 2008 18:51 |
someuserhavingdoubt
Messages: 2 Registered: October 2008
|
Junior Member |
|
|
Hi All,
One of the business requirement on my current project is to find out the duplicte records. There are various ways of doing this however the part where I am facing a challange is:
1. I need to get the source system, created date & created by along with the duplicate recordset in a single query.
2. There are around 4 to 5 million records in this staging tables.
The table(stagingCustomers) structure is
CreatedOn
CreatedBy
UpdatedOn
UpdatedBy
SourceSystem
Name
Address
City
State
Country
Zip
Phone
Current Query (takes about 3:00 hrs):
SELECT a.sourcesystem, a.name, a.city, a.state, a.country, a.createdon, a.createdby
FROM stagingcustomers a
WHERE EXISTS (SELECT 1
FROM stagingcustomers b
WHERE a.name = b.name
AND a.city = b.city
AND a.state = b.state
AND a.country = b.country
GROUP BY b.name, b.city, b.state, b.country
HAVING COUNT (*) > 1)
ORDER BY a.name, a.city, a.state, a.country, a.createdon
---------------------------------------------------------------------
Any better way of doing this ?
|
|
|
Re: Duplicate Report [message #355657 is a reply to message #355648] |
Sun, 26 October 2008 22:57 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Here are a couple of methods using the emp demo table assuming that the same deptno and job constitutes a duplicate.
SCOTT@orcl_11g> SELECT deptno, job, ename
2 FROM (SELECT deptno, job, ename,
3 COUNT (*) OVER (PARTITION BY deptno, job) counter
4 FROM emp)
5 WHERE counter > 1
6 ORDER BY deptno, job, ename
7 /
DEPTNO JOB ENAME
---------- --------- ----------
20 ANALYST FORD
20 ANALYST SCOTT
20 CLERK ADAMS
20 CLERK SMITH
30 SALESMAN ALLEN
30 SALESMAN MARTIN
30 SALESMAN TURNER
30 SALESMAN WARD
8 rows selected.
SCOTT@orcl_11g> create table exceptions
2 (row_id rowid,
3 owner varchar2(30),
4 table_name varchar2(30),
5 constraint varchar2(30))
6 /
Table created.
SCOTT@orcl_11g> ALTER TABLE emp
2 ADD CONSTRAINT emp_deptno_job_uk UNIQUE (deptno, job)
3 EXCEPTIONS INTO exceptions
4 /
ADD CONSTRAINT emp_deptno_job_uk UNIQUE (deptno, job)
*
ERROR at line 2:
ORA-02299: cannot validate (SCOTT.EMP_DEPTNO_JOB_UK) - duplicate keys found
SCOTT@orcl_11g> SELECT emp.deptno, emp.job, emp.ename
2 FROM emp, exceptions
3 WHERE emp.ROWID = exceptions.row_id
4 ORDER BY deptno, job, ename
5 /
DEPTNO JOB ENAME
---------- --------- ----------
20 ANALYST FORD
20 ANALYST SCOTT
20 CLERK ADAMS
20 CLERK SMITH
30 SALESMAN ALLEN
30 SALESMAN MARTIN
30 SALESMAN TURNER
30 SALESMAN WARD
8 rows selected.
SCOTT@orcl_11g>
|
|
|
|
Re: Duplicate Report [message #355660 is a reply to message #355658] |
Sun, 26 October 2008 23:08 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
someuserhavingdoubt wrote on Sun, 26 October 2008 21:00 | I was wondering if this could be achieved without using PARTITION BY.
|
Not if you need to list columns, like sourcesystem, that can have different values and still constitute a duplicate based on other columns.
|
|
|