Home » SQL & PL/SQL » SQL & PL/SQL » Duplicate Report (Oracle 8)
Duplicate Report [message #355648] Sun, 26 October 2008 18:51 Go to next message
someuserhavingdoubt
Messages: 2
Registered: October 2008
Junior Member
Hi All,

One of the business requirement on my current project is to find out the duplicte records. There are various ways of doing this however the part where I am facing a challange is:

1. I need to get the source system, created date & created by along with the duplicate recordset in a single query.
2. There are around 4 to 5 million records in this staging tables.

The table(stagingCustomers) structure is

CreatedOn
CreatedBy
UpdatedOn
UpdatedBy
SourceSystem
Name
Address
City
State
Country
Zip
Phone

Current Query (takes about 3:00 hrs):

SELECT a.sourcesystem, a.name, a.city, a.state, a.country, a.createdon, a.createdby
FROM stagingcustomers a
WHERE EXISTS (SELECT 1
FROM stagingcustomers b
WHERE a.name = b.name
AND a.city = b.city
AND a.state = b.state
AND a.country = b.country
GROUP BY b.name, b.city, b.state, b.country
HAVING COUNT (*) > 1)
ORDER BY a.name, a.city, a.state, a.country, a.createdon
---------------------------------------------------------------------
Any better way of doing this ?

Re: Duplicate Report [message #355657 is a reply to message #355648] Sun, 26 October 2008 22:57 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8633
Registered: November 2002
Location: California, USA
Senior Member
Here are a couple of methods using the emp demo table assuming that the same deptno and job constitutes a duplicate.

SCOTT@orcl_11g> SELECT deptno, job, ename
  2  FROM   (SELECT deptno, job, ename,
  3  		    COUNT (*) OVER (PARTITION BY deptno, job) counter
  4  	     FROM   emp)
  5  WHERE  counter > 1
  6  ORDER  BY deptno, job, ename
  7  /

    DEPTNO JOB       ENAME
---------- --------- ----------
        20 ANALYST   FORD
        20 ANALYST   SCOTT
        20 CLERK     ADAMS
        20 CLERK     SMITH
        30 SALESMAN  ALLEN
        30 SALESMAN  MARTIN
        30 SALESMAN  TURNER
        30 SALESMAN  WARD

8 rows selected.


SCOTT@orcl_11g> create table exceptions
  2    (row_id	   rowid,
  3  	owner	   varchar2(30),
  4  	table_name varchar2(30),
  5  	constraint varchar2(30))
  6  /

Table created.

SCOTT@orcl_11g> ALTER TABLE emp
  2  ADD CONSTRAINT emp_deptno_job_uk UNIQUE (deptno, job)
  3  EXCEPTIONS INTO exceptions
  4  /
ADD CONSTRAINT emp_deptno_job_uk UNIQUE (deptno, job)
               *
ERROR at line 2:
ORA-02299: cannot validate (SCOTT.EMP_DEPTNO_JOB_UK) - duplicate keys found


SCOTT@orcl_11g> SELECT emp.deptno, emp.job, emp.ename
  2  FROM   emp, exceptions
  3  WHERE  emp.ROWID = exceptions.row_id
  4  ORDER  BY deptno, job, ename
  5  /

    DEPTNO JOB       ENAME
---------- --------- ----------
        20 ANALYST   FORD
        20 ANALYST   SCOTT
        20 CLERK     ADAMS
        20 CLERK     SMITH
        30 SALESMAN  ALLEN
        30 SALESMAN  MARTIN
        30 SALESMAN  TURNER
        30 SALESMAN  WARD

8 rows selected.

SCOTT@orcl_11g>

Re: Duplicate Report [message #355658 is a reply to message #355657] Sun, 26 October 2008 23:00 Go to previous messageGo to next message
someuserhavingdoubt
Messages: 2
Registered: October 2008
Junior Member
Hi Barbara,

Thanks for the quick reply. THe first solution works great, however I was wondering if this could be achieved without using PARTITION BY.

Thanks
Re: Duplicate Report [message #355660 is a reply to message #355658] Sun, 26 October 2008 23:08 Go to previous message
Barbara Boehmer
Messages: 8633
Registered: November 2002
Location: California, USA
Senior Member
someuserhavingdoubt wrote on Sun, 26 October 2008 21:00
I was wondering if this could be achieved without using PARTITION BY.


Not if you need to list columns, like sourcesystem, that can have different values and still constitute a duplicate based on other columns.

Previous Topic: To extract the previous month from current month?
Next Topic: Convert Access commands into Oracle
Goto Forum:
  


Current Time: Tue Dec 06 04:34:00 CST 2016

Total time taken to generate the page: 0.12484 seconds