Home » SQL & PL/SQL » SQL & PL/SQL » delete duplicates based on a combination of fields
delete duplicates based on a combination of fields [message #10606] Tue, 03 February 2004 05:35 Go to next message
shad
Messages: 50
Registered: October 2000
Member
Hi all

-- I have a table called vouchers with about 33 000 000 records. Here is the description of fields.

SQL> desc uif.vouchers
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------

 VOUCHER_ID                                NOT NULL VARCHAR2(15)
(1) REF_NO                                    NOT NULL VARCHAR2(7)
 PAYMENT_TYPE                                       CHAR(1)
(1) DOCTYPE                                   NOT NULL VARCHAR2(30)
 (3) PERIOD_START                              NOT NULL VARCHAR2(4)
 (4) PERIOD_END                                         VARCHAR2(4)
 CODE_START                                         NUMBER(3)
 CODE_END                                           NUMBER(3)
 NUMBER_CONTRIB                                     NUMBER(10)
 EARNINGS_TOTAL                                     NUMBER(13,2)
 VOUCHER_AMOUNT                                     NUMBER(13,2)
(5)  VOUCHER_ALLOC_AMOUNT                               NUMBER(13,2)
 TOTAL_EMPLOYEE_EARNINGS                            NUMBER(13,2)
 EMPLOYER_CONT_PAYABLE                              NUMBER(11,2)
 EMPLOYEE_CONT_PAYABLE                              NUMBER(11,2)
 UF3_COMPL_DATE                                     DATE
 UF3_PRINTED_DATE                                   DATE
 SYSREGUSER                                NOT NULL VARCHAR2(50)
 SYSREGDATE                                NOT NULL DATE
 SYSLASTUSER                                        VARCHAR2(50)
 SYSLASTDATE                                        DATE
 STATUS                                             CHAR(1)
 ORIGIN                                             VARCHAR2(30)
 DOCNUM                                             NUMBER(6)
 Y2K_PERIOD_START                                   VARCHAR2(6)
 Y2K_PERIOD_END                                     VARCHAR2(6)
 CTRON_ID                                           VARCHAR2(15)
 CNV_STAT                                           CHAR(1)
(6)  TEAM                                               VARCHAR2(1)
 RECEIPTNO                                          VARCHAR2(15)
 VOUCHER_ID_REF                                     VARCHAR2(15)
 JOUR_ID                                            VARCHAR2(15)
 LINENO                                             NUMBER(3)
 SUM_STAT                                           VARCHAR2(1)

à I need to delete duplicates using the combination of the fields REF_NO, DOCTYPE,PERIOD_START,PERIOD_END,VOUCHER_ALLOC_AMOUNT and TEAM as the basis for this.

 -à I ran the following delete statement to delete duplicates but it has been running for the past three weeks.

delete   from vouchers a



where    rowid < (



         select  max(rowid)



         from    vouchers b



         where                 b.REF_NO                             = a.REF_NO



                 and                 b.DOCTYPE                           = a.DOCTYPE



                 and                 b.PERIOD_START                               = a.PERIOD_START



                 and                 b.PERIOD_END                    = a.PERIOD_END



                 and                 b.VOUCHER_ALLOC_AMOUNT        = a.VOUCHER_ALLOC_AMOUNT



                 and                 b.TEAM                                  = a.TEAM); 



 

 à Is there something wrong here?

 

 

 

 

 

 

 

 

 

 

 

 



 

 

 

 

 

 
Re: delete duplicates based on a combination of fields [message #10607 is a reply to message #10606] Tue, 03 February 2004 05:47 Go to previous messageGo to next message
Sebastian S
Messages: 4
Registered: February 2004
Junior Member
If i got it right, you're deleting ALL the registers whose rowid is less than the highest value of rowid obtained from all the registers that match. Is that correct?
Re: delete duplicates based on a combination of fields [message #10609 is a reply to message #10606] Tue, 03 February 2004 06:46 Go to previous messageGo to next message
Sebastian S
Messages: 4
Registered: February 2004
Junior Member
If i got it right, you're deleting ALL the registers whose rowid is less than the highest value of rowid obtained from all the registers that match. Is that correct?
Re: delete duplicates based on a combination of fields [message #10614 is a reply to message #10606] Tue, 03 February 2004 10:36 Go to previous messageGo to next message
sverch
Messages: 582
Registered: December 2000
Senior Member
Look at today's Tom's example with analytical functions. It is awesome. And it is much much faster.
http://asktom.oracle.com/pls/ask/f?p=4950:8:383290::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:15258974323143,
Re: delete duplicates based on a combination of fields [message #10623 is a reply to message #10606] Wed, 04 February 2004 03:32 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9090
Registered: November 2002
Location: California, USA
Senior Member
Try analyzing your tables, indexes, and indexed columns before running your delete statement, so that the optimizer can use the statistics to choose the best execution plan:

ANALYZE TABLE vouchers
COMPUTE STATISTICS
FOR TABLE
FOR ALL INDEXES
FOR ALL INDEXED COLUMNS;
Re: delete duplicates based on a combination of fields [message #10633 is a reply to message #10606] Thu, 05 February 2004 13:10 Go to previous message
William Robertson
Messages: 1643
Registered: August 2003
Location: London, UK
Senior Member
This is untested, but I would start with something like:
DELETE vouchers
WHERE  rowid IN
       ( SELECT LEAD(rowid) OVER (
                PARTITION BY ref_no, doctype, period_start, period_end, voucher_alloc_amount, team
                ORDER BY ref_no, doctype, period_start, period_end, voucher_alloc_amount, team ) );
Previous Topic: Oracle 1i Screen Functions
Next Topic: Delete duplicate skills
Goto Forum:
  


Current Time: Thu Apr 25 01:11:36 CDT 2024