 


Re: de-dup process

From: A Ebadi <ebadi01_at_yahoo.com>
Date: Mon, 22 Jan 2007 09:51:46 -0800 (PST)
Message-ID: <923995.78336.qm@web31208.mail.mud.yahoo.com>


Thanks to everyone for the suggestions a few weeks ago regarding this de-dup issue. I just wanted to let everyone know that the solution we decided on was to go with hourly partitioning instead of daily, which reduced the subset of data that has to be de-duped at one time by a factor of 24. The app had to be modified slightly to accomplish this. Now we are able to delete the dups (i.e. CTAS out the data we are keeping), and we can run many of these de-dups in parallel.

  Again, I understand this doesn't scale to infinity, but it will get us by for 12-18 months based on the volume estimates.

  Thanks,
  Abdul
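A minimal Python sketch of why finer partitioning helps: rows are grouped by the hour in which they were loaded, and each hourly bucket is de-duplicated on its own, so buckets can be handled by independent jobs, much as separate de-dup runs can target separate partitions. The row layout, the "keep the first row seen" rule, and all function names are hypothetical; in the database this work would be done in SQL against each partition, not in application code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Hypothetical row layout: (load_time, dedup_key, payload)
Row = tuple[datetime, str, str]


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the hour, mimicking an hourly partition key."""
    return ts.replace(minute=0, second=0, microsecond=0)


def dedup_bucket(rows: list[Row]) -> list[Row]:
    """Keep the first row seen for each dedup_key within one bucket."""
    seen: set[str] = set()
    kept: list[Row] = []
    for row in rows:
        if row[1] not in seen:
            seen.add(row[1])
            kept.append(row)
    return kept


def dedup_hourly(rows: list[Row], workers: int = 4) -> list[Row]:
    """Split rows into hourly buckets and de-dup each bucket independently."""
    buckets: dict[datetime, list[Row]] = defaultdict(list)
    for row in rows:
        buckets[hour_bucket(row[0])].append(row)
    # Buckets share no state, so they can be processed concurrently. A thread
    # pool keeps the sketch simple; CPU-bound Python work would need a
    # ProcessPoolExecutor for real parallel speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(dedup_bucket, [buckets[h] for h in sorted(buckets)])
    return [row for bucket_rows in results for row in bucket_rows]


rows = [
    (datetime(2007, 1, 22, 9, 5), "a", "x"),
    (datetime(2007, 1, 22, 9, 40), "a", "dup"),
    (datetime(2007, 1, 22, 10, 1), "a", "y"),
    (datetime(2007, 1, 22, 10, 2), "b", "z"),
]
kept = dedup_hourly(rows)
print([(r[0].hour, r[1], r[2]) for r in kept])
# [(9, 'a', 'x'), (10, 'a', 'y'), (10, 'b', 'z')]
```

Note that key `a` survives once in each hour: this sketch only removes duplicates that fall in the same hourly bucket, so it assumes that duplicates spanning two hours either cannot occur or are acceptable. The thread does not say which is the case.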

A Ebadi <ebadi01_at_yahoo.com> wrote:

    Putting a unique key constraint on the table and loading it direct-path with dups will still insert the dups into the table and leave the index unusable, so I don't see how this would help us. I don't want the dups inserted at all.

  Thanks.

Tony van Lingen <tony.vanlingen_at_epa.qld.gov.au> wrote:

  It may even be easier... You say "We are doing direct path load so no unique key indexes can be put on the table to take care of the duplicates". The Utilities guide (10gR2), however, explicitly names unique constraints as constraints that are enforced during direct path loads:

    Integrity Constraints
    All integrity constraints are enforced during direct path loads, although not necessarily at the same time. NOT NULL constraints are enforced during the load. Records that fail these constraints are rejected.

    UNIQUE constraints are enforced both during and after the load. A record that violates a UNIQUE constraint is not rejected (the record is not available in memory when the constraint violation is detected).

    (Utilities, B14215-01, chapter 11)

Did you actually try this?

  Cheers,
Tony

A Ebadi wrote: We have a huge table (> 160 million rows) which has about 20 million duplicate rows that we need to delete. What is the most efficient way to do this, given that we will need to do it daily? A single varchar2(30) column is used to identify duplicates, and a given value may appear in more than 2 rows.

  We are doing direct path load so no unique key indexes can be put on the table to take care of the duplicates.    

  Platform: Oracle 10G RAC (2 node) on Solaris 10.    

  Thanks!      
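The "CTAS out the data we are keeping" approach mentioned at the top of the thread can be sketched as follows. This minimal example uses Python's built-in SQLite purely as a stand-in for Oracle: it ranks rows within each duplicate key with ROW_NUMBER() (an analytic function Oracle also supports), copies only the first-ranked row per key into a new table with CREATE TABLE ... AS SELECT, and swaps that table in. The table and column names (`events`, `id`, `dedup_key`, `payload`) are hypothetical, keeping the lowest `id` is an arbitrary choice, and the drop-and-rename swap is SQLite-specific; how the kept rows would replace the original data in Oracle is not shown.

```python
import sqlite3


def dedup_ctas(conn: sqlite3.Connection) -> None:
    """Rebuild the hypothetical `events` table keeping one row per dedup_key."""
    conn.executescript("""
        -- CTAS: copy only the first-ranked row for each key
        CREATE TABLE events_keep AS
        SELECT id, dedup_key, payload
        FROM (
            SELECT id, dedup_key, payload,
                   ROW_NUMBER() OVER (PARTITION BY dedup_key ORDER BY id) AS rn
            FROM events
        )
        WHERE rn = 1;
        -- swap the de-duplicated copy in for the original table
        DROP TABLE events;
        ALTER TABLE events_keep RENAME TO events;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, dedup_key TEXT, payload TEXT)")
sample = [(1, "a", "x"), (2, "b", "y"), (3, "a", "z"), (4, "a", "w"), (5, "c", "v")]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", sample)
dedup_ctas(conn)
print(conn.execute("SELECT id, dedup_key FROM events ORDER BY id").fetchall())
# [(1, 'a'), (2, 'b'), (5, 'c')]
```

Writing out the survivors instead of deleting the duplicates in place is the design choice the thread settled on; with roughly 20 million of 160 million rows to remove, copying the rows to keep avoids a very large DELETE.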




--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 22 2007 - 11:51:46 CST
