Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Re: How to find Duplicate records

From: freek <freekdhooge_at_hotmail.com>
Date: Tue, 24 Jul 2001 19:56:26 GMT
Message-ID: <Kpk77.25309$Xy1.5760401@afrodite.telenet-ops.be>

You will have to create a kind of key that tells you which records look similar (records with the same key are candidates for being doubles). After doing that, you compare various fields of those records with each other, giving points for each field that matches. The number of points for a field depends on the importance of that field (e.g. a matching telephone field is stronger evidence than a matching street). After setting a max and a min boundary, you can see which records are doubles (score above the max), which aren't (score below the min) and which are the maybes (anything in between). Choosing the initial key and setting the points and boundaries is a process of trial and error.
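The point scoring could be sketched in Python roughly as follows. This is only an illustration: the field names, the weights and the two boundaries (`UPPER`, `LOWER`) are made-up values, and in practice they would have to be tuned by trial and error as described above.

```python
# Hypothetical weights: a matching phone number counts for more than a matching street.
WEIGHTS = {"phone": 5, "surname": 3, "firstname": 2, "street": 1}

# Hypothetical boundaries, to be tuned by trial and error.
UPPER = 8  # a score at or above this is treated as a double
LOWER = 4  # a score at or below this is treated as not a double


def score(a, b, weights=WEIGHTS):
    """Add up the weights of all fields that are filled in and equal in both records."""
    return sum(w for field, w in weights.items()
               if a.get(field) and a.get(field) == b.get(field))


def classify(a, b):
    """Return 'double', 'unique' or 'maybe' for a pair of records."""
    s = score(a, b)
    if s >= UPPER:
        return "double"
    if s <= LOWER:
        return "unique"
    return "maybe"
```

Pairs that come out as "maybe" are the ones you would still have to check by hand.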

example

For the key we take the first 4 letters of the surname, the first 2 of the firstname and the first 5 of the street. In the table below, the key is the same for records 1, 2 and 3, and for records 5 and 6.

So instead of having to compare each record to every other record in the database, we just compare records 1, 2 and 3 with each other, and record 5 with record 6. When we then compare the fields surname, firstname and street (I will take only records 1, 2 and 3), we find that:

for records 1 and 2 there are 2 fields matching
for records 1 and 3 there are 0 fields matching
for records 2 and 3 there are 0 fields matching

Conclusion: records 1 and 2 are a double, record 3 is unique.

record  surname   firstname  street      house  key
1       dhooge    tim        aardeken    2      dhootiaarde
2       dhooge    tin        aardeken    2      dhootiaarde
3       dhoor     timothy    aardeweg    9      dhootiaarde
4       dhooghe   jan        aardeken           dhoojaaarde
5       jansen    jan        nijverheid  15     jansjanijve
6       janssen   jan        nijverheid  15     jansjanijve
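As a rough sketch, the key building and grouping from the example could look like this in Python. The function names (`make_key`, `blocks`, `matching_fields`) are made up for the illustration, and the rows are simply the six records from the table above, numbered in order. On a table of 10 million rows you would more likely compute the key inside the database, but the idea is the same.

```python
from collections import defaultdict
from itertools import combinations

# (surname, firstname, street, house) rows from the table above, in order.
RECORDS = [
    ("dhooge", "tim", "aardeken", "2"),
    ("dhooge", "tin", "aardeken", "2"),
    ("dhoor", "timothy", "aardeweg", "9"),
    ("dhooghe", "jan", "aardeken", ""),
    ("jansen", "jan", "nijverheid", "15"),
    ("janssen", "jan", "nijverheid", "15"),
]


def make_key(surname, firstname, street):
    """First 4 letters of the surname, first 2 of the firstname, first 5 of the street."""
    return surname[:4] + firstname[:2] + street[:5]


def blocks(records):
    """Group record numbers (starting at 1) by key; keep only keys shared by 2 or more records."""
    groups = defaultdict(list)
    for number, rec in enumerate(records, start=1):
        groups[make_key(*rec[:3])].append(number)
    return {key: numbers for key, numbers in groups.items() if len(numbers) > 1}


def matching_fields(a, b):
    """Count how many of surname, firstname and street are exactly equal."""
    return sum(x == y for x, y in zip(a[:3], b[:3]))


# Only compare records that share a key, never every record with every other record.
for key, numbers in blocks(RECORDS).items():
    for i, j in combinations(numbers, 2):
        print(key, i, j, matching_fields(RECORDS[i - 1], RECORDS[j - 1]))
```

Record 4 gets a key of its own, so it is never compared with anything. That is the price of this approach: a double whose key differs is never found, which is why the choice of key needs some experimenting.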

"Shahid Mahmood" <Shahid.Mahmood_at_team.telstra.com> wrote in message news:2a89f9.0107232026.481337f_at_posting.google.com...
> Hi
>
> I am trying to find out the duplicate records in the table which
> contains over 10 million records. Could you please let me know the
> easiest and quickest way to get it done.
>
> Regards
>
>
> Shahid
Received on Tue Jul 24 2001 - 14:56:26 CDT
