Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.tools -> Re: Suggestions for Advanced Detection of Duplicates

Re: Suggestions for Advanced Detection of Duplicates

From: Connor McDonald <connor_mcdonald_at_yahoo.com>
Date: 2000/04/11
Message-ID: <38F32241.4998@yahoo.com>#1/1

Michael C. Wilkerson wrote:
>
> I appologize for not clarifying exactly what I was looking for in my
> previous post.
>
> I was hoping someone could either provide some of their strategies or
> suggest resources that covered the detection of duplicate records in a
> database. Now the fields in these records may not be EXACTLY identical,
> but still refer to the same person.
>
> For example, a possible scenerio would be:
> person A enters his information:
> Name: Mike Wilkerson
> Address: 810 Somewhere St. Apt. 100
> City: Dallas
> State: Texas
> Phone: (512) 555-5555
>
> Several weeks later the person returns but, having forgotten his
> username/password, he creates another new record so that he can access
> the app/site/whatever again (the jerk). This time, however, he enters:
> Name: Mike Wilkerson
> Address: 810 Somewhere #100
> City: Dallas
> State: Texas
> Phone: 512-555-5555
>
> Now, it's the same person, but if you were to attempt to use the
> following query to detect if this user was already in the database, it
> would no return any rows:
> SELECT name, address, city, state, phone
> from tblpeople
> where name = 'Mike Wilkerson'
> and address = '810 Somewhere #100'
> and city = 'Dallas'
> and state = 'Texas'
> and phone = '512-555-5555'
>
> There are many different ways users could enter in their information.
> In the above example I could have put Michael instead of Mike or TX
> instead of Texas, etc...
>
> I suspect that the detection of duplicates in a database has to be a
> serious concern for anyone who wants to track how many distinct
> customers/users/etc that they have. Budgets, funding, etc could be
> based off of these kinds of counts, so I was hoping that there were
> books covering methods for this kind of thing. I've already looked at
> using TRANSLATE to eliminate all spaces and punctuation from the phone
> field. For the example above they would both get translated to just
> 5125555555 and it would be a strong indication that they were for the
> same individual. I don't really expect any method to catch 100% of the
> duplicates, just some documentation/suggestions that might cover 1)
> things to consider/watch out for and 2) example methods/strategies for
> detecting these kinds of duplications.
>
> Thanks for your help,
> Michael

Check out the SOUNDEX function - it may be (part of) a solution to sort out pseudo-duplicates.

HTH

-- 
===========================================
Connor McDonald
http://www.oracledba.co.uk

We are born naked, wet and hungry...then things get worse
Received on Tue Apr 11 2000 - 00:00:00 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US