Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.tools -> Suggestions for Advanced Detection of Duplicates
I appologize for not clarifying exactly what I was looking for in my previous post.
I was hoping someone could either provide some of their strategies or suggest resources that covered the detection of duplicate records in a database. Now the fields in these records may not be EXACTLY identical, but still refer to the same person.
For example, a possible scenerio would be:
person A enters his information:
Name: Mike Wilkerson
Address: 810 Somewhere St. Apt. 100
City: Dallas
State: Texas
Phone: (512) 555-5555
Several weeks later the person returns but, having forgotten his
username/password, he creates another new record so that he can access
the app/site/whatever again (the jerk). This time, however, he enters:
Name: Mike Wilkerson
Address: 810 Somewhere #100
City: Dallas
State: Texas
Phone: 512-555-5555
Now, it's the same person, but if you were to attempt to use the
following query to detect if this user was already in the database, it
would no return any rows:
SELECT name, address, city, state, phone
from tblpeople
where name = 'Mike Wilkerson'
and address = '810 Somewhere #100'
and city = 'Dallas'
and state = 'Texas'
and phone = '512-555-5555'
There are many different ways users could enter in their information. In the above example I could have put Michael instead of Mike or TX instead of Texas, etc...
I suspect that the detection of duplicates in a database has to be a serious concern for anyone who wants to track how many distinct customers/users/etc that they have. Budgets, funding, etc could be based off of these kinds of counts, so I was hoping that there were books covering methods for this kind of thing. I've already looked at using TRANSLATE to eliminate all spaces and punctuation from the phone field. For the example above they would both get translated to just 5125555555 and it would be a strong indication that they were for the same individual. I don't really expect any method to catch 100% of the duplicates, just some documentation/suggestions that might cover 1) things to consider/watch out for and 2) example methods/strategies for detecting these kinds of duplications.
Thanks for your help,
Michael
Received on Mon Apr 10 2000 - 00:00:00 CDT
![]() |
![]() |