Re: Deduplication of records

From: Lee <lee_at_jamtoday.com>
Date: Wed, 02 Jan 2002 16:32:29 -0500
Message-ID: <iut63u4euul57dfd8uevq03ev9ot9bkkco_at_4ax.com>


Well, of course, if you have something such as a social security number, an employee ID number, or an account number, something that can serve as a "primary key", then there's not much of a problem.

But if you have to identify records which are "highly likely" (whatever "highly" means) to be duplicates, then you may have wandered into deeper waters.

If two names are spelled "almost" the same, ARE they the same? What if the name is the same and the address is also the same? What other data do you have that might be able to discriminate between duplicates and non-duplicates? How heavily does each factor weigh in the decision? What is the confidence of your classification (dup vs. non-dup)? What is the best way to combine the data you DO have to get the best possible confidence of calling a dup a dup and not calling a non-dup a dup?
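To make the "how heavily does each factor weigh" question concrete, here is a minimal sketch in Python, not a production matcher. The field names, the weights, and the two thresholds are all assumptions picked for illustration; in practice you would tune them against records you have already classified by hand. It uses the standard library's difflib for string similarity, which is only one of many possible measures.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how heavily each field counts toward the decision.
WEIGHTS = {"name": 0.5, "address": 0.3, "birth_date": 0.2}


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities.

    Fields that are missing or empty in either record are skipped, and the
    remaining weights are renormalized.
    """
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        if rec_a.get(field) and rec_b.get(field):
            total += weight * similarity(rec_a[field], rec_b[field])
            used += weight
    return total / used if used else 0.0


def classify(score, upper=0.9, lower=0.7):
    """Three-way decision; both thresholds are assumed values, not recommendations."""
    if score >= upper:
        return "duplicate"
    if score >= lower:
        return "review"
    return "distinct"
```

Calling classify(match_score(a, b)) on two records that differ only in "Mr. Johnson" vs "Mr. Johnsson" lands in the "duplicate" band with these assumed numbers. The middle "review" band is there because some pairs are better handed to a person than decided automatically.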

This is a well-known problem which occurs in all sorts (no pun) of situations, such as medical record matching. Here's a bunch of birth records, there's a bunch of vaccination records. Which belongs to which, and which kids born that day have no "matching" vaccination records?
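The birth/vaccination case can be sketched the same way. This is a rough illustration only: the record layout (dicts with "name" and "birth_date"), the 0.85 threshold, and the rule of only comparing records with the same birth date are assumptions, and the greedy first-come pairing is the simplest thing that works, not the best possible assignment.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity ratio for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(births, vaccinations, threshold=0.85):
    """Greedily pair each birth record with its best unused vaccination record.

    Only records with an identical birth date are compared, and a pair is
    accepted only if the name similarity reaches the (assumed) threshold.
    Returns (pairs, unmatched_births).
    """
    pairs = []
    unmatched = []
    used = set()  # indexes of vaccination records already paired
    for birth in births:
        best_index, best_score = None, threshold
        for i, vacc in enumerate(vaccinations):
            if i in used or vacc["birth_date"] != birth["birth_date"]:
                continue
            score = name_similarity(birth["name"], vacc["name"])
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is None:
            unmatched.append(birth)
        else:
            used.add(best_index)
            pairs.append((birth, vaccinations[best_index]))
    return pairs, unmatched
```

The second return value answers the "which kids have no matching vaccination record" question directly. Comparing only within the same birth date also keeps the number of comparisons down, at the cost of missing pairs where the date itself was keyed in wrong.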

See for example www.choicmaker.com

Or fire up your search engine and look for "record matching"

On Tue, 01 Jan 2002 08:12:24 GMT, "Alex Pilling" <alex_at_tabbybadger.com> wrote:

>You use an attribute of person that is guaranteed to be unique as the
>primary key.
>
>For instance, here in the US, the Social Security Number is almost
>invariably used as primary key for person tables.
>
>Alex Pilling FIAP MISM AIDPM
>Business Systems Architect
>alex_at_tabbybadger.com
>
>"Raymond de Ligt" <rdl_at_dss.nl> wrote in message
>news:Xns911BB06EEA9AArdldssnl_at_213.222.27.9...
>>
>> Hi everybody, does anyone have some information about how to deduplicate
>> records in databases? For example, how does one prevent getting the same
>> person, Mr. Johnson, into the system more than once, once as Mr. Johnson and
>> the other time as Mr. Johnsson?
>>
>> I'm doing this on a Magic system and an Oracle database.
>
Received on Wed Jan 02 2002 - 22:32:29 CET
