Re: Merge/Purge

From: Tom Leylan <tleylan_at_leylan.com>
Date: Mon, 27 Sep 1999 20:38:35 -0400
Message-ID: <37f00e18_1_at_news1.prserv.net>


Gerry Thorpe <gerrythorpe_at_hotmail.com> wrote...

> I am looking for information about and software to do Merge/Purge.

I think it is safe to say that this is a very specialized field (if you want to do it right).

> A typical situation is as follows: John Doe, may be in one
> data source as John Doe, another as Jonny Doe, another
> as J. Doe, Johnathan Doe, etc. I would like to be able to
> take the data from all of those sources and construct a single
> row of data that aggregates the data from all those rows.

I'm not sure you actually want to aggregate the data. If "John Doe" goes by "Jonny" these days and really lives in Philadelphia, then the fact that "John Doe" appears three times with "St. Paul, MN" is of no value; he no longer lives there.
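To make the point concrete, here is a minimal Python sketch contrasting two ways of resolving a cluster of records already judged to be the same person. It assumes each record carries a hypothetical `last_seen` date (when its source last confirmed the address); the `Record` type and the sample data are illustrative only. A majority vote picks St. Paul, while a "most recently confirmed wins" rule picks Philadelphia.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    source: str
    name: str
    city: str
    last_seen: date  # hypothetical field: when this source last confirmed the address


def majority_city(cluster: list[Record]) -> str:
    """Aggregate by vote: the most frequent city wins."""
    return Counter(r.city for r in cluster).most_common(1)[0][0]


def most_recent(cluster: list[Record]) -> Record:
    """Survivorship by recency: keep the most recently confirmed record."""
    return max(cluster, key=lambda r: r.last_seen)


# Hypothetical cluster, assumed to be already matched as one person.
cluster = [
    Record("A", "John Doe", "St. Paul, MN", date(1995, 3, 1)),
    Record("B", "John Doe", "St. Paul, MN", date(1996, 7, 1)),
    Record("C", "John Doe", "St. Paul, MN", date(1997, 1, 1)),
    Record("D", "Jonny Doe", "Philadelphia, PA", date(1999, 6, 1)),
]

print(majority_city(cluster))     # St. Paul, MN
print(most_recent(cluster).city)  # Philadelphia, PA
```

Recency is only one possible rule; whether it is the right one depends on how trustworthy and current each source actually is.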

> What I need is a system that knows that John Doe, Jonny Doe,
> J. Doe and Johnathan Doe are all likely the same person.

And the clue that they are the same person would be?
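One way to read that question in code: compatible names are a necessary condition, not a sufficient one. The Python sketch below is illustrative only. The `NICKNAMES` table, the `Person` fields and the rule "compatible names plus at least one shared phone number or birth date" are all assumptions chosen for the example, not a recommended standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical nickname table; a real one would be much larger and domain-specific.
# Treating "johnathan" as a variant of "john" is itself an assumption.
NICKNAMES = {"jonny": "john", "johnny": "john", "jon": "john", "johnathan": "john"}


@dataclass
class Person:
    name: str
    phone: Optional[str] = None       # hypothetical corroborating field
    birth_date: Optional[str] = None  # hypothetical corroborating field


def _first_and_last(name: str) -> tuple[str, str]:
    parts = name.lower().replace(".", "").split()
    return parts[0], parts[-1]


def names_compatible(a: str, b: str) -> bool:
    """Same surname, and first names agree after nickname folding or by initial."""
    first_a, last_a = _first_and_last(a)
    first_b, last_b = _first_and_last(b)
    if last_a != last_b:
        return False
    first_a = NICKNAMES.get(first_a, first_a)
    first_b = NICKNAMES.get(first_b, first_b)
    if len(first_a) == 1 or len(first_b) == 1:  # "J." only fixes the first letter
        return first_a[0] == first_b[0]
    return first_a == first_b


def likely_same(p: Person, q: Person) -> bool:
    """Compatible names are necessary but not sufficient; require one shared clue."""
    if not names_compatible(p.name, q.name):
        return False
    pairs = [(p.phone, q.phone), (p.birth_date, q.birth_date)]
    return any(x is not None and x == y for x, y in pairs)


print(names_compatible("J. Doe", "Johnathan Doe"))  # True
print(likely_same(Person("John Doe", phone="555-0100"),
                  Person("Jonny Doe", phone="555-0100")))  # True
print(likely_same(Person("John Doe", phone="555-0100"),
                  Person("J. Doe", phone="555-0199")))  # False
```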

> Any leads would be appreciated.

  1. Search the Internet for what not to do.
  2. Expect a multiple-pass solution.
  3. Expect a solution based upon specific knowledge of the domain.
  4. Accept "reasonable" solutions.
  5. Document your assumptions.
  6. Consider posting your ideas (as they arrive) here, before you merge all the "John Smith" records in Los Angeles into a single row.
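Point 2 can be illustrated with a commonly described technique that the post itself does not prescribe: the multi-pass sorted-neighborhood method. Each pass sorts the records on a different key and compares each record only with its neighbors inside a small window; matches from all passes are then merged transitively. In the minimal Python sketch below, the sort keys, the window size and the "same surname and same phone" match rule are hypothetical stand-ins for the domain knowledge point 3 calls for.

```python
from typing import Callable, Sequence


def multi_pass_sorted_neighborhood(
    records: Sequence[dict],
    key_funcs: Sequence[Callable[[dict], str]],
    is_match: Callable[[dict, dict], bool],
    window: int = 3,
) -> list[set[int]]:
    """Group record indices into clusters of probable duplicates."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for key in key_funcs:  # one pass per sort key
        order = sorted(range(len(records)), key=lambda i: key(records[i]))
        for pos, i in enumerate(order):
            # Compare only with the previous (window - 1) records in this ordering.
            for j in order[max(0, pos - window + 1):pos]:
                if is_match(records[i], records[j]):
                    union(i, j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


# Hypothetical sample data and rules, for illustration only.
records = [
    {"name": "John Doe", "phone": "555-0100", "zip": "55101"},
    {"name": "Jonny Doe", "phone": "555-0100", "zip": "19103"},
    {"name": "John Smith", "phone": "555-0142", "zip": "90001"},
    {"name": "John Smith", "phone": "555-0177", "zip": "90001"},
]


def by_surname_zip(r: dict) -> str:
    return r["name"].split()[-1].lower() + r["zip"]


def by_phone(r: dict) -> str:
    return r["phone"]


def same_surname_and_phone(a: dict, b: dict) -> bool:
    return a["name"].split()[-1] == b["name"].split()[-1] and a["phone"] == b["phone"]


print(multi_pass_sorted_neighborhood(
    records, [by_surname_zip, by_phone], same_surname_and_phone))
# [{0, 1}, {2}, {3}]
```

With these assumed rules the two Doe records collapse into one cluster, while the two Los Angeles "John Smith" records stay separate because nothing beyond the name ties them together.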

Tom

Oh... visit www.deja.com and search for "database duplicates".

--
---> Learn a little something at http://www.leylan.com
Received on Tue Sep 28 1999 - 02:38:35 CEST
