
slightly OT - cleaning up "dirty" keys?

From: bugbear <bugbear_at_trim_papermule.co.uk_trim>
Date: Wed, 01 Mar 2006 13:44:53 +0000
Message-ID: <4405a556$0$3558$ed2619ec@ptn-nntp-reader03.plus.net>


If (!) one had a database where a primary key field (e.g. name) had been used for a few years, and the DB had several "variant" spellings
(e.g. "J Smith", "John Smith", "J K Smith", "J. Smith",
all for the same individual), does
anyone know of a tool that would identify "likely" groupings?

One would like two names with a small
"edit distance"
(http://en.wikipedia.org/wiki/Edit_distance) to be grouped together, for human checking.
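
To make "edit distance" concrete, here is a minimal sketch of the usual Levenshtein variant (single-character insertions, deletions and substitutions, each costing 1). The function name edit_distance is just an illustrative choice, and other cost schemes are possible.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # Keep the shorter string as b so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("J Smith", "J. Smith"))
print(edit_distance("J Smith", "John Smith"))
```

Each call costs time proportional to the product of the two string lengths, so a single comparison is cheap; the expense comes from the number of pairs.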

But if one had 100,000 keys, this would
involve (in a naive implementation that compares every key with every other)
on the order of 10^10 comparisons.

Does anyone know a good algorithm
(and/or heuristic if this is NP-hard)?
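
For illustration only, one common heuristic for cutting down the number of comparisons is "blocking": group the keys by a cheap, coarse key and compare only names that share a group. The sketch below is not a recommendation from this thread; the block key (last token plus first initial) and the names block_key and candidate_pairs are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name):
    """Coarse grouping key (an assumed choice): last token and first initial."""
    tokens = name.replace(".", " ").lower().split()
    if not tokens:
        return ("", "")
    return (tokens[-1], tokens[0][0])


def candidate_pairs(names):
    """Yield only the pairs of names that fall into the same block."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    for members in blocks.values():
        # All-pairs comparison happens within a block, not across the whole set.
        yield from combinations(members, 2)


names = ["J Smith", "John Smith", "J K Smith", "J. Smith", "A Jones"]
for left, right in candidate_pairs(names):
    print(left, "<->", right)
```

The candidate pairs could then be scored with an edit distance and passed on for human checking. The trade-off is that variants whose block keys differ (for example a misspelled surname) are never compared, so the choice of key matters.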

   BugBear

Received on Wed Mar 01 2006 - 07:44:53 CST
