Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: kroger <kroger_at_vp.pl>
Date: Wed, 6 Sep 2006 18:17:19 +0200
Message-ID: <edmsal$a0v$1@news.onet.pl>

> having done similar work for a client, you will be best off using
> PL/SQL or other procedural language (I also used PERL on that project.)

That's what I'm going to do. I'm not limited to SQL, so in the end I will do it in PL/SQL.
>
> IF you must use only a SQL solution, then you need some preparation
> work. Assuming this is a spelling issue, then you create a spelling
> correction table. One column is the misspelling, and the second column
> is the correct spelling. As long as the misspelling always maps to only
> one correct spelling then this works. Otherwise you need other
> intervention to bring more context into play, which means at least more
> columns to the spelling table. (Consider that if the data you are
> trying to match is a set of street names, then the context of the
> street name is the city. So City would be a column in the spelling
> table and in your query. For example the misspelled street FAR is FAIR
> in A city, but it is FARE in B city.)
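
A minimal sketch of the context-aware spelling table described above, written in Python only for illustration. The table contents, the `CORRECTIONS` name and the `correct_street` helper are hypothetical; the only data taken from the quoted text is the FAR/FAIR/FARE example.

```python
# Hypothetical spelling-correction table keyed by (city, misspelling).
# Adding the city to the key is what lets one misspelling map to
# different corrections in different cities.
CORRECTIONS = {
    ("A", "FAR"): "FAIR",
    ("B", "FAR"): "FARE",
}


def correct_street(city, street):
    """Return the corrected street name for this city, or the input unchanged."""
    key = (city.strip().upper(), street.strip().upper())
    return CORRECTIONS.get(key, street)
```

A street with no entry for its city is returned as-is, so unknown spellings are left for manual review rather than guessed.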

In fact my context is very similar:
name-city-country where city+country make an additional context for name.

I asked only about distinguishing names because that seemed the simplest case, but what the business requirement actually calls for is sorting out candidate duplicates within a given city and country, plus some more conditions ;)

Since I'm dealing with ANY sort of naming (companies, business areas, private people), checking against a dictionary would be a kind of suicide ;)
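
Without a dictionary, one possible approach is to compare names only within the same city+country and flag pairs whose similarity exceeds a threshold. The sketch below is in Python rather than PL/SQL, purely to show the shape of the logic. The record layout `(id, name, city, country)`, the function names and the 0.85 threshold are all assumptions, and `difflib.SequenceMatcher` is just one of several similarity measures that could be used.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def candidate_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for similar names sharing city and country.

    records: iterable of (id, name, city, country) tuples (assumed layout).
    threshold: assumed cut-off; it would need tuning on real data.
    """
    # City + country form the context: names are compared only inside a group.
    groups = defaultdict(list)
    for rec_id, name, city, country in records:
        groups[(normalize(city), normalize(country))].append(
            (rec_id, normalize(name))
        )

    pairs = []
    for members in groups.values():
        for (id_a, name_a), (id_b, name_b) in combinations(members, 2):
            score = SequenceMatcher(None, name_a, name_b).ratio()
            if score >= threshold:
                pairs.append((id_a, id_b, round(score, 2)))
    return pairs
```

The pairwise comparison is quadratic within each group, so grouping by city and country also keeps the amount of work manageable. The "some more conditions" would be extra filters applied to each pair before it is reported.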

Thanks and BR,
Kroger

Received on Wed Sep 06 2006 - 11:17:19 CDT