Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: slightly OT - cleaning up "dirty" keys?

From: Joel Garry <joel-garry_at_home.com>
Date: 3 Mar 2006 11:38:53 -0800
Message-ID: <1141414733.240545.310130@t39g2000cwt.googlegroups.com>

bugbear wrote:
> EdStevens wrote:
> > And that means looking at the human factors as well. Not knowing anything about
> > the application, I'm wondering WHY the operators are failing to find an
> > existing record and end up creating a duplicate.
>
> There's an automatic feed from external sources, which
> include all the "secondary" data (address, phone etc).
>
> If the (primary)name doesn't match, a new record is created,
> from all the fields in the external feed.
>
> Combine this with multiple external sources,
> run for 5 years, and you have a mess.
>
> That's where I'm STARTING.
>
> Now I have to "make it better".

Charge by the hour! :-)

Ed has good ideas; I'd add that there are really a number of disparate problems here that should be addressed individually. To start, you need to define the range of possibilities, based on your feeds (I'm not asking you to post them!).

So for example, my initial HAVING suggestion would work on a subset of the dirt: those rows where the external feed happens to give the same address (or phone number or whatever) when creating a row with a new name. It would also work for identical names - err, how do you handle identical names from different people, anyways?
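To make the HAVING idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table name `contacts`, its columns, and the sample rows are all made up for illustration; the real schema isn't given in this thread. The query groups on a secondary field (address) and keeps only groups where more than one distinct name shows up.

```python
import sqlite3

# Hypothetical table, columns, and rows; the real schema is not in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (name, address) VALUES (?, ?)",
    [
        ("Acme Corp", "1 Main St"),
        ("ACME Corporation", "1 Main St"),
        ("Bolt Ltd", "9 High Rd"),
        ("Acme Corp", "1 Main St"),
    ],
)


def shared_address_groups(conn):
    """Return (address, distinct_names, row_count) for every address
    that appears under more than one distinct name."""
    return conn.execute(
        """
        SELECT address,
               COUNT(DISTINCT name) AS distinct_names,
               COUNT(*)             AS row_count
        FROM contacts
        GROUP BY address
        HAVING COUNT(DISTINCT name) > 1
        ORDER BY address
        """
    ).fetchall()


suspects = shared_address_groups(conn)
for address, distinct_names, row_count in suspects:
    print(address, distinct_names, row_count)
```

Grouping on name instead and using HAVING COUNT(*) > 1 would give the identical-name case, though that alone can't tell two different people with the same name apart.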

The pseudocode in the edit distance link you gave would translate quite simply to awk, which I would highly recommend for these sorts of cleansings. I've always been amazed at how efficient it is, going back to the olden days, and it is a language designed for this type of thing. If you don't already know it, see the awk book by Aho, Kernighan, and Weinberger (and I'm sure there are a few others if you need them).
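As a rough sketch of what that translation looks like, here it is in Python rather than awk, assuming the linked pseudocode is the usual Levenshtein dynamic program (I haven't reproduced the link's version). The helper `candidate_pairs`, its `max_distance` threshold of 2, and the sample names are my own assumptions, there only to show how the distance could produce a shortlist for someone to review by hand.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def candidate_pairs(names, max_distance=2):
    """Hypothetical helper: list pairs of distinct names whose
    case-insensitive edit distance is within max_distance."""
    pairs = []
    for x, y in combinations(sorted(set(names)), 2):
        d = edit_distance(x.lower(), y.lower())
        if d <= max_distance:
            pairs.append((x, y, d))
    return pairs


for pair in candidate_pairs(["Jon Smith", "John Smith", "Jane Doe"]):
    print(pair)
```

Comparing every pair is quadratic in the number of names, so on five years of feed data it would probably pay to compare only within groups that already share something, such as an address or phone number.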

The goal of course is to winnow down to a list of possibilities that someone can look at and say yea or nay. Then make a more reasonable key!

jg

--
@home.com is bogus.
http://64.233.179.104/custom?q=cache:OUIfJJWtJI8J:www.phpbbserver.com/phpbb/viewtopic.php%3Ft%3D189%26mforum%3Ddizwellforum%26+natural+keys&hl=en&gl=us&ct=clnk&cd=1&ie=UTF-8
Received on Fri Mar 03 2006 - 13:38:53 CST
