Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Is there a faster/better way to do this?

Re: Is there a faster/better way to do this?

From: Galen Boyer <galenboyer_at_hotpop.com>
Date: 3 Apr 2002 22:02:38 -0600
Message-ID: <uhemsaqoh.fsf@rcn.com>


On 3 Apr 2002, f1fteen_at_hotmail.com wrote:
> Hi all,
>
> I have a puzzle for all you performance/tuning gurus out there!
>
> I have a table with around 4 million records in it called CUSTOMERS.
>
> Each record represents a customer with many columns that hold things
> like forename, surname, address, post code, telephone number etc.
>
> The customer records have come from many different sources and as a
> result, there are in most cases, many customer records representing
> only one actual "real life" customer.
>
> What I am trying to do is write a routine that will group together
> records that are likely to be the same customer.
>
> The way I have set about this is to write an SQL statement that
> SELECTs across the whole CUSTOMERS table (4 million records) using the
> GROUP BY clause on 7 key columns. The results of this are then
> inserted into a MATCHED table for all groups of 2 or more records and
> flagged as match type "1" (best).
>
> This is then repeated, but this time the GROUP BY is on only 6 of the
> above key columns. These results are stored in the MATCHED table as
> match type "2".

But the second match is the same match as the first one, just rolled up by one field. You won't get anymore information from this set of data that the group by of the 7 columns will give you. Actually, this will be less. Here's how to see what I'm talking about.

insert into matched
select fld1,fld2,fld3,fld4,fld5,fld6,fld7,1,count(*) from customers
group by fld1,fld2,fld3,fld4,fld5,fld7,1 having count(*) > 1;

insert into matched
select fld1,fld2,fld3,fld4,fld5,fld6,null,2,count(*) from customers
group by fld1,fld2,fld3,fld4,fld5,fld6,2 having count(*) > 1;

select sum(cnt) from matched where match_type = 1; select sum(cnt) from matched where match_type = 2;

These counts should equal.

What you need is some type of regexp or incomplete hashing algorithm or something.

-- 
Galen deForest Boyer
Sweet dreams and flying machines in pieces on the ground.
Received on Wed Apr 03 2002 - 22:02:38 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US