Is there a faster/better way to do this?

From: Paul <f1fteen_at_hotmail.com>
Date: 3 Apr 2002 11:27:24 -0800
Message-ID: <7b7286ec.0204031127.3c2172e6@posting.google.com>

Hi all,

I have a puzzle for all you performance/tuning gurus out there!

I have a table with around 4 million records in it called CUSTOMERS.

Each record represents a customer with many columns that hold things like forename, surname, address, post code, telephone number etc.

The customer records have come from many different sources and as a result, there are in most cases, many customer records representing only one actual "real life" customer.

What I am trying to do is write a routine that will group together records that are likely to be the same customer.

The way I have set about this is to write an SQL statement that SELECTs across the whole CUSTOMERS table (4 million records) using the GROUP BY clause on 7 key columns. The results of this are then inserted into a MATCHED table for all groups of 2 or more records and flagged as match type "1" (best).

This is then repeated, but this time the GROUP BY is on only 6 of the above key columns. These results are stored in the MATCHED table as match type "2".

This is again repeated using GROUP BY on 5/4/3 key fields.

The above method works OK, the only problem is that it takes a very long time to run - around 20 hours!!!

So, my question is, is this the best way to do this, or is there a more efficient way to code this that will improve speed?

I am running this on Oracle 8i/Solaris platform.

Thanks in advance for any help.

John. Received on Wed Apr 03 2002 - 13:27:24 CST