Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Sql - Best way?

From: <holderm_at_mailexcite.com>
Date: Wed, 24 Jun 1998 00:32:32 GMT
Message-ID: <6mphf1$cot$1@nnrp1.dejanews.com>


In article <6mlqh2$os7$1_at_usenet01.srv.cis.pitt.edu>,   "Lisa Lewis" <lewis_at_med.pitt.edu> wrote:
>
> Can anyone give me any insight as to what might be the best way to do the
> following: On a table with 500,000 rows, compare every row to every other
> row in the table (500,000X500,000 comparisons) in order to do some
> similarity analysis between rows of a table. Would it be more efficient to
> create two cursors each stepping through the complete table or create 1
> cursor that does an unconstrained join? What are the implications of
> creating a cursor that would have 500,000x500,000 rows in it? Any
> recommendations would be GREATLY appreciated!
>
> Lisa
>
>

I'm not sure just what kind of similarity analysis you're doing, but you could try a self-join with a predicate that filters what gets returned:

SELECT mt1.num_col1 "Before col1"
     , mt1.num_col2 "Before col2"
     , mt2.num_col1 "After col1"
     , mt2.num_col2 "After col2"
FROM my_table mt1
   , my_table mt2
WHERE mt2.primary_key > mt1.primary_key -- this keeps half of the Cartesian product
  AND ABS(mt1.num_col1 - mt2.num_col1) < (pick a tolerance)

(In Oracle, HAVING only applies to grouped results, so the tolerance test belongs in the WHERE clause; ABS() makes it symmetric, since either row's value could be the larger one.)

The primary_key comparison in the WHERE clause does this for you:

| 1,1   1,2   1,3   ... 1,N   |
| 2,1   2,2   2,3   ... 2,N   |
| ...                         |
| N-1,1 N-1,2 N-1,3 ... N-1,N |
| N,1   N,2   N,3   ... N,N   |

The only rows returned are 1,2 (row 1 compared to row 2); 1,3; ... 1,N; and so on up to N-1,N. For N=500,000 that's 499,999 + 499,998 + ... + 1 = N(N-1)/2 = 124,999,750,000.
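A quick arithmetic check of that pair count (the sum of the first N-1 integers is N(N-1)/2):

```python
# With the key inequality, each unordered pair of rows is compared exactly
# once, so the join returns N*(N-1)/2 rows.
N = 500_000
pairs = N * (N - 1) // 2
print(pairs)  # 124999750000 -- matches 499,999 + 499,998 + ... + 1
```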

If you're going to send 125 billion rows to PL/SQL, you will want to reduce that further with the tolerance predicate.
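The whole approach can be tried on a toy table. This sketch uses Python's built-in SQLite in place of Oracle; the column names follow the example above, and the sample values and tolerance of 5 are arbitrary assumptions:

```python
# Toy reproduction of the self-join with a tolerance filter, using SQLite
# as a stand-in for Oracle.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (primary_key INTEGER, num_col1 REAL, num_col2 REAL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 10.0, 1.0), (2, 12.0, 2.0), (3, 50.0, 3.0)],
)

# The key inequality keeps each unordered pair once; ABS() makes the
# tolerance test symmetric.
rows = conn.execute(
    """
    SELECT mt1.primary_key, mt2.primary_key
    FROM my_table mt1, my_table mt2
    WHERE mt2.primary_key > mt1.primary_key
      AND ABS(mt1.num_col1 - mt2.num_col1) < 5
    """
).fetchall()

print(rows)  # [(1, 2)] -- only rows 1 and 2 are within tolerance of each other
```

Of the three possible pairs, only (1,2) survives the filter; the filter is doing exactly the row-reduction described above, just inside the database instead of in PL/SQL.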

Hopefully you have a big SGA (System Global Area) so that the whole table can reside in memory. You could look at the number of bytes per row and ask the DBA how big the SGA is. If the table fits in memory, the join shouldn't be too bad (i.e. it might take a few hours). The big hit would be sending those 125 billion rows to PL/SQL for analysis; if you can eliminate most of them with the tolerance predicate, you might get something useful.
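As a back-of-the-envelope sizing check (the 100 bytes/row figure is purely an assumption; substitute the table's real average row length, which the DBA can get after analyzing the table):

```python
# Rough estimate of whether the table could fit in the SGA's buffer cache.
num_rows = 500_000
bytes_per_row = 100  # assumed average row length -- replace with the real figure
table_mb = num_rows * bytes_per_row / (1024 * 1024)
print(f"~{table_mb:.0f} MB")  # ~48 MB
```

At that assumed row size the table is only a few tens of megabytes, so caching it entirely is plausible.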

Hope that helps!

Received on Tue Jun 23 1998 - 19:32:32 CDT

