Re: Join Two Extremely Large Tables

From: Bass Chorng <bchorng_at_yahoo.com>
Date: 17 Apr 2003 12:00:07 -0700
Message-ID: <bd9a9a76.0304171100.202a75ee@posting.google.com>

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message news:<b7ll20$qag$1$8300dec7_at_news.demon.co.uk>...
> How are you planning to use the 3-4 Billion rows
> that you generate ? This is probably a critical
> issue than the mechanism for generating them.
>

The output is sent to many different unix boxes ( output file is split into many smaller ones ) to generate emails.

> If you hope to do some sort of grouping or
> ordering then that may dictate the method
> you HAVE to use on the join.
>
> Why do you think your index would be more
> than 10 levels deep ? How long is the index
> key ? At a generous 80 bytes, with an 8K
> block size, you get 100 entries per node so:
>
> Height = 1 => 100 table rows
> Height = 2 => 10,000 table rows
> Height = 3 => 1,000,000 table rows
> Height = 4 => 100,000,000 table rows
> Height = 5 => 10,000,000,000 table rows
>
> In real life, your index is more likely to be
> height 4.

I validated the index in QA a few weeks back, the height was 7. The data was refreshed at least 6 months ago. We use 8K block size. One problem is this table gets deleted and the index has never been rebuilt. We are running 8.1.7. Can't do online rebuild.

>
> Remember that the ordering of the result set
> from a hash join is effectively governed by
> the second table. If you want the 15 B rows
> for each A row in one shot to process them
> together, then you can't do a hash join unless
> you are prepared to do a sort of the full result
> set afterwards.

We do need the 15 B rows processed in one shot. I figure sorting the output file would probably be more costly. So my only option is nested loop then ? (Thats pretty much what we are doing now, except it is artificial. We do not join, we use foreach A, do B.. ).

Thanks!

>
>
> --
> Regards
>
> Jonathan Lewis
Received on Thu Apr 17 2003 - 14:00:07 CDT