Re: Join Two Extremely Large Tables

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 17 Apr 2003 08:29:00 +0100
Message-ID: <b7ll20$qag$1$8300dec7@news.demon.co.uk>

How are you planning to use the 3-4 Billion rows that you generate ? This is probably a critical issue than the mechanism for generating them.

If you hope to do some sort of grouping or ordering then that may dictate the method you HAVE to use on the join.

Why do you think your index would be more than 10 levels deep ? How long is the index key ? At a generous 80 bytes, with an 8K block size, you get 100 entries per node so:

Height = 1 => 100 table rows
Height = 2 => 10,000 table rows
Height = 3 => 1,000,000 table rows
Height = 4 => 100,000,000 table rows
Height = 5 => 10,000,000,000 table rows

In real life, your index is more likely to be height 4.

Remember that the ordering of the result set from a hash join is effectively governed by the second table. If you want the 15 B rows for each A row in one shot to process them together, then you can't do a hash join unless you are prepared to do a sort of the full result set afterwards.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html

____UK_______April 22nd
____USA_(FL)_May 2nd
____Denmark__May 21-23rd
____Sweden___June
____Finland__September
____Norway___September

Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____UK_(Manchester)_May x 2
____Estonia___June (provisional)
____Australia_June (provisional)
____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Bass Chorng" <bchorng_at_yahoo.com> wrote in message
news:bd9a9a76.0304161946.4c0f5cbd_at_posting.google.com...


> I have 2 tables A and B. Table A contains 86 million rows, about 4.5

GB


> in size. Table B contains 1.2 billion rows, about 25 GB in size.

>

> For each row in table A, I need to query table B, which is indexed,

> returning in average of 15 rows and process them.

>

> If I use foreach cursor and loop thru table A, the result will be

> probably at least 3-4 billion IOs. Since in reality, my index depth

> is probably far more than 10 levels, I would expect total IOs to be

> near 20 billions.

>

> Basically, this is an enormous index range scan.

>

> The # of IO is not a problem, but the time it takes is. This process

> would take 6-7 hours.

>

> Would if be feasible for me to hash join these 2 giant tables

> and not use index at all ?  Is it faster this way ?

> What about my temp segment ? I would either give it a

> hash_area_size of 10 GB (not even sure if Oracle accepts that)

> or prepare a temp segment that size, or my process would

> fail.

>

> Your advice is appreciated.

Received on Thu Apr 17 2003 - 02:29:00 CDT