Re: Join Two Extremely Large Tables

From: Alex Filonov <afilonov_at_yahoo.com>
Date: 17 Apr 2003 09:40:08 -0700
Message-ID: <336da121.0304170840.1419ef10@posting.google.com>

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message news:<b7ll20$qag$1$8300dec7_at_news.demon.co.uk>...
> How are you planning to use the 3-4 Billion rows
> that you generate ? This is probably a critical
> issue than the mechanism for generating them.
>
> If you hope to do some sort of grouping or
> ordering then that may dictate the method
> you HAVE to use on the join.
>
> Why do you think your index would be more
> than 10 levels deep ? How long is the index
> key ? At a generous 80 bytes, with an 8K
> block size, you get 100 entries per node so:
>

You mean when index was just rebuilt? In real life indexes don't have 100% fill. It's more like 2/3, and may be much worse sometimes.

> Height = 1 => 100 table rows
> Height = 2 => 10,000 table rows
> Height = 3 => 1,000,000 table rows
> Height = 4 => 100,000,000 table rows
> Height = 5 => 10,000,000,000 table rows
>
> In real life, your index is more likely to be
> height 4.
>
> Remember that the ordering of the result set
> from a hash join is effectively governed by
> the second table. If you want the 15 B rows
> for each A row in one shot to process them
> together, then you can't do a hash join unless
> you are prepared to do a sort of the full result
> set afterwards.
>
>
> --
> Regards
>
> Jonathan Lewis
> http://www.jlcomp.demon.co.uk
>
> The educated person is not the person
> who can answer the questions, but the
> person who can question the answers -- T. Schick Jr
>
>
> One-day tutorials:
> http://www.jlcomp.demon.co.uk/tutorial.html
>
> ____UK_______April 22nd
> ____USA_(FL)_May 2nd
> ____Denmark__May 21-23rd
> ____Sweden___June
> ____Finland__September
> ____Norway___September
>
> Three-day seminar:
> see http://www.jlcomp.demon.co.uk/seminar.html
> ____UK_(Manchester)_May x 2
> ____Estonia___June (provisional)
> ____Australia_June (provisional)
> ____USA_(CA, TX)_August
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
>
> "Bass Chorng" <bchorng_at_yahoo.com> wrote in message
> news:bd9a9a76.0304161946.4c0f5cbd_at_posting.google.com...
> > I have 2 tables A and B. Table A contains 86 million rows, about 4.5
> GB
> > in size. Table B contains 1.2 billion rows, about 25 GB in size.
> >
> > For each row in table A, I need to query table B, which is indexed,
> > returning in average of 15 rows and process them.
> >
> > If I use foreach cursor and loop thru table A, the result will be
> > probably at least 3-4 billion IOs. Since in reality, my index depth
> > is probably far more than 10 levels, I would expect total IOs to be
> > near 20 billions.
> >
> > Basically, this is an enormous index range scan.
> >
> > The # of IO is not a problem, but the time it takes is. This process
> > would take 6-7 hours.
> >
> > Would if be feasible for me to hash join these 2 giant tables
> > and not use index at all ? Is it faster this way ?
> > What about my temp segment ? I would either give it a
> > hash_area_size of 10 GB (not even sure if Oracle accepts that)
> > or prepare a temp segment that size, or my process would
> > fail.
> >
> > Your advice is appreciated.
Received on Thu Apr 17 2003 - 11:40:08 CDT