Re: When does a merge join beat a hash join?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Wed, 5 Mar 2003 12:48:22 -0000
Message-ID: <b44rn4$dug$1$830fa7b3@news.demon.co.uk>
Notes in-line:
--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon one-day tutorials:
Cost Based Optimisation
Trouble-shooting and Tuning
Indexing Strategies
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____UK_______April 8th
____UK_______April 22nd

____USA_(FL)_May 2nd


Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Richard Foote" <richard.foote_at_bigpond.com> wrote in message
news:iWl9a.60828$jM5.155756_at_newsfeeds.bigpond.com...


>

>    however if it's a real big mamma of a table and the

> poor hash_area_size just doesn't cut it and a massive amount of I/O

to the


> temporary tablespace results, perhaps it would be better to suffer

the


> overhead of reading/sorting the table once.


But for most people, hash_area_size = 2 * sort_area_size because
that's the way the defaults work. So if a table (rowsource) is too
large to hash entirely into memory, then it's going to be too big
to sort in memory.

Given that the passes for the one-pass hash are:
    Read first table - hash and partition to disc if necessary
    Read second table - partition and write to disc
    Read pairs of matching partitions and hash

Compared to the one pass sort:
    Read first table write to disc in runs
    Re-read runs to merge
    Write sorted table to disc

    Read second table write to disc in runs
    Re-read runs to merge
    Write sorted table to disc

    Read first and second tables and merge


So for hash join, each table gets:
    Read, write, read
for sort join each table gest
    read write, read, write, read


Apart from the fact that sort/merge can take
advantage of pre-sorted data (from previous
passes, or indexed access paths) it seems
that a simple hash join OUGHT to beat a merge
join every time.

Of course, there are advantages that appear
when multiple joins have to take place in
sequence; and we have to remember that
hash joins require an equality condition,
whereas merge joins can handle range-based
joins.




>

> BTW, one of the nice things with the PGA_AGGREGATE_TARGET (when one

> eventually gets it to work) is that the "tuning" of the

hash_area_size is


> implicit and potentially a lot more memory is available "safely" for

such


> operations to occur efficiently.

>


It also deals with Oracle's habit of being over-generous
with allocating of memory for the hash_area_size - which
can be very helpful if you have to do a six-way hash join.
Received on Wed Mar 05 2003 - 06:48:22 CST