Re: HASH vs NESTED LOOP join: when is one better than the other ?

From: William Robertson <william.robertson_at_bigfoot.com>
Date: 22 Mar 2006 16:46:51 -0800
Message-ID: <1143074811.097993.164960@i40g2000cwc.googlegroups.com>

Spendius wrote:
> <snip>
> Can the algorithm be simply summarized?

In the nested-loops approach, for each row in driving table A you search table B for a corresponding value. This will typically (though not necessarily) be an index search (visit the index, retrieve the rowid, visit the table - potentially accessing a block you have already visited in an earlier loop iteration).

In the hash-join approach, as I understand it, you first read the smaller table and construct an in-memory (if possible, subject to PGA resources etc) array using hash values generated from the specified key. Then you work through the second table applying the same hash function to the corresponding key values, allowing you to match rows without a sort but at some cost in memory and CPU.

Typically the second approach is significantly more efficient when joining two large sets of rows. However if you have a relatively small number of values to search for, you might as well fetch them directly (via nested loops) and avoid the overhead of building large hashes.

There is also the sort-merge approach (largely superseded by hash joins since 7.3) in which two large tables are matched by sorting both sets of keys. Received on Wed Mar 22 2006 - 18:46:51 CST