Re: Hash Join vs. Nested Loops

From: Mikito Harakiri <mikharakiri_at_ywho.com>
Date: Wed, 26 Feb 2003 10:42:22 -0800
Message-ID: <NZ77a.10$Us3.116_at_news.oracle.com>


"Scott David Daniels" <Scott.Daniels_at_Acm.Org> wrote in message news:3e5d0050$1_at_nntp0.pdx.net...
> Mikito Harakiri wrote:
> >... Suppose both tables are cached into the memory....
> > Is Nested Loop algorithm inherently "nonoptimal" compared
> > to Hash Join?
> Depends on the data as always, but check out the following as a
> very rough cut:
>
> Assume len(Table1) = 100,000
> and len(Table2) = 1,000,000
> Data randomly distributed, hash produces 1 of 1000 values.
> Then HashTab = gentable(Table1) in 100,000 H+K ops (K=hashcost, K more)
> and Scan(Table2) is 1,000,000 times the following:
> SubTable = HashContents(hash(el)) # SubTable size: 100
> Select from SubTable where matches # However many, but always in
> # any correct answer
> We do 1,100,000 hashes total, overhead for table,
> but 1,100,000 H + 100,000 K + 100,000,000 I
> about= 101,200,000 (I?K?H?) ops
> vs. Nested Loops:
> 100,000 * 1,000,000 I ops = 100,000,000,000 I ops

The last equation is not correct: my assumption was that we always use Indexed Nested Loops. No database goes into production without DBA creating join indexes for potentially large tables. (Even if join index is missing it can be created on the fly in about N*Log(N) operations, where N is the size of the table -- this process would be a counterpart of creating hash table). With join index the NL algorithm is very similar to HJ: the execution don't have to fully scan the inner table, it navigates the index and finds a reference to the matching record in roughly log(N) operations.

I can't write the formula for indexed nested loops cost unless some assumption is made about the selectivity of the join predicate. In the extreme case when tables A and B are in 1-to-1 relationship we have about N*Log(N) total operations. It is easy to see that Sort Merge Join would have aprroximately the same performance:

N*Log(N)  for sorting table A +
N*Log(N)  for sorting B +
2*N for merging =
N*Log(N) total

("O capital" ariphmetics is cute, isn't it;-) Received on Wed Feb 26 2003 - 19:42:22 CET

Original text of this message