Re: Hash Join vs. Nested Loops

From: Scott David Daniels <Scott.Daniels_at_Acm.Org>
Date: Wed, 26 Feb 2003 09:58:39 -0800
Message-ID: <3e5d0050$1_at_nntp0.pdx.net>


Mikito Harakiri wrote:

 >... Suppose both tables are cached into the memory....

> Is Nested Loop algorithm inherently "nonoptimal" compared
> to Hash Join?
Depends on the data as always, but check out the following as a very rough cut:

Assume len(Table1) = 100,000
and len(Table2) = 1,000,000
Data randomly distributed, hash produces 1 of 1000 values. Then HashTab = gentable(Table1) in 100,000 H+K ops (K=hashcost, K more) and Scan(Table2) is 1,000,000 times the following:

   SubTable = HashContents(hash(el))   # SubTable size: 100
   Select from SubTable where matches  # However many, but always in
                                       # any correct answer
We do 1,100,000 hashes total, overhead for table,

   but 1,100,000 H + 100,000 K + 100,000,000 I

        about= 101,200,000 (I?K?H?) ops
vs. Nested Loops:

   100,000 * 1,000,000 I ops = 100,000,000,000 I ops So you'd have to have overhead of the more complex ops in Hash be about 1000 times the cost of the nested loops ops.

-Scott David Daniels
Scott.Daniels_at_Acm.Org Received on Wed Feb 26 2003 - 18:58:39 CET

Original text of this message