Re: Hash Join vs. Nested Loops
Date: Wed, 26 Feb 2003 09:58:39 -0800
Message-ID: <3e5d0050$1_at_nntp0.pdx.net>
Mikito Harakiri wrote:
 >... Suppose both tables are cached into the memory....
 > Is Nested Loop algorithm inherently "nonoptimal" compared
 > to Hash Join?
Depends on the data as always, but check out the following as a
very rough cut:
Assume len(Table1) = 100,000
and len(Table2) = 1,000,000
Data randomly distributed, hash produces 1 of 1000 values.
Then HashTab = gentable(Table1) in 100,000 H+K ops (K=hashcost, K more)
and  Scan(Table2) is 1,000,000 times the following:
   SubTable = HashContents(hash(el))   # SubTable size: 100
   Select from SubTable where matches  # However many, but always in
                                       # any correct answer
We do 1,100,000 hashes total, overhead for table,
but 1,100,000 H + 100,000 K + 100,000,000 I
        about=   101,200,000 (I?K?H?) ops
vs. Nested Loops:
100,000 * 1,000,000 I ops = 100,000,000,000 I ops So you'd have to have overhead of the more complex ops in Hash be about 1000 times the cost of the nested loops ops.
-Scott David Daniels
Scott.Daniels_at_Acm.Org
Received on Wed Feb 26 2003 - 18:58:39 CET
