Re: Hash Join vs. Nested Loops
Date: Wed, 26 Feb 2003 09:58:39 -0800
Message-ID: <3e5d0050$1_at_nntp0.pdx.net>
Mikito Harakiri wrote:
>... Suppose both tables are cached into the memory....
> Is Nested Loop algorithm inherently "nonoptimal" compared
> to Hash Join?
Depends on the data as always, but check out the following as a
very rough cut:
Assume len(Table1) = 100,000
and len(Table2) = 1,000,000
Data randomly distributed, hash produces 1 of 1000 values.
Then HashTab = gentable(Table1) in 100,000 H+K ops (K=hashcost, K more)
and Scan(Table2) is 1,000,000 times the following:
SubTable = HashContents(hash(el)) # SubTable size: 100 Select from SubTable where matches # However many, but always in # any correct answerWe do 1,100,000 hashes total, overhead for table,
but 1,100,000 H + 100,000 K + 100,000,000 I
about= 101,200,000 (I?K?H?) ops
vs. Nested Loops:
100,000 * 1,000,000 I ops = 100,000,000,000 I ops So you'd have to have overhead of the more complex ops in Hash be about 1000 times the cost of the nested loops ops.
-Scott David Daniels
Scott.Daniels_at_Acm.Org
Received on Wed Feb 26 2003 - 18:58:39 CET