Re: Hash Join vs. Nested Loops

From: Scott David Daniels <Scott.Daniels_at_Acm.Org>
Date: Wed, 26 Feb 2003 09:58:39 -0800
Message-ID: <3e5d0050$1_at_nntp0.pdx.net>

Mikito Harakiri wrote:

 >... Suppose both tables are cached into the memory....

> Is Nested Loop algorithm inherently "nonoptimal" compared
> to Hash Join?
Depends on the data as always, but check out the following as a very rough cut:

Assume len(Table1) = 100,000
and len(Table2) = 1,000,000
Data randomly distributed, hash produces 1 of 1000 values. Then HashTab = gentable(Table1) in 100,000 H+K ops (K=hashcost, K more) and Scan(Table2) is 1,000,000 times the following:

   SubTable = HashContents(hash(el))   # SubTable size: 100
   Select from SubTable where matches  # However many, but always in
                                       # any correct answer

We do 1,100,000 hashes total, overhead for table,

but 1,100,000 H + 100,000 K + 100,000,000 I

about= 101,200,000 (I?K?H?) ops
vs. Nested Loops:

100,000 * 1,000,000 I ops = 100,000,000,000 I ops So you'd have to have overhead of the more complex ops in Hash be about 1000 times the cost of the nested loops ops.

-Scott David Daniels
Scott.Daniels_at_Acm.Org Received on Wed Feb 26 2003 - 18:58:39 CET

This message: [ Message body ]
Next message: Mikito Harakiri: "Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?"
Previous message: stu: "Re: Merging records in a query on the fly -- a problem scenario"
In reply to Mikito Harakiri: "Hash Join vs. Nested Loops"
Next in thread: Mikito Harakiri: "Re: Hash Join vs. Nested Loops"
Reply: Mikito Harakiri: "Re: Hash Join vs. Nested Loops"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message