Re: Hash Join vs. Nested Loops

From: Scott David Daniels <Scott.Daniels_at_Acm.Org>
Date: Fri, 28 Feb 2003 23:38:47 -0800
Message-ID: <3e606387$1_at_nntp0.pdx.net>


Mikito Harakiri wrote:
 > Thank you for your comment, but ...

You are right, I did get confused about your claims. I also thought your question was from a more naive source. At first I assumed I was talking to a possibly fine programmer with no DB experience in his early learn-about-DB phase. This is the period where you doesn't understand what query optimization can do. I apologize if I seem to have been condescending; I was trying to answer.

If you have the join condition indexed, and RAM is really RAM (that is, no locality issues for cache or pages), the three strategies may not provide any major differences (leaving indexed nested loops join the winner from simplicity). Oops, see below: hash join may be easier for multi-cpu.

Locality often does matter, however (fewer page tables in cache, for example). A merge join will get you the most coherence on the final pass. Hash join has coherence in in the outer relation, but so does indexed nested loops. Hash join parallelizes nicely (each CPU takes a set of hash values), with minimal contention, a feature neither of the others have. Also, it is rare to have join indices for all join conditions; while you may have in mind a class of applications where that is true, tossing out H-J and M-J misses some big optimizations when the join condition is not fully indexed.

> ... ordinary Nested Loops is so obviously flawed that it doesn't
> make sense to compare it to anything....

True, but I had not understood that _you_ understood this until your last message. Also this most recent few delays were due to the death of my news feed. I am doing a funny dance to get to the discussion via Google and then replying via the news link.

-Scott David Daniels
Scott.Daniels_at_Acm.Org Received on Sat Mar 01 2003 - 08:38:47 CET

Original text of this message