Re: Functions and Relations

From: Aloha Kakuikanu <aloha.kakuikanu_at_yahoo.com>
Date: 21 Nov 2006 12:06:02 -0800
Message-ID: <1164139562.727267.47670_at_e3g2000cwe.googlegroups.com>


vc wrote:
> Aloha Kakuikanu wrote:
> > Hash join is an ugly artifact of physical world where we have
> > random io inferior to sequential io.
> I think multiblock vs. single block IO for an index access path for NLs
> is also an important factor, maybe more important than IO randomness
> depending on a specific IO/RAID system of course.

By "random IO" i slopilly meant single block IO, and by "sequential IO" i meant multiblock IO.

> With solid state disks, reading large portions of memory vs. smaller
> chunks may still make hash joins more performant(performance complexity
> O(n) for NL may be higher than O(m+n) for HJ where m and n are
> relation cardinalities).

I'm not sure I follow your calculation. Anyhow, 5 years ago I had hard time finding a performance case where Hash Join was merely 10 times faster as indexed nested loop. That was on traditional disc system. Now, given no performance penalty for randomly accessed index nodes/records, I doubt there is any difference today. With nested loops you scan the outer table, navigate about 3-4 levels in the B-tree and find the match in the inner table. With hash join you scan the outer table, the match in the hash table and retrieve thae matching portion from the inner table. That is logically about the same amount of work. Plus Nested Loops is more elegant from theoretical point of view. Remember hash indexes? Nobody uses them today, because B-tree is more elegant and more general.

> How do you suggest to materialize 'f(x)' as a relation and why do you
> think the join will be more performant than just calculating 'f(x)' per
> each row ?

See the example with finite relation/function `x=1`

> > We can sample a function as a normal relation
> > and derive statistics.
>
> Oracle statistics for example provide food for the cost based optimizer
> which is mainly interested in IO. Well, versions 9i and above take
> into account CPU as well, but I believe IO is still the optimizer
> staple. I am not sure how representing a function as a relation will
> improve the optimizer bahavior. Could you please explain ?
>
> > Even if there is no new access paths, we'll
> > still benefit from more accurate statistics.
>
> How exactly ?

Are you aware that selectivity/cardinality estimation related to function invokations in SQL is finicky in today's RDBMS implementations? Received on Tue Nov 21 2006 - 21:06:02 CET

Original text of this message