Re: Functions and Relations

From: Aloha Kakuikanu <aloha.kakuikanu_at_yahoo.com>
Date: 21 Nov 2006 09:48:26 -0800
Message-ID: <1164131306.854299.17840_at_h54g2000cwb.googlegroups.com>


Sampo Syreeni wrote:
> Suppose you want an equijoin f(x)=y where x and y are in different
> relations. Most optimizers throw away any statistics on x because of the
> function application or use some approximate statistic derived from x.
> With high cardinality on both attributes, the plan tree might become
> something like hash_join(f(x), y).

This is a great example of how optimizer can leverage new access path. But first let's refine it a little bit.

Hash join is similar to nested loop in a sense that it creates a temporary join index structure. (OK, this index is technically a hash table). Hash join is an ugly artifact of physical world where we have random io inferior to sequential io. The implementation progresses to eliminating this gap completely: with solid state disc technology nested loops is never slower than hash join. Therefore, from theoretical perspective we can focus on Nested loops join solely.

Now returning to the main discussion. The query in question:

select x,y from T
where x=y and y=f(x)

The new access path is to evaluate

'x=y' /\ 'y=f(x)`

somehow, and if this relation is finite and small, then we can join T via indexed nested loops!

> If the function is treated as a relation, the machinery is already there
> in the form of table statistics. In this case they just need to be
> filled in at query compilation or dynamic sampling time.

One more reason, indeed. We can sample a function as a normal relation and derive statistics. Even if there is no new access paths, we'll still benefit from more accurate statistics. Received on Tue Nov 21 2006 - 18:48:26 CET

Original text of this message