Re: Functions and Relations

From: Aloha Kakuikanu <aloha.kakuikanu_at_yahoo.com>
Date: 21 Nov 2006 18:17:13 -0800
Message-ID: <1164161833.832885.123540_at_h54g2000cwb.googlegroups.com>


vc wrote:
> In a sort of way, yes, it does if you think that a 5 times difference
> in the execution time is OK.

Since, the difference the other way is routinely orders of magnitude, 5 times time difference is nothing.

Now, for complex queries it is common for cardinality/selectivity estimation error to be magnified. So, you have cardinality/selectivity estimation error which influences Hash Join versus Nested Loops decision, but the consequence of this error is fundamentally skewed. If you get NL instead of HJ by mistake, then you pay the penalty of perhaps 5 times slower query. Nothing to panic about. Now, if you get HJ insted of NL by mistake, your query may hang. That is the difference between the query working, or not.

I would even go as far as to suggest disabling HJ (and SMJ for that matter) for OLTP system. This is of course assuming that we always have *indexed* nested loops. If index is not there, it has to be temporarily created, pretty much like HJ creates temporary hash table.

Anyway, i dismiss hash join for purely aesthetic reasons. It has too many arbitrary parameters: bucket size, block size, various hashing constants in the hash function. Do solid state disks still have blocks? I guess they no more relevant than memory pages? If hash join disappear one day, nobody in the databese theory world would notice.

> > > That answers "why you think it'll be more performant" but does not
> > > answer "How do you suggest to materialize 'f(x)' as a relation ?". So
> > > how do you suggest to do the trick ?
> >
> > No, I objected to the method where we materialize the `y=f(x)`
> > relation. I suggested a vague idea that we access this relation via
> > index -- function invokation.
>
> Can you make this idea a little bit less vague ?

See the detailed evaluation of 6 join orders in the other thread.

> > > > Are you aware that selectivity/cardinality estimation related to
> > > > function invokations in SQL is finicky in today's RDBMS implementations?
> > >
> > > I am aware of the cardinality problem with 'select * from t1 where x in
> > > (select x from table(function_1(a,b,c))' and its variations, but the
> > > problem can be cured with the cardinality hint. It may not be
> > > esthetically pleasing, granted, but what is the alternative ?
> >
> > Not only that. Can you specify card(A) = 10 * card(B)? How do you
> > specify cardinality after some predicates are applied? How do you
> > specify cardinality for a volatile (or temporaty) table that changes
> > cardinality by orders of magnitude?

If we adopt the idea of predicates being relations, we can hint their cardinalities!

> The last one is implemented by Oracle as "dynamic sampling".

True. But yet again, if predicates are elevated to the level of relations, even understanding of SQL optimization becomes easier. Consider a table/view with several predicates attached to it. There is a concept of table cardinality without predicates, cardinality of table with all single table predicates applied (e.g. t.x=1), cardinality of table with all predicates applied (including join predicates). That is very confusing. (BTW, dynamic sampling estimates cardinality with single table predicates only). Received on Wed Nov 22 2006 - 03:17:13 CET

Original text of this message