Re: Functions and Relations

From: vc <boston103_at_hotmail.com>
Date: 21 Nov 2006 19:56:01 -0800
Message-ID: <1164167761.008174.153350_at_b28g2000cwb.googlegroups.com>


Aloha Kakuikanu wrote:
> vc wrote:
> > In a sort of way, yes, it does if you think that a 5 times difference
> > in the execution time is OK.
>
> Since the difference the other way is routinely orders of magnitude, a 5
> times time difference is nothing.
>

That is odd. I've never seen "the difference the other way" being "orders of magnitude". There's always a first time, as they say. Please provide a test case of a HJ being at least 10 times slower than a NL join, for the same collection of data of course.
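To make the comparison concrete, a test case along these lines would do -- the table names, row counts and Oracle-style hints are just my sketch, adjust to taste:

  create table big   as select rownum id, rpad('x',200) pad
                        from dual connect by level <= 1000000;
  create table small as select rownum id, rpad('y',200) pad
                        from dual connect by level <= 1000;

  -- force each join method in turn and compare the elapsed times
  select /*+ use_hash(b s) */ count(*)
  from big b, small s where b.id = s.id;

  select /*+ ordered use_nl(s) */ count(*)
  from big b, small s where b.id = s.id;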

> Now, for complex queries it is common for cardinality/selectivity
> estimation error to be magnified.

That is correct, but execution plan construction seems to be getting better with the recent crop of optimizers.

>So, you have cardinality/selectivity
> estimation error which influences Hash Join versus Nested Loops
> decision, but the consequence of this error is fundamentally skewed. If
> you get NL instead of HJ by mistake, then you pay the penalty of
> perhaps 5 times slower query. Nothing to panic about.

Oh, I assure you quite a few folks would panic all right, especially in the financial sector, if a nightly cycle spanned several days instead of, say, 8-10 hours.

> Now, if you get
> HJ instead of NL by mistake, your query may hang. That is the difference
> between the query working or not.

Never seen this one either. Care to provide a test case ? Usually, it's just the opposite: a NL join may take forever doing its block-at-a-time IO, especially if, say, five or six tables are involved. A HJ would just chug along with its full table scans.
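If you want to see which method the optimizer actually picks for a given query, something like the following shows it; the table and column names are made up, and dbms_xplan assumes a reasonably recent Oracle version:

  explain plan for
    select o.*, c.name
    from   orders o, customers c
    where  o.cust_id = c.cust_id
    and    o.order_date > sysdate - 1;

  select * from table(dbms_xplan.display);
  -- look for HASH JOIN vs NESTED LOOPS in the plan output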

>
> I would even go as far as to suggest disabling HJ (and SMJ for that
> matter) for OLTP system.

Well, with OLTP systems those queries are not that complicated because they have to provide 1-to-5 second response times. Usually the optimizer does a decent job, and when it does not, the first_rows hint can put it straight without any need to disable SMJs or HJs. Also, the problem is that there are no pure OLTP systems any more. They are usually hybrids with some reports running on them and taking a couple of hours or more to complete, and disabling SMJs and HJs there would not look like a smart decision. But it's rather a boring subject for the theory group, wouldn't you agree ?
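For the record, the hint I mean is just something like this (the table, column and bind are made up):

  select /*+ first_rows(10) */ *
  from   orders
  where  cust_id = :b1
  order  by order_date desc;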

> This is of course assuming that we always have
> *indexed* nested loops. If the index is not there, it has to be temporarily
> created, pretty much like HJ creates a temporary hash table.

You are surely kidding. Creating an index for a million-row table takes about a minute or two (depending on the column width, of course). Is your hypothetical Web user supposed to sit and wait while you create an index ? The index should already exist -- I have a strange feeling of explaining why multiplication can be more useful than repeated addition. The HJ temp table and the HJ itself are used for complex reporting queries as a rule. The NL happens automatically with the majority of OLTP queries, hinted or not, thanks to the optimizer or the database designer or both (or neither ;).
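In other words, the work happens once at design time, not per query; a rough sketch with a made-up table (and the minute-or-two figure obviously depends on the hardware):

  -- built once, up front, not while the user waits
  create index orders_cust_idx on orders(cust_id);

  -- a typical OLTP lookup then gets the indexed nested loops plan by itself
  select * from orders where cust_id = :b1;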

>
> Anyway, i dismiss hash join for purely aesthetic reasons.

I like those reasons too. But why be so minimalistic? Why not apply your aesthetic judgement to the whole world around you and dismiss it as being as butt-ugly as the HJ ?

> It has too
> many arbitrary parameters: bucket size, block size, various hashing
> constants in the hash function. Do solid state disks still have blocks?
> I guess they are no more relevant than memory pages? If hash joins disappeared
> one day, nobody in the database theory world would notice.

Tell it to the marines or Mr. Kyte ;)

>
> > > > That answers "why you think it'll be more performant" but does not
> > > > answer "How do you suggest to materialize 'f(x)' as a relation ?". So
> > > > how do you suggest to do the trick ?
> > >
> > > No, I objected to the method where we materialize the `y=f(x)`
> > > relation. I suggested a vague idea that we access this relation via
> > > an index -- function invocation.
> >
> > Can you make this idea a little bit less vague ?
>
> See the detailed evaluation of 6 join orders in the other thread.

Sounds good.

>
> > > > > Are you aware that selectivity/cardinality estimation related to
> > > > > function invocations in SQL is finicky in today's RDBMS implementations?
> > > >
> > > > I am aware of the cardinality problem with 'select * from t1 where x in
> > > > (select x from table(function_1(a,b,c)))' and its variations, but the
> > > > problem can be cured with the cardinality hint. It may not be
> > > > esthetically pleasing, granted, but what is the alternative ?
> > >
> > > Not only that. Can you specify card(A) = 10 * card(B)? How do you
> > > specify cardinality after some predicates are applied? How do you
> > > specify cardinality for a volatile (or temporary) table that changes
> > > cardinality by orders of magnitude?
>
> If we adopt the idea of predicates being relations, we can hint their
> cardinalities!
>
> > The last one is implemented by Oracle as "dynamic sampling".
>
> True. But yet again, if predicates are elevated to the level of
> relations, even understanding of SQL optimization becomes easier.
> Consider a table/view with several predicates attached to it. There is
> a concept of table cardinality without predicates, cardinality of table
> with all single table predicates applied (e.g. t.x=1), cardinality of
> table with all predicates applied (including join predicates). That is
> very confusing.

Sounds most cool, but I'll have to read that other message of yours before replying.
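For completeness, the (undocumented) cardinality hint I mentioned earlier looks roughly like this -- the figure of 500 rows is, of course, made up:

  select /*+ cardinality(t 500) */ *
  from   table(function_1(:a, :b, :c)) t;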

>(BTW, dynamic sampling estimates cardinality with
> single table predicates only).

No kidding ? I vaguely remember row sampling being done for several tables participating in a join but am too lazy to check.
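From memory, the hint side of it looks roughly like this (the sampling level and the table are made up):

  select /*+ dynamic_sampling(t 4) */ count(*)
  from   volatile_table t
  where  t.status = 'NEW';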
