Re: Self Joins and optimization

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 05 May 2007 17:14:52 -0300
Message-ID: <463ce58d$0$4041$9a566e8b_at_news.aliant.net>


Kevin Kirkpatrick wrote:

> David Cressey wrote:
> 

>>In the "interpolation" thread, Brian has been expounding on the idea that
>>he can discover algorithms that he knows to be correct, and that outperform
>>anything an optimizer can generate. He's mentioned "self joins" and the
>>idea that he can get a job doen with only one read, where the automatic
>>solution generated in response to declarative code generates multiple passes
>>through the data.
>>
>>My experience leads me to the opposite conclusion from that.
>>
>>Here's the questions I'd like the newsgroup to consider:
>>
>>What, if any, algorithms that are demonstrably correct are going to be
>>overlooked by the optimizer, although discoverable by a human coder? Why
>>would an optimizer fail to discover the possibility of eliminating
>>self-joins? Are there any undecidability issues here?
>>
>>Aside:
>>
>>I know a little about the DEC Rdb/VMS optimizer. Less than that about the
>>Oracle CBO, although the Oracle CBO appears to have been conceptually
>>derived from the Rdb/VMS optimizer. Much less than that about the Oracle
>>RBO as of 1994. The RBO appears to have been an optimizer in name only.
>>
>>While I've got a lot of positive opinions about various features of Oracle,
>>the RBO was not one of those features. Shortly after my cutover to Oracle,
>>one of the tech types told me: "All the cool people use hints."
>>
>>After looking at it in as much depth as I cared to, my conclusion (which I
>>kept to myself) was: "No. All the really cool people find a DBMS that
>>doesn't require hints to do a good job."
>>
>>End of aside.
> 
> A simple case study: extend a relation by adding an attribute that
> gives the rank of each tuple based on the ordering of a subset of
> existing attributes of the relation.  With primitive relational
> operators, the only way I'm aware of to express this is:
> 
> SELECT E1.NAME, E1.SAL, COUNT(*) RNK
> FROM EMP E1 EMP E2
> WHERE E2.SAL <= E1.SAL
> GROUP BY E1.NAME, E1.SAL;

The above has a serious bug in it. What happens when 10 people all have the same salary?

s/b WHERE E2.SAL < E1.SAL OR E1.somekey = E2.somekey

> I'm not aware of any rationale that would allow an optimizer to > compute the above with anything faster than a O(n^2) algorithm.

The same rationale that allows any other method to perform a rank query in a timely manner. It is simply a matter of recognizing the rank query.

[snip] Received on Sat May 05 2007 - 22:14:52 CEST

Original text of this message