Re: Self Joins and optimization
From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 05 May 2007 17:14:52 -0300
Message-ID: <463ce58d$0$4041$9a566e8b_at_news.aliant.net>
>>In the "interpolation" thread, Brian has been expounding on the idea that
>>he can discover algorithms that he knows to be correct, and that outperform
>>anything an optimizer can generate. He's mentioned "self joins" and the
>>idea that he can get a job doen with only one read, where the automatic
>>solution generated in response to declarative code generates multiple passes
>>through the data.
>>
>>My experience leads me to the opposite conclusion from that.
>>
>>Here's the questions I'd like the newsgroup to consider:
>>
>>What, if any, algorithms that are demonstrably correct are going to be
>>overlooked by the optimizer, although discoverable by a human coder? Why
>>would an optimizer fail to discover the possibility of eliminating
>>self-joins? Are there any undecidability issues here?
>>
>>Aside:
>>
>>I know a little about the DEC Rdb/VMS optimizer. Less than that about the
>>Oracle CBO, although the Oracle CBO appears to have been conceptually
>>derived from the Rdb/VMS optimizer. Much less than that about the Oracle
>>RBO as of 1994. The RBO appears to have been an optimizer in name only.
>>
>>While I've got a lot of positive opinions about various features of Oracle,
>>the RBO was not one of those features. Shortly after my cutover to Oracle,
>>one of the tech types told me: "All the cool people use hints."
>>
>>After looking at it in as much depth as I cared to, my conclusion (which I
>>kept to myself) was: "No. All the really cool people find a DBMS that
>>doesn't require hints to do a good job."
>>
>>End of aside.
Date: Sat, 05 May 2007 17:14:52 -0300
Message-ID: <463ce58d$0$4041$9a566e8b_at_news.aliant.net>
Kevin Kirkpatrick wrote:
> David Cressey wrote: >
>>In the "interpolation" thread, Brian has been expounding on the idea that
>>he can discover algorithms that he knows to be correct, and that outperform
>>anything an optimizer can generate. He's mentioned "self joins" and the
>>idea that he can get a job doen with only one read, where the automatic
>>solution generated in response to declarative code generates multiple passes
>>through the data.
>>
>>My experience leads me to the opposite conclusion from that.
>>
>>Here's the questions I'd like the newsgroup to consider:
>>
>>What, if any, algorithms that are demonstrably correct are going to be
>>overlooked by the optimizer, although discoverable by a human coder? Why
>>would an optimizer fail to discover the possibility of eliminating
>>self-joins? Are there any undecidability issues here?
>>
>>Aside:
>>
>>I know a little about the DEC Rdb/VMS optimizer. Less than that about the
>>Oracle CBO, although the Oracle CBO appears to have been conceptually
>>derived from the Rdb/VMS optimizer. Much less than that about the Oracle
>>RBO as of 1994. The RBO appears to have been an optimizer in name only.
>>
>>While I've got a lot of positive opinions about various features of Oracle,
>>the RBO was not one of those features. Shortly after my cutover to Oracle,
>>one of the tech types told me: "All the cool people use hints."
>>
>>After looking at it in as much depth as I cared to, my conclusion (which I
>>kept to myself) was: "No. All the really cool people find a DBMS that
>>doesn't require hints to do a good job."
>>
>>End of aside.
> > A simple case study: extend a relation by adding an attribute that > gives the rank of each tuple based on the ordering of a subset of > existing attributes of the relation. With primitive relational > operators, the only way I'm aware of to express this is: > > SELECT E1.NAME, E1.SAL, COUNT(*) RNK > FROM EMP E1 EMP E2 > WHERE E2.SAL <= E1.SAL > GROUP BY E1.NAME, E1.SAL;
The above has a serious bug in it. What happens when 10 people all have the same salary?
s/b WHERE E2.SAL < E1.SAL OR E1.somekey = E2.somekey
> I'm not aware of any rationale that would allow an optimizer to > compute the above with anything faster than a O(n^2) algorithm.
The same rationale that allows any other method to perform a rank query in a timely manner. It is simply a matter of recognizing the rank query.
[snip] Received on Sat May 05 2007 - 22:14:52 CEST