Re: Can relational alegbra perform bulk operations?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Tue, 29 Sep 2009 20:03:18 -0300
Message-ID: <4ac29239$0$23754$9a566e8b_at_news.aliant.net>



Banana wrote:

> Tegiri Nenashi wrote:
>

>> if you are saying that only those transformations that lead to 
>> "shorter" relational
>> expressions make sense from optimization perspective, than this is not
>> true. E.g. "magic" transformation.

>
> Yes, that's what I was trying to understand and your statement
> correlates with Clifford's statement that no 'magic' transformation
> actually exists.
>
>> Are you familiar with cost based optimization? They basically throw in
>> multiple equivalent expression and estimate cost of running each
>> apriori.

>
> Yes, but I suspect I had a mistaken notion of what it actually did; I
> thought they basically selected an expression that was known to be
> cheaper than other equivalent expression but as there is no magic
> transformation, it's probably basically a guessimate (though I'm quite
> sure it's more complicated/sophisticated than that).

I think you are missing the forest for the trees. It's really quite simple: The first stage of the optimizer spits out a series of equivalent relational expressions by rearranging the original expression according to identities in the algebra. e.g. properties like transitivity, commutativity etc. The second stage estimates a cost for each of the expressions until it thinks it has one that is "cheap enough".

Having a simple algebra with plenty of useful identities makes creating an optimizer much easier. SQL dbmses do something similar, but things like duplicates and null complicate the algebra and reduce the number of useful identities. As a result, SQL optimizers are more complicated and error-prone than they might otherwise be.

One example where the expression is longer but the execution is usually faster happens when pushing restrict through join. e.g. suppose one restricts a result to customers in a particular state 'AK' and one joins the customer relation to a state relation. Expanding the expression to evaluate state='AK' for both customers and states will reduce the cost of the join by eliminating all the other states from consideration. Assuming one has an index on the state relation the join will be very fast. Received on Tue Sep 29 2009 - 18:03:18 CDT

Original text of this message