Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Jan Hidders <jan.hidders_at_REMOVE.THIS.ua.ac.be>
Date: 4 Mar 2003 15:06:21 +0100
Message-ID: <3e64b2dd.0_at_news.ruca.ua.ac.be>


Lauri Pietarinen wrote:
>Jan Hidders wrote:
>
>>But you have been arguing all the time that we should keep things as
>>simple as possible for the implementors. Have you now changed your mind?
>>
>The main claim was that if we use sets intead of bags there are more
>optimisations availabe (e.g. as in the projection of view as mentioned
>earlier).

Which, taken literally, is nonsense, as I've already argued. At most you can claim that they make certain optimization harder to spot and then you still have to show *how much* harder they become.

>However because the main focus in SQL has been on the bag-like features,
>lots of optimisation avenues have been left untouched. They became
>distracted, in a sense. Did it make life easier for them? Maybe...
>
>But wait! Why don't we just go back to DL/I and CODASYL? You know what? We
>will make the lives of implementors even easier: No optimisations are
>possible ergo no optimising problems!!!

Hello? Who are you talking to? I never said that we should make the life of them as easy as possible. On the contrary, I have consistently argued that from the perspective of query optimization the addition of bags only causes a little extra complexity and therefore might be worth the extra effort.

>>Could it not be that adding bags is also going to give us extra
>>optimizations that would not be considered in a set-only approach?
>
>What would they be? Could you give me an example?

Let's say I have a set X and I want to know the average of f(x) for all x in the set X. In the bag algebra this is straightforward: an iteration over X where you compute f(x), followed by the bag function AVG. If you simulate this with sets ( the bag {{x, x, y}} becomes {(x,2), (y,1)} ) then you get an iteration followed by a grouping on the values of f(x) and then you have to compute the average over the product of every element and its cardinality. It's not trivial to recognize that this is equivalent with the previous sequence of bag operations, which is often a more efficient implementation.

>>>>Meaning that in this case you might be right that for this particular
>>>>query the bag-based approach makes optimization a bit harder. However,
>>>>after giving it a bit more thought I doubt that even this is true. The
>>>>problem is just as difficult in a set-only as in a bag-based approach.
>>>>In both cases you can optimize a join followed by a project that
>>>>projects out a certain table by using only the index for the "invisible"
>>>>table.
>>>>
>>>Or ignoring it completely...
>>
>>That would be semantic query optimization and that is already hard in a
>>set-only approach.
>>
>Is it easier in a bag-based approach?

No.

>>>Your point is that if the user gives the 'DISTINCT' keyword,
>>>it works as if it was a set-based system.
>>
>>Yes, it should and it could.
>>
>In my opinion it is just an unnecessary distraction for both implementors
>and users.

Opinions are nice, arguments are better.

  • Jan Hidders
Received on Tue Mar 04 2003 - 15:06:21 CET

Original text of this message