Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Jan Hidders <>
Date: 25 Feb 2003 11:19:22 +0100
Message-ID: <>

Costin Cozianu wrote:
>Jan Hidders wrote:
>> Costin Cozianu wrote:
>>>Here's my take on it:
>>>- I don't see in any useful sense how can bag algebra be a superset of set
>>>algebra. The obvious inclusion mapping from sets to bags is not a morphism
>>>: Bag(a \/ b) = Bag(a) \/ Bag(b) doesn't hold
>> That depends on how you define the operations in the bag algebra. It is
>> quite possible to define them such that there actually *is* a morphism,
>> for example, by the defining the bag-union such that the element x
>> appears max(a,b) times in the bag union A \/ B if it appears a times in A
>> and b times in B. So it all depends on which bag algebra you are talking
>> about.
>Yes, it all depends. That's why you have to pick a good one and then
>compare. In the case specified by you, it looks like you loose the natural
>semantics bag and bag unions (also defined by SQL). Let's not forget that
>we're not talking here about bisimulation between bag crunching machine and
>relational crunching machines, we're talking here about the usefulness of
>algberaic theory to software engineering. At least I'm no scientist around
>here (well, some people call me architect), I have to be concerned with
>delivering solutions.

Well, I'm trying to become a scientist (so I probably should stop spending time in this newsgroup :-)) but I also have an engineering degree and am very concerned with the practical side of things. That's actually why I like database theory so much.

>It's already tough for me that I have to bend SQL over to be able to use
>it decently, and for the life of me, I really don't see too much
>usefulness for a bag algebra.

Even as a bag calculus SQL is a mess, so that is not what I'm trying to defend here. Also note that I'm not claiming that there is a big need to expose end users to bags. There are a few optimizations that become easier if you do, but I think it is also clear that it makes things a bit more complicated (but not much IMO) and there is nothing that you can do extra that you couldn't do before.

>> However, what is relevant in the context of query optimization and for my
>> remarks above is that for every expression in the set-based algebra there
>> is a corresponding expression in most reasonable bag-based algebras, and
>> that is more or less how I would interpret the statement that the
>> set-based algebra is a subset of the bag-based algebra.
>But if you choose this trick show me an expression in bag algebra, I'll
>show you the expression in relational algebra that calculates it. Since
>bags are sets, I would submit this is rather easy.


>I mean we are arguing here about which is model is more conveninet,
>aren't we ?

Er, not exactly. I'm arguing that some arguments that are presented by Date to disallow bags are not entirely correct.

>>>- Bag on the other hand are a special kind of sets. Aren't you defining
>>>bags as a function from a universe of elements to Natural numbers ? But
>>>also bag operations are not a straight mapping of corresponding set
>> Bags can be defined as a special kind of set, and sets can be seen as a
>> special case of bags. For every set operation it is easy to think of one
>> (or more) corresponding bag operation that is the same up to duplicates.
>Another doubt that I have is that we have several axiomatic constructions
>that go from set theory to the whole math. I don't know if you can as
>easily construct sets from bags. Remeber, sets are how you *define* bags.
>Saying that every set is a bag is a one to one mapping between some bags
>and sets, not a constructive definition. There's also a one to one mapping
>from cartesian coordinates and polar coordinates, yet we don't seem to
>argue about sacrificing one vs. the other.

That's not really a good analogy because cartesions coordinates cannot be seen as a special case of polar coordinates while sets are clearly a special case of bags. So choosing for bags doesn't sacrifice anything in this case.

>>>- therefore my suspicion is that neither algebraic structures are
>>>isomoprhic to a substructure of the other.
>> They don't have to be. All that is needed is that for every algebraic
>> equality that we had in the set-based algebra there is a corresponding
>> algebraic equality in the bag-based algebra. This is trivially true if
>> for every set operation there is a bag operation (or an expression that
>> simulates it) up to duplicates.
>That's not a problem. The problem is that for example SQL defines a set
>of semantic requirements with regards to operation on bags. Pretty
>gratuituous semantic rules if you ask me, more so if you ask Date :)

Again, I have no desire to defend SQL in any way.

>So while every operation on sets can be simulated by any operation on
>bags that are equivalent to the initial sets up to duplicates. This not
>true for example in the case of SQL bags. So less optimizations you can
>choose from. If I operate on set algebras, I can choose a whole class of
>equivalence in bags for one set. The reverse is not true.

That depends not on SQL but on the bag algebra that is used to implement it. And again, if you choose this bag algebra right then it is trivial to see that for every algebraic identity in the set algebra there is a corresponding algebraic identity in the bag algebra. All you have to do is replace the set operations with bag operations that are equivalent up to duplcates and insert the duplicate elimination steps.

>>>- it is the rule rather than the exception, that the stringer algebraic
>>>properties a structure has , the better suited for optimization it is.
>>>Sets have the well known algebraic properties of lattices. Bag algebra
>>>is a ... (bag algebra I presume) ? All my algebraic books just don't
>>>study the algebra of bags. On the contrary, lattices are widely studied,
>>>presumably this situation
>> Try (free) monoids:
>Here's a gut reaction: monoids have less algebraic properties than
>lattices. So that was exactly my argument, the more algebraic properties
>you have, the better is the optimizer.

I would argue that that is a bit of an oversimplification. The details matter here.

>In the meantime the point I was trying to make vis-a-vis Date vs. Garcia
>Molina is that I might issue a conjecture that Garcia Molina and all
>have to study optimization for bags because that's what is required for
>their students who probably get to work at Oracle et. comp.

That certainly is the case, and I already said that at the beginning of the thread:

| The book is agnostic on this issue. It's starting point is that you want
| to build an SQL database (and so there are duplicates in your results) and
| then explains how you should do that.

But my point was that apart from that there are other reasons to talk about a bag-based algebra, even if you don't expose bags to the end users. And of course it also works the other way around: even though you have bags in your logical data model, you could still use a set-based algebra for optimization.

>I'd like to be contradicted and let me know if their stated opion is
>that bags are better suited for data management with reghards to
>optimization, ease of use and everything.

That's not what I said. Here is what I said at the beginning of the thread:

| There are two separate questions here:
| 1. Do we want duplicates in the data model, i.e., in the original relations
|    and the results of queries?
| 2. Do we want duplicates in intermediate results?
| I'm not completely sure what their answer to 1. is but I suspect it is
| something like "probably not". [...]

Kind regards,

Received on Tue Feb 25 2003 - 11:19:22 CET

Original text of this message