Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Costin Cozianu <ccozianu_at_netzero.net>
Date: Mon, 24 Feb 2003 19:21:28 -0800
Message-ID: <b3en7u\$1ljbht\$1_at_ID-152540.news.dfncis.de>

Jan Hidders wrote:
> Costin Cozianu wrote:
>

```>>>No. In fact, in theory, all optimizations that can be done in a set-based
>>>algebra can also be done in a bag-based algebra but not the other way
>>>around.
>>
>>Hi Jan,
```

>
>
> Hello Costin,
>
> Always nice to see a reaction that actually talks about theory. :-)
>
>
```>>I followed this discussion, interesting as always. But I'm puzzled by this
>>conjecture. Has somebody studied it ?
```

>
>
> No, because it is actually quite trivial. I'll try to explain by answering
> your questions. There is, by the way, quite some literature on bag algebras,
> list algebras, and even the more general pom-set (partially ordered
> multi-set) algebras, although these may not study the questions you have in
> mind. For examples see:
>
> http://citeseer.nj.nec.com/libkin93some.html (on bag algebras)
> http://citeseer.nj.nec.com/grumbach96query.html (more on bag algebras)
> http://citeseer.nj.nec.com/181093.html (on pom-sets)
>
>
```>>Here's my take on  it:
>>
>>- I don't see in any useful sense how can bag algebra be a superset of set
>>algebra. The obvious inclusion mapping from sets to bags is not a morphism
>>: Bag(a \/ b) = Bag(a) \/ Bag(b) doesn't hold
```

>
>
> That depends on how you define the operations in the bag algebra. It is
> quite possible to define them such that there actually *is* a morphism, for
> example, by the defining the bag-union such that the element x appears
> max(a,b) times in the bag union A \/ B if it appears a times in A and b
> times in B. So it all depends on which bag algebra you are talking about.
>

Yes, it all depends. That's why you have to pick a good one and then compare. In the case specified by you, it looks like you loose the natural semantics bag and bag unions (also defined by SQL). Let's not forget that we're not talking here about bisimulation between bag crunching machine and relational crunching machines, we're talking here about the usefulness of algberaic theory to software engineering. At least I'm no scientist around here (well, some people call me architect), I have to be concerned with delivering solutions.

It's already tough for me that I have to bend SQL over to be able to use it decently, and for the life of me, I really don't see too much usefulness for a bag algebra. Did anybody come up with a set of problems for which solutions in set theory are excessively complicated ? If I get to program in a bag algebra by default, can I flip a switch that enforces the rules "all bags are sets by default, unless I state otherwise ?". Is this exercise worth the trouble ?

> However, what is relevant in the context of query optimization and for my
> remarks above is that for every expression in the set-based algebra there is
> a corresponding expression in most reasonable bag-based algebras, and that
> is more or less how I would interpret the statement that the set-based
> algebra is a subset of the bag-based algebra.
>

But if you choose this trick show me an expression in bag algebra, I'll show you the expression in relational algebra that calculates it. Since bags are sets, I would submit this is rather easy.

>

```>>- Bag on the other hand are a special kind of sets. Aren't you defining
>>bags as a function from a universe of elements to Natural numbers ? But
>>also bag operations are not a straight mapping of corresponding set
>>operation
```

>
>
> Bags can be defined as a special kind of set, and sets can be seen as a
> special case of bags. For every set operation it is easy to think of one (or
> more) corresponding bag operation that is the same up to duplicates.
>
>

Another doubt that I have is that we have several axiomatic constructions that go from set theory to the whole math. I don't know if you can as easily construct sets from bags. Remeber, sets are how you *define* bags. Saying that every set is a bag is a one to one mapping between some bags and sets, not a constructive definition. There's also a one to one mapping from cartesian coordinates and polar coordinates, yet we don't seem to argue about sacrificing one vs. the other.

Anyway, this argument doesn't have any practical impact one way or another, methinks.

```>>- therefore my suspicion is that neither algebraic structures are
>>isomoprhic to a substructure of the other.
```

>
>
> They don't have to be. All that is needed is that for every algebraic
> equality that we had in the set-based algebra there is a corresponding
> algebraic equality in the bag-based algebra. This is trivially true if for
> every set operation there is a bag operation (or an expression that
> simulates it) up to duplicates.
>
>

That's not a problem. The problem is that for example SQL defines a set of semantic requirements with regards to operation on bags. Pretty gratuituous semantic rules if you ask me, more so if you ask Date :)

```>>- it is the rule rather than the exception, that the stringer algebraic
>>properties a structure has , the better suited for optimization it is.
>>Sets have the well known algebraic properties of lattices. Bag algebra
>>is a ... (bag algebra I presume) ? All my algebraic books just don't
>>study the algebra of bags. On the contrary, lattices are widely studied,
>>presumably this situation
```

>
>
> Try (free) monoids:
>
>
http://citeseer.nj.nec.com/fegaras94uniform.html
>

Here's a gut reaction: monoids have less algebraic properties than lattices. So that was exactly my argument, the more algebraic properties you have, the better is the optimizer.

>

```>>You recurred at some point to a proof by authority (Ullman, Molina vs.
>>Date), but I'm curious if you have a stated opinion of Molina et all
>>vis-a-vis bag algebra being more optimizable than set alegbra, or is it
>>just a conjecture of yours.
```

>
>
> I hope you understand now that this "conjecture" is rather trivial to prove
> once you exactly define what is meant by "subset of" for such algebras.
>

Well, that's exactly the point. I don;t understand how this conjecture is trivial to prove. And I hope it's not the same kind of conjecture as in " [memory bounded LISP machines] subset of [Turing machines]". That I can understand, but I wouldn't program in Turing machines for the life of me.

Ok, maybe it's a stretched analogy, but I hope you getbthe point. We have to carefully study not only the abstract properties of bag algebras vis-a-vis set algebras, but also the adequacy of semantics for the purpose of data management.

Something along the lines sketched by Mathias Fellejsen in http://citeseer.nj.nec.com/felleisen90expressive.html

In the meantime, all I can see is that pretty much all the bag based SQ products screwed up one way or another, that the bag based SQL standard is less than adequate design. True, we don't have too much production-level set based DB's (although DEC's Rdb has been mentioned here). But frankly I can't see how any faithful implementation of a set based RDBMS can match the failures of SQL based products.

> -- Jan Hidders
>

Best,
Costin Cozianu Received on Tue Feb 25 2003 - 04:21:28 CET

Original text of this message