Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 17 Feb 2003 23:09:41 -0500
Message-ID: <4Ji4a.12$3a2.1676406_at_mantis.golden.net>


"Paul" <pbrazier_at_cosmos-uk.co.uk> wrote in message news:51d64140.0302170330.15d2a98f_at_posting.google.com...
> "Mikito Harakiri" <mikharakiri_at_ywho.com> wrote in message
> news:<nGb3a.11$yd.52_at_news.oracle.com>...
> > "Paul" <pbrazier_at_cosmos-uk.co.uk> wrote in message
> > news:51d64140.0302140131.7621ef2b_at_posting.google.com...
> > Adding count column is certainly possible, as Bob demonstrated it yet one
> > more time. I'm just verifying if this practice is not inferior to explicitly
> > having multiset concept in the model.
>
> OK suppose I have an employee "relation" which is a multiset.
> I have two employees called John Smith, in the same dept on the same
> salary.
> So my multi-relation contains two outwardly identical tuples:
> ("John Smith", 10, 20000)
> ("John Smith", 10, 20000)
>
> Now one of the John Smiths has a sex-change and becomes Jane Smith.
>
> How does the user update only one of the rows?

One must resort to physical pointers to identify a unique ("John Smith", 10, 20000), which makes multisets an obviously inappropriate choice upon which to base a logical data model.
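
To make the point concrete, here is a minimal sketch in Python. It uses `collections.Counter` as a stand-in for a multiset "relation"; that representation and the helper name `rename_one` are assumptions for illustration only, not anything proposed in this thread. Since nothing in the value of a tuple distinguishes one occurrence from another, the only update that can be stated logically is "move one occurrence", which amounts to the count-column representation.

```python
from collections import Counter

# A multiset "relation": each tuple maps to its number of occurrences.
employees = Counter()
employees[("John Smith", 10, 20000)] += 2  # two outwardly identical tuples


def rename_one(relation, old, new):
    """Move a single occurrence of `old` to `new` (hypothetical helper).

    Nothing in the tuple's value says which occurrence is meant, so the
    update can only be phrased in terms of counts.
    """
    if relation[old] < 1:
        raise KeyError(old)
    relation[old] -= 1
    if relation[old] == 0:
        del relation[old]  # drop tuples whose count has reached zero
    relation[new] += 1


rename_one(employees, ("John Smith", 10, 20000), ("Jane Smith", 10, 20000))
```

Anything finer than this, such as "update that particular John Smith", needs an identifier outside the tuple's value, which is to say a pointer or a key.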

It's only a conceit or affectation of Mikito's that he's opening people's eyes to new untried possibilities. He knows his objections do not hold water; at best, he's just trying to challenge the people in the newsgroup to recognize the best answers. As pedagogy, I suppose I have no objections to that.

> So multisets really just are syntactic shorthand. Now I suppose
> ultimately everything is syntactic shorthand for set theory :) but I'm
> not convinced that at the logical level the advantages of multisets
> outweigh the disadvantages. Quite possibly it might be different for
> the physical level though.

In a relational dbms, physical duplicates represent a single fact and could come from a variety of causes including data distribution, caching and indexing. The only time a transient duplicate has any importance is during aggregate evaluation.
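
A small Python sketch of the aggregate case, under assumed data: the `emp_id` key and the third employee are hypothetical, added only so that every tuple in the base relation is distinct. Projecting onto salary and removing duplicates before summing loses a value, so the intermediate result has to keep its transient duplicates (or an equivalent count) until the aggregate is computed.

```python
# Hypothetical employee relation with a key (emp_id), so every tuple is distinct.
employees = {
    (1, "John Smith", 10, 20000),
    (2, "John Smith", 10, 20000),
    (3, "Ann Jones", 10, 25000),
}

# Projecting onto salary as a set collapses the two equal salaries.
salaries_as_set = {salary for (_id, _name, _dept, salary) in employees}

# Keeping the transient duplicates (a list) preserves one value per employee.
salaries_with_dups = [salary for (_id, _name, _dept, salary) in employees]

wrong_total = sum(salaries_as_set)     # 45000: one 20000 was dropped
right_total = sum(salaries_with_dups)  # 65000: one value per employee
```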

The importance of supporting physical multisets during optimization was simply overstated. The only questions arising for duplicates during optimization are: Is a duplicate an error? Is duplicate removal required? Can it be deferred? Should it be deferred?

Received on Tue Feb 18 2003 - 05:09:41 CET
