Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Jan Hidders <hidders_at_hcoss.uia.ac.be>
Date: 25 Feb 2003 17:26:27 +0100
Message-ID: <3e5b9933.0_at_news.ruca.ua.ac.be>


In article <51d64140.0302250629.37501202_at_posting.google.com>,
Paul <pbrazier_at_cosmos-uk.co.uk> wrote:

>jan.hidders_at_REMOVE.THIS.ua.ac.be (Jan Hidders) wrote in message
>news:<3e4bca2b.0_at_news.ruca.ua.ac.be>...
>> No. In fact, in theory, all optimizations that can be done in a set-based
>> algebra can also be done in a bag-based algebra but not the other way
>> around.
>
>We can define a bag [a,a,a,b,c,c,d] as the set {(a,3),(b,1),(c,2),(d,1)}
>where a,b,c,d are distinct. I assume this is the standard definition?

Yes, it is.
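
As a minimal Python sketch of that definition (the helper names bag_to_pairs and pairs_to_bag are made up for illustration), a bag and its set of (element, multiplicity) pairs can be converted into each other without losing information:

```python
from collections import Counter

def bag_to_pairs(bag):
    """Encode a bag (any iterable with repeats) as a set of (element, count) pairs."""
    return frozenset(Counter(bag).items())

def pairs_to_bag(pairs):
    """Decode a set of (element, count) pairs back into a Counter-based bag."""
    return Counter(dict(pairs))

# The bag [a,a,a,b,c,c,d] from the quoted question.
pairs = bag_to_pairs(["a", "a", "a", "b", "c", "c", "d"])
```

Because the elements in the pairs are distinct, the encoding is a function from elements to counts, which is why a plain dictionary suffices for the decoding step.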

>So doesn't any bag algebra have an isomorphic algebra of sets of the form:
>{(a1,n1),(a2,n2),(a3,n3), ...} where a1,a2,a3,... are distinct members of
>our domain set and n1,n2,n3,... are natural numbers (not necessarily
>distinct)?

Yes, it has. But how are you going to express in your algebra that you are not going to eliminate duplicates immediately after a projection?
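
To make the question concrete, here is a rough Python sketch on the pair encoding (the function names and the sample relation are hypothetical). A projection that keeps duplicates has to add up the multiplicities of all tuples that collapse to the same projected tuple, whereas the ordinary set projection simply forgets them:

```python
from collections import Counter

def project_bag(pairs, attrs):
    """Bag projection on (tuple, count) pairs: counts of collapsing tuples are summed."""
    result = Counter()
    for row, n in pairs:
        result[tuple(row[i] for i in attrs)] += n
    return frozenset(result.items())

def project_set(pairs, attrs):
    """Set projection: duplicates are eliminated right away, so every count becomes 1."""
    return frozenset((tuple(row[i] for i in attrs), 1) for row, _ in pairs)

# Sample relation: ("x", 1) occurs twice, ("x", 2) once, ("y", 1) once.
r = frozenset({(("x", 1), 2), (("x", 2), 1), (("y", 1), 1)})
kept = project_bag(r, [0])    # duplicates kept, multiplicities summed
dedup = project_set(r, [0])   # duplicates eliminated immediately
```

So the set-based encoding can simulate the bag operator, but only by giving the projection an explicit aggregation over the count component, which is exactly the part a pure set algebra has to be extended with.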

>The other thing I still don't understand:
>The underlying interpretation of a relvar is a predicate, with the
>rows being propositions. How can it make sense for a relvar to contain
>any proposition more than once? Does the bag-based "relational"
>algebra have a different starting point and what is it?

Its starting point is roughly that we are just talking about a data structure here and that it is up to the user to decide what it means. One possible interpretation is that you are talking about entities that can be distinguished, but not by the attributes that are stored in the database.

>Should a bag-based SQL have some sort of extensions to cope with duplicate
>rows? For example if I had n identical rows in a relvar and I wanted to
>update m of them (m < n) it should have the syntax to say "only update a
>maximum of m identical rows" (it wouldn't matter which ones because they
>are identical). Or if I wanted to delete one of them there would be a
>corresponding DELETE extension. At the moment I can only delete all or
>nothing.

Yes, it should. Even if you look at SQL as a bag calculus, it is very inadequate.
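
What such an extension would have to do can be sketched in Python on a bag stored as a Counter. The names delete_up_to and update_up_to are invented for this illustration and do not correspond to any existing SQL syntax:

```python
from collections import Counter

def delete_up_to(table, row, m):
    """Remove at most m copies of row from a Counter-based table; return how many were removed."""
    removed = min(m, table[row])
    table[row] -= removed
    if table[row] == 0:
        del table[row]  # do not leave zero-count entries behind
    return removed

def update_up_to(table, old, new, m):
    """Turn at most m copies of old into new; return how many copies were changed."""
    moved = delete_up_to(table, old, m)
    if moved:
        table[new] += moved
    return moved

# A table holding three identical rows.
t = Counter({("a", 1): 3})
deleted = delete_up_to(t, ("a", 1), 1)            # delete just one of them
updated = update_up_to(t, ("a", 1), ("a", 9), 5)  # update at most five of the rest
```

Since the copies are identical, it never matters which of them are picked; only the count changes.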

>I'm having trouble working out which of the pro-bag posters are just
>playing devil's advocate or trolling and which are genuine (if any).

If you read my postings carefully you will notice that I have been consistently arguing that the issue is more complicated than being pro-bag or contra-bag. Questions such as "are bags in SQL a good idea?", "should we allow bags in the logical data model?" and "should we use a bag algebra for query optimization?" are related but not exactly the same. Moreover, if someone thinks that pointing out that some of Date's contra-bag arguments are not entirely correct or very convincing means that I am pro-bag, then that is just sloppy thinking.

>I assume from the cited papers that it must be a legitimate area of
>research (the algebra of a particular family of sets) but I can't really
>see its applicability to relational theory.

In relational theory, bag algebras have their use in query optimization. Besides, relational theory is not the only database theory around.

Kind regards,

-- Jan Hidders
Received on Tue Feb 25 2003 - 17:26:27 CET
