Re: bags vs. sets

From: Vadim Tropashko <vadimtro_invalid_at_yahoo.com>
Date: 15 May 2007 16:45:39 -0700
Message-ID: <1179272739.144062.266650_at_o5g2000hsb.googlegroups.com>


On May 15, 3:38 pm, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> Marshall wrote:
> > We regularly deride SQL for using bags rather than sets.
> > Now, I get many of the reasons we point and laugh at SQL,
> > but I'm not so clear on this one.
>
> > We could have a language that covered sets, bags, and lists.
> > We could have a language that used bags as the basic
> > collection type and built lists and relations out of that.
> > It's not hard: a relation is a bag with a uniqueness constraint.
> > A list is a bag with a uniqueness constraint on a column
> > that belongs to the natural numbers. (Details glossed over.)
>
> > OTOH, I'm less clear how to model a bag with relations.
>
> A bag depends on physical location for identifying elements. Because
> elements in relations have no particular physical location, the very
> idea seems a little absurd.

Compared to sets bags occur in math very rarely. Yet there are some prominent cases such as equation roots. Consider:

x*(x-1)*(x-1) = 0

The root x=1 is counted with its multiplicity 2.

> > Bags also show up in things like aggregation. The
> > sum of a column of a relation is the fold of + over
> > the bag-projection over that column.
> At the internal physical level, physical location is perfectly
> acceptable for identifying elements. It's just not such a good thing at
> the external logical level.

The question is if counting fits naturally into logical level. It may be argued that combinatorics, arithmetics, and many other areas escape logical level. Bags are not idempotent, perhaps they are better suitable for counting.

> Internally, dbmses have to deal with streams and blocks and arrays and
> all sorts of physical structures. I don't see what the problem is. As
> long as the dbms manages the physical aspects without burdening the
> user, what's the problem?

My guess is that Marshall asked this question in the context of general-purpose relational programming language. The main question is if we want to expand relational algebra to embrace structures other than relations, or fit everything into relations. Arguably, fitting everything into relations sometimes gives quite an awkward solutions.

Consider languages and grammars. They have 2 fundamental operators: join and union. While union is very similar in its properties to the relational union, the join is very different. And language intersection is no longer a join as in relational case. Sure we can model words (which are lists of symbols) as relations by introducing an auxiliary ordering number attribute, but this is IMO too awkward to result into anything practical. As an exersize, one may try expressing list join relationally. Besides the fact that one has to do very artificial manipulation with ordering numbers, the expression for list join doesn't look particularly exciting. Received on Wed May 16 2007 - 01:45:39 CEST

Original text of this message