Re: repeating groups

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 19 Feb 2006 01:54:08 -0800
Message-ID: <1140342848.370220.311130_at_g43g2000cwa.googlegroups.com>


dawn wrote:
> Marshall Spight wrote:
> > > >> Sure. But there are several ways out of the repeating groups problem
> > > >>
> > > >> 1) decomposing relations, aka "classical" 1NF
> > > >> 2) higher-than-1 cardinality attributes: lists or sets
>
> You either have to toss (nested) bags in there

I see no reason to add bags. In fact, I'm not even sure I believe a bag is an actual data structure. It seems more like a traversal strategy on a set. What makes you think you need bags?

> or have me referring to lists and ordered lists as if they
> were conceptually different even if implemented identically.

Why?

> It works OK for me to handle nested sets,
> bags, and lists all as lists.

I believe it works "OK"; which is to say, mostly except for some of the time. I have higher aspirations than that, though.

It works *better* to handle lists as lists and sets as sets.

> If the system provides nested lists with
> an arbitrary number of attributes, my experience is that the user can
> distinguish whether the ordering is relevant to them or not.

Sure. Most of the time. See above.

Also, the user is not the only entity in play here. There is also, for example, the optimizer. If you tell the optimizer something is a list, (even though, in your head, it's a set) the optimizer is going to faithfully and dutifully preserve the (meaningless) order even though it doesn't need to. Same thing for the query executor. You may have to give up on parallelization.

The better you can communicate your intent via the code, the better the system will work, both for the programmers and for the "thinking stone."

> > Complicating the algebra should give us pause, but I now believe
> > it is something that we have to do. Lists are essential; sets are
> > essential; I am not willing to admire a language that doesn't make
> > both of them first class. I am not convinced there is any other
> > structure that we need to give such importance to. And lists and
> > sets need to interoperate. Hence we have to have the unified
> > algebra.
> >
> > I've been thinking about this problem for a while. A half-assed
> > job is remarkably easy. What is remarkably hard is to actually
> > do a good job.
>
> I suspect that the existing implementations which serve users very well
> would be considered in the former category.

If you think "half assed" is serving users very well, then it's possible
you just don't know any better alternatives. Classic example, C++ programmer who has never tried Java but is convinced that garbarge collection has nothing to offer him. He feels that manual memory management serves him very well. Then he switches to Java for some reason and discovers he was just reacting from ignorance. (Note that the above story is mine, as well as thousands of others'.) And Java isn't even really all that good compared to what's possible.

> They were built before
> discussions of relational algebra. I'd be curious what aspects of the
> interface to the database would be considered awful (lots I'm sure) by
> those trying to implement a mathematical theory.

Corruption-prone many-to-many handling, and lack of declarative integrity constraints, just to start. :-) How about we throw in lack of data independence, just for fun?

> I'd be interested in
> learning of and discussing those features which developers think are
> great, but database theorists say are wrong.

If you find any, let us know.

> MUMPS has n levels of nesting where PICK has 2 built in: multivalues
> and subvalues (within the multivalue). This mathematically arbitrary
> limit relates quite well to language. Propositions often have lists
> and occasionally have lists within lists.

It's probably easier to implement as well. But it's also a mostly solution, and I'm more interested in totally solutions.

> > But there is the existence proof for the usefulness of one level
> > of additional structure, and that is the special handling given to
> > the type list-of-character. Still, I favor the recursive type
> > approach. (I actually prefer to say "inductive".)
>
> We are looking to improve the API for developers.

By which I understand you to mean we are looking to improve programming languages. "API" denotes library functions, which I'm pretty sure is not what you mean.

> Modeling with two
> levels of nesting (and possibly even one) gives benefits to those using
> the interface and is not overly restrictive.

Sure. But it is *somewhat* restrictive, as well as arbitrary. These should not be design goals.

> Think how complex it
> would be for those working with the data to work with 50 or 500 levels
> of nesting.

I don't see how that's relevant. If the data requires that many levels of nesting, then it doesn't matter how complex it is for the developers;
that's what the problem demands. On the other hand, if the problem does not demand it, then don't do it.

Anyway, even when given the tools to nest relations, you can avoid the nesting by using first normal form. (Consider that the return volley from the "theory not interested in being practical" gibe. :-)

> Forcing them to restructure the data after 2 (rather than
> before 1 as with SQL-DBMS's today) seems like a reasonable constraint
> on the design, even if not mathematically elegant. I fully understand
> the impulse to go from 0 to n rather than a seemingly arbitrary 2,
> however.

Marshall Received on Sun Feb 19 2006 - 10:54:08 CET

Original text of this message