Re: repeating groups

From: dawn <dawnwolthuis_at_gmail.com>
Date: 19 Feb 2006 12:49:26 -0800
Message-ID: <1140382166.489659.19990_at_z14g2000cwz.googlegroups.com>


Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:
> > > > >> Sure. But there are several ways out of the repeating groups problem
> > > > >>
> > > > >> 1) decomposing relations, aka "classical" 1NF
> > > > >> 2) higher-than-1 cardinality attributes: lists or sets
> >
> > You either have to toss (nested) bags in there
>
> I see no reason to add bags. In fact, I'm not even sure I
> believe a bag is an actual data structure. It seems more
> like a traversal strategy on a set. What makes you think
> you need bags?

With bags, your logical predicates and related propositions can look very much like the language they are modeling.

E.g. John flipped the coin and got heads, heads, tails, heads. Mary's majors have been math, philosophy, business, math.
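The coin and majors examples can be made concrete in Python. This is a minimal sketch using only the standard library, with `collections.Counter` standing in for a bag (Python has no built-in bag type). The variable names are illustrative. It shows what each structure keeps: a list keeps order and duplicates, a bag keeps only duplicates, and a set keeps neither.

```python
from collections import Counter

flips = ["heads", "heads", "tails", "heads"]      # John's flips, as recorded
reordered = ["tails", "heads", "heads", "heads"]  # same flips, different order

# As lists, order matters: these are different values.
print(flips == reordered)                    # False
# As bags, only multiplicity matters: these are the same value.
print(Counter(flips) == Counter(reordered))  # True
# As sets, even multiplicity is lost.
print(set(flips) == {"heads", "tails"})      # True

majors = ["math", "philosophy", "business", "math"]
print(Counter(majors)["math"])  # 2: the bag remembers math twice
print(len(set(majors)))         # 3: the set does not
```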

We will always enter data in some order and display it in some order, so you could call both of these lists, but if the user doesn't care about the order, each is just a bag to them. Often the user cares about the "top" value and wants to retain the others, but never uses the ordering of those others for any purpose, even though they are fine with keeping them in order.
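For comparison, here is a sketch of the two ways out of the repeating-groups problem quoted at the top, applied to Mary's majors: keeping them as one multivalued attribute, or decomposing them into a separate 1NF relation. The `pos` column and the `renest` helper are hypothetical names chosen for illustration. The 1NF version needs an explicit position to preserve both the order and the repeated "math", because a relation is a set of tuples.

```python
# Option 2: a multivalued attribute (one entry per student, value is a list).
nested = {"Mary": ["math", "philosophy", "business", "math"]}

# Option 1: decompose into a separate relation in 1NF.
# The hypothetical `pos` column keeps order and keeps the two "math"
# tuples distinct (a set of tuples would otherwise collapse them).
majors_1nf = {
    (student, pos, major)
    for student, majors in nested.items()
    for pos, major in enumerate(majors, start=1)
}

def renest(rows):
    """Rebuild the nested form by grouping on student, ordered by position."""
    out = {}
    for student, pos, major in sorted(rows):
        out.setdefault(student, []).append(major)
    return out

print(sorted(majors_1nf))
# [('Mary', 1, 'math'), ('Mary', 2, 'philosophy'), ('Mary', 3, 'business'), ('Mary', 4, 'math')]
print(renest(majors_1nf) == nested)  # True
```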

> > or have me referring to lists and ordered lists as if they
> > were conceptually different even if implemented identically.
>
> Why?

Because logical nested bags, sets, and lists are all implemented as lists, I call one an ordered list when I want to indicate that the ordering matters. The software need not care about this distinction, which perhaps means that you need not care about it. I don't work exclusively within the world of the software, but go from users and their requirements to a logical data model.
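The arrangement described above, where everything is physically a list and "ordered" is a logical annotation, might look like the following sketch. `MultiValue` and its `ordered` flag are hypothetical and not taken from any particular product. The flag changes only how two values compare, not how they are stored.

```python
from collections import Counter

class MultiValue:
    """Hypothetical multivalued attribute. Storage is always a list;
    the `ordered` flag only changes how two values are compared."""

    def __init__(self, values, ordered=False):
        self.values = list(values)  # physical representation: a list
        self.ordered = ordered      # logical intent: ordered list vs bag

    def __eq__(self, other):
        if not isinstance(other, MultiValue):
            return NotImplemented
        if self.ordered != other.ordered:
            return False            # an ordered list never equals a bag
        if self.ordered:
            return self.values == other.values
        return Counter(self.values) == Counter(other.values)

a = MultiValue(["math", "philosophy", "business", "math"])
b = MultiValue(["business", "math", "math", "philosophy"])
print(a == b)  # True: as bags they hold the same values

a_list = MultiValue(a.values, ordered=True)
b_list = MultiValue(b.values, ordered=True)
print(a_list == b_list)  # False: as ordered lists they differ
```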

> > It works OK for me to handle nested sets,
> > bags, and lists all as lists.
>
> I believe it works "OK"; which is to say, mostly except for
> some of the time. I have higher aspirations than that, though.

But you gain some conceptual simplicity for users this way, so it could be the optimum solution.

> It works *better* to handle lists as lists and sets as sets.

I'm not sure about that.

> > If the system provides nested lists with
> > an arbitrary number of attributes, my experience is that the user can
> > distinguish whether the ordering is relevant to them or not.
>
> Sure. Most of the time. See above.

Then I will add "and the added complexity and potential need to switch between such types might negate any benefits there would be in making a distinction between them in the software."

> Also, the user is not the only entity in play here.

But maybe the one that has been ignored for too long.

> There is also,
> for example, the optimizer. If you tell the optimizer something
> is a list, (even though, in your head, it's a set) the optimizer
> is going to faithfully and dutifully preserve the (meaningless)
> order even though it doesn't need to. Same thing for the
> query executor. You may have to give up on parallelization.

I have little knowledge of how the current products using a model I like implement these things, but given that they work well for the user, we don't have to theorize about whether it is possible to have a product that does it this way. I only hear optimizers discussed in the RM world, and I don't know why.
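The optimizer point quoted above can be illustrated with a small Python sketch, assuming the work is split into chunks and handed to a thread pool. If the partial results are bags (here `Counter` objects), they can be merged in whatever order the workers finish. If they are pieces of a list, they must be reassembled in their original order, an extra obligation the executor has to honor. The chunk size and worker count are arbitrary.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

flips = ["heads", "tails", "heads", "heads", "tails", "heads", "tails", "tails"]
chunks = [flips[i:i + 2] for i in range(0, len(flips), 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Bag semantics: merge partial counts in completion order, whatever it is.
    futures = [pool.submit(Counter, chunk) for chunk in chunks]
    bag_total = Counter()
    for fut in as_completed(futures):
        bag_total += fut.result()

    # List semantics: pieces must be stitched back in chunk order;
    # pool.map returns results in input order, not completion order.
    parts = pool.map(lambda chunk: [s.upper() for s in chunk], chunks)
    list_total = [item for part in parts for item in part]

print(bag_total == Counter(flips))              # True
print(list_total == [s.upper() for s in flips]) # True
```

This does not show that any given product must give up parallelism; it only shows the ordering constraint a list adds.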

> The better you can communicate your intent via the code,
> the better the system will work, both for the programmers
> and for the "thinking stone."
>
> > > Complicating the algebra should give us pause, but I now believe
> > > it is something that we have to do. Lists are essential; sets are
> > > essential; I am not willing to admire a language that doesn't make
> > > both of them first class. I am not convinced there is any other
> > > structure that we need to give such importance to. And lists and
> > > sets need to interoperate. Hence we have to have the unified
> > > algebra.
> > >
> > > I've been thinking about this problem for a while. A half-assed
> > > job is remarkably easy. What is remarkably hard is to actually
> > > do a good job.
> >
> > I suspect that the existing implementations which serve users very well
> > would be considered in the former category.
>
> If you think "half assed" is serving users very well, then it's possible
> you just don't know any better alternatives.

I don't think it is h-a, but I think you might consider it such.

> Classic example, C++
> programmer who has never tried Java but is convinced that garbage
> collection has nothing to offer him. He feels that manual memory
> management serves him very well. Then he switches to Java for
> some reason and discovers he was just reacting from ignorance.
> (Note that the above story is mine, as well as thousands of others'.)
> And Java isn't even really all that good compared to what's possible.
>
>
> > They were built before
> > discussions of relational algebra. I'd be curious what aspects of the
> > interface to the database would be considered awful (lots I'm sure) by
> > those trying to implement a mathematical theory.
>
> Corruption-prone many-to-many handling, and lack of declarative
> integrity constraints, just to start. :-) How about we throw in lack
> of data independence, just for fun?

OK, so we have your starter list and my starter list (1NF and 3VL), and I'll add proprietary validation specifications (which require duplication in front-end code) to mine.
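One common reading of "corruption-prone many-to-many handling" is that when a many-to-many link is stored as a multivalued attribute on both records, the two copies can drift apart. The sketch below shows that failure mode and the relational alternative of storing each link once. All record layouts and names (`students`, `courses`, `enrollment`, `inconsistencies`) are hypothetical, and this is not a description of how any specific MV product behaves.

```python
# Hypothetical MV-style storage: each link is recorded twice,
# once in each record's multivalued attribute.
students = {"Mary": {"courses": ["DB101", "LOG200"]}}
courses = {"DB101": {"students": ["Mary"]}, "LOG200": {"students": ["Mary"]}}

# An update that touches only one side leaves the two copies disagreeing.
students["Mary"]["courses"].remove("LOG200")

def inconsistencies(students, courses):
    """Return links recorded on one side but not the other."""
    from_students = {(s, c) for s, rec in students.items() for c in rec["courses"]}
    from_courses = {(s, c) for c, rec in courses.items() for s in rec["students"]}
    return from_students ^ from_courses  # symmetric difference

print(inconsistencies(students, courses))  # {('Mary', 'LOG200')}

# Relational alternative: each link stored exactly once, in its own relation,
# so there is no second copy to fall out of step.
enrollment = {("Mary", "DB101"), ("Mary", "LOG200")}
enrollment.discard(("Mary", "LOG200"))
courses_of_mary = sorted(c for s, c in enrollment if s == "Mary")
print(courses_of_mary)  # ['DB101']
```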

>
> > I'd be interested in
> > learning of and discussing those features which developers think are
> > great, but database theorists say are wrong.
>
> If you find any, let us know.

You started a list above.

> > MUMPS has n levels of nesting where PICK has 2 built in: multivalues
> > and subvalues (within the multivalue). This mathematically arbitrary
> > limit relates quite well to language. Propositions often have lists
> > and occasionally have lists within lists.
>
> It's probably easier to implement as well. But it's also a mostly
> solution, and I'm more interested in totally solutions.

I agree I want a better solution than either RM or MV, but I think we should start from the data model that already has non-1NF and 2VL, and move forward from there.
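PICK's two built-in levels are carried by delimiter characters inside a record: attribute marks, value marks, and subvalue marks, conventionally characters 254, 253, and 252. The following is a minimal Python sketch of splitting such a "dynamic array" into nested lists, assuming those delimiter values. The record contents are invented for illustration. Whatever the data, the result is never more than three lists deep, which is the fixed limit under discussion.

```python
AM, VM, SVM = chr(254), chr(253), chr(252)  # attribute, value, subvalue marks

def parse_record(raw):
    """Split a PICK-style dynamic array into attributes -> values -> subvalues.
    The nesting depth is fixed at three lists, whatever the data."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]

# Hypothetical record: name; majors (multivalued); course/grade pairs (subvalued)
raw = AM.join([
    "Mary",
    VM.join(["math", "philosophy", "business", "math"]),
    VM.join([SVM.join(["DB101", "A"]), SVM.join(["LOG200", "B"])]),
])

rec = parse_record(raw)
print(rec[0][0][0])            # Mary
print([v[0] for v in rec[1]])  # ['math', 'philosophy', 'business', 'math']
print(rec[2][1])               # ['LOG200', 'B']
```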

> > > But there is the existence proof for the usefulness of one level
> > > of additional structure, and that is the special handling given to
> > > the type list-of-character. Still, I favor the recursive type
> > > approach. (I actually prefer to say "inductive".)
> >
> > We are looking to improve the API for developers.
>
> By which I understand you to mean we are looking to improve
> programming languages. "API" denotes library functions, which
> I'm pretty sure is not what you mean.

It actually is what I mean. A language is fine for command-line work, but why not give the developer an application programming interface to work with the database? That is where the developer lives -- with languages and libraries.

This is OT, and I might be out to lunch on this (too ;-) since I don't follow .NET, but I read something suggesting that MS decided against using XQuery in favor of query features built into each .NET programming language, so developers can stay in their current language instead of building XQuery statements the way people now build SQL statements. They could then execute the same query against data in memory or in a DBMS. In any case, to answer your question: yes, I meant API.
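As a rough illustration of querying with the host language rather than assembling a query string, the sketch below filters in-memory records with an ordinary Python comprehension (Python 3.9+ for the `list[str]` annotation). The `Student` class and `students_with_major` function are hypothetical. The sketch covers only the in-memory half of the idea; it does not show how a DBMS would execute the same expression.

```python
from dataclasses import dataclass, field

@dataclass
class Student:
    name: str
    majors: list[str] = field(default_factory=list)  # multivalued attribute

def students_with_major(students, major):
    """An ordinary host-language query: no query string is built or parsed."""
    return [s.name for s in students if major in s.majors]

students = [
    Student("Mary", ["math", "philosophy", "business", "math"]),
    Student("John", ["history"]),
]
print(students_with_major(students, "math"))  # ['Mary']
```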

> > Modeling with two
> > levels of nesting (and possibly even one) gives benefits to those using
> > the interface and is not overly restrictive.
>
> Sure. But it is *somewhat* restrictive, as well as arbitrary. These
> should not be design goals.

I'm addressing the "Ease of use" requirement with conceptual simplicity.
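The trade-off between a fixed nesting limit and the recursive ("inductive") type Marshall prefers can be sketched in Python 3.9+: a value is either an atom or a list of values, which allows any depth, and a PICK-style two-level limit becomes a check layered on top. The `Value` alias, `depth`, and `fits_pick_limit` are hypothetical names, and atoms are simplified to strings.

```python
from typing import Union

# Inductive definition: a value is either an atom or a list of values.
Value = Union[str, list["Value"]]

def depth(v: Value) -> int:
    """Nesting depth: an atom is 0; a list is one more than its deepest element."""
    if isinstance(v, str):
        return 0
    return 1 + max((depth(x) for x in v), default=0)

def fits_pick_limit(attribute: Value) -> bool:
    """PICK-style limit: multivalues, each holding subvalues, and no deeper."""
    return depth(attribute) <= 2

majors = ["math", "philosophy", "business", "math"]  # depth 1
grades = [["DB101", "A"], ["LOG200", "B"]]           # depth 2
deeper = [[["DB101", "A"]]]                          # depth 3

print(depth(majors), depth(grades), depth(deeper))   # 1 2 3
print(fits_pick_limit(grades), fits_pick_limit(deeper))  # True False
```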

> > Think how complex it
> > would be for those working with the data to work with 50 or 500 levels
> > of nesting.
>
> I don't see how that's relevant. If the data requires that many levels
> of nesting, then it doesn't matter how complex it is for the
> developers;

smiling.

> that's what the problem demands.

As you can tell from the myriad of SQL implementations, there are many ways to squeeze the data into various formats. The data and related requirements do not come to us with their logical data model in tow. We design that. Two expert RM modelers will not come up with identical designs after talking to the same users.

My question is about the tools (API, language) developers have to work with. How clever vs. straightforward do they need to be? There is one type of simplicity with the RM (and please remember that I was a believer for quite some time), but the simplicity of aligning the propositions with language (in a sense) provides another type of conceptual simplicity that has advantages.

> On the other hand, if the problem
> does not demand it, then don't do it.
>
> Anyway, even when given the tools to nest relations, you can
> avoid the nesting by using first normal form. (Consider that
> the return volley from the "theory not interested in being practical"
> gibe. :-)

Yes. The developer will use the tools they have to figure out a solution. I want to give them tools that make them as productive as feasible for the long haul, addressing all other risks and quality requirements too, of course.

Cheers! --dawn

Received on Sun Feb 19 2006 - 21:49:26 CET
