Re: repeating groups

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 19 Feb 2006 16:20:55 -0800
Message-ID: <1140394855.387165.242480_at_z14g2000cwz.googlegroups.com>


dawn wrote:
> Marshall Spight wrote:
> >
> > I see no reason to add bags. In fact, I'm not even sure I
> > believe a bag is an actual data structure. It seems more
> > like a traversal strategy on a set. What makes you think
> > you need bags?
>
> Your logical predicates and related propositions can then look very
> much like the language they are modeling.

This is a non-goal for me. We are doing math here, not literature. I love literature, but it does not inform the field of data modelling.

> E.g. John flipped the coin and got heads, heads, tails, heads.
> Mary's majors have been math, philosophy, business, math.

Neither of these is a well-formed proposition as I understand the term.

Either:
John flipped some coins and got heads 3 times, tails 1 time.

   or
John flipped some coins and got [heads, heads, tails, heads].

Those are propositions.

Either you care about the order or you don't.

Can you think of some operation you want to do that couldn't (easily) be satisfied by one of the above? Up to and including "implement bag operations."
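To make the challenge concrete, here is a minimal Python sketch showing that both of John's propositions, and the usual bag operations, can be expressed with just a list and a set of (value, count) pairs. The names `as_count_set`, `bag_sum`, `bag_intersection` and the `second_run` data are hypothetical. The standard library's `collections.Counter` is used only as a counting helper.

```python
from collections import Counter

# "John flipped some coins and got [heads, heads, tails, heads]."  (order matters)
johns_flips = ["heads", "heads", "tails", "heads"]

def as_count_set(values):
    """'Got heads 3 times, tails 1 time': a set of (value, count) pairs."""
    return frozenset(Counter(values).items())

def bag_sum(a, b):
    """Additive bag union, computed on the (value, count) representation."""
    return frozenset((Counter(dict(a)) + Counter(dict(b))).items())

def bag_intersection(a, b):
    """Bag intersection: the minimum count for each shared value."""
    return frozenset((Counter(dict(a)) & Counter(dict(b))).items())

johns_counts = as_count_set(johns_flips)
second_run = as_count_set(["tails", "tails", "heads"])  # hypothetical extra data

print(sorted(johns_counts))                                # [('heads', 3), ('tails', 1)]
print(sorted(bag_sum(johns_counts, second_run)))           # [('heads', 4), ('tails', 3)]
print(sorted(bag_intersection(johns_counts, second_run)))  # [('heads', 1), ('tails', 1)]
```

No third collection type is needed: the ordered proposition uses a list, the counted one uses a set of pairs.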

> We will always be entering data in some order and
> showing it in some order,

Usually, but not always. And even so, this fact doesn't constrain our data models, unless perhaps you are only interested in a model designed to support data entry and display.

> so you can call these both lists, but if the user doesn't care
> about the order, it is just a bag to them.

Assuming we agree that sometimes we want order and sometimes not, we would also presumably agree we want data structures for both cases. Would you rather have the set {list, bag} or the set {list, set} available to you? I would prefer the latter.

> Often the user cares about
> the "top" value and wants to retain the others but doesn't end up using
> the ordering of those for any purpose even though they are fine with
> keeping them in order.

Sure. You could handle this easily with either a list or a set, although the list would over-specify the information the user cares about. Actually a bag seems the least useful for this example.
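For illustration, a Python sketch of the set version using Mary's majors from above. It assumes, hypothetically, that the first value entered is the "top" one; `Majors` and `from_entry_order` are invented names. Two entries that differ only in the order of the remaining values compare equal, whereas the corresponding lists do not. That difference is exactly the information the list over-specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Majors:
    top: str               # the one value the user actually uses
    all_values: frozenset  # everything retained, with no (meaningless) order

def from_entry_order(entered):
    """Assumption: whatever was entered first is the 'top' value."""
    return Majors(top=entered[0], all_values=frozenset(entered))

marys = from_entry_order(["math", "philosophy", "business", "math"])
same_info = from_entry_order(["math", "business", "philosophy"])

print(marys == same_info)  # True: same top value, same set of values
print(["math", "philosophy", "business", "math"] == ["math", "business", "philosophy"])  # False
```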

> > > or have me referring to lists and ordered lists as if they
> > > were conceptually different even if implemented identically.
> >
> > Why?
>
> Because logical nested bags, sets, and lists
> are all implemented as lists,

Implementation doesn't matter logically; that's pretty much the definition of implementation.

> if I want to indicate that the ordering matters, I call it an
> ordered list.

Sure. One needs lists; we agree on this. The question is whether one needs bags. I claim not.

(Again, the canonical term for this is "sequence" or "list." Since "unordered list" is a contradiction in terms, "ordered list" is redundant. You might just as well say "ordered, ordered list.")

> > > It works OK for me to handle nested sets,
> > > bags, and lists all as lists.
> >
> > I believe it works "OK"; which is to say, mostly except for
> > some of the time. I have higher aspirations than that, though.
>
> But you gain some conceptual simplicity for users this way, so it could
> be the optimum solution.

Well, I've got 20+ years of experience using systems that support only ordered data as a primitive (Fortran, C, C++, Java, etc.), and I am confident that this solution isn't enough. It is definitely *not* optimum. As I have said before, I want lists *and* I want sets. Lists give a lot of bang for the buck, but they don't have as much bang as sets do.

(BTW, who are these users we're talking about? Are they programmers? Since this is a theory newsgroup, one can assume we're discussing systems built for trained/educated people, and we don't have to truncate the system prematurely, cutting corners so we don't have to make them think too hard.)

> > It works *better* to handle lists as lists and sets as sets.
>
> I'm not sure about that.

I am.

Also, I am hard-pressed to imagine a convincing argument that says: instead of treating X as X and Y as Y, just treat everything as X, and that's best.

The most straightforward, the most conceptually simple thing to do, is to handle lists as lists and sets as sets. Consider: we can also use sets to model lists. We can, in fact, use sets to model everything. How happy are you with that approach?
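To make the rhetorical question concrete, here is a Python sketch of modelling a list with nothing but a set of (position, value) pairs. It works, but every list operation now has to manage positions by hand. The function names are hypothetical.

```python
def list_as_set(seq):
    """Encode a list as a set of (position, value) pairs."""
    return frozenset(enumerate(seq))

def set_as_list(pairs):
    """Decode by sorting on position, then dropping the positions."""
    return [value for _, value in sorted(pairs, key=lambda p: p[0])]

def append(pairs, value):
    """Even 'append' must compute the next position explicitly.
    Assumes positions are exactly 0..len-1 with no gaps."""
    return pairs | {(len(pairs), value)}

flips = list_as_set(["heads", "heads", "tails"])
flips = append(flips, "heads")
print(set_as_list(flips))  # ['heads', 'heads', 'tails', 'heads']
```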

> > > If the system provides nested lists with
> > > an arbitrary number of attributes, my experience is that the user can
> > > distinguish whether the ordering is relevant to them or not.
> >
> > Sure. Most of the time. See above.
>
> Then I will add "and the added complexity and potential need to switch
> between such types might negate any benefits there would be in making a
> distinction between them in the software."

Honestly, how hard is it to ask of a collection whether we care about the order or not? Are you saying that a professional who can handle doubly nested loops, complicated libraries, and data modelling can't handle the awesome responsibility of deciding whether a collection is a list or a set? It seems a less difficult choice than deciding between, say, an int and a long.

> > Also, the user is not the only entity in play here.
>
> But maybe the one that has been ignored for too long

I see no evidence that they have ever been ignored. And if they had been, what would it matter? What matters is getting the right answer to this question.

> > There is also,
> > for example, the optimizer. If you tell the optimizer something
> > is a list, (even though, in your head, it's a set) the optimizer
> > is going to faithfully and dutifully preserve the (meaningless)
> > order even though it doesn't need to. Same thing for the
> > query executor. You may have to give up on parallelization.
>
> I have little knowledge of how the current products using a model I
> like implement these things, but given that it works well for the
> user, we don't have to theorize about whether it is possible to have a
> product that does it this way. I only hear of optimizers discussed in
> the RM world and I don't know why.

One also hears of it in the compiler world.

Anyway, you keep saying that thing about how it works well for users. How does that matter? Maybe these users just don't know any better. Should I then give up on my interest in something that works better? "Good enough is the enemy of better" as they say. Perhaps you are the Good-Enough Adversary, trying to tempt me away from improving the lot of the programmer. (Not serious.)

I am fairly sure I would be unhappy to have to hand-optimize everything I do.
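Here is a toy Python illustration of the optimizer point quoted above. It is not a model of any real query executor. Partial results from simulated parallel workers can be merged in any arrival order under set semantics. Under list semantics, each order yields a different answer unless the executor also carries and restores positions. The round-robin `partitions` helper and the filter in `work` are hypothetical.

```python
from itertools import permutations

def partitions(rows, n):
    """Round-robin split into n chunks, standing in for n parallel workers."""
    return [rows[i::n] for i in range(n)]

def work(chunk):
    """Each worker applies the same (hypothetical) filter to its own chunk."""
    return [r for r in chunk if r != "r4"]

rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
partials = [work(c) for c in partitions(rows, 3)]

# Try every order in which the three workers could finish.
set_results = {frozenset().union(*order) for order in permutations(partials)}
list_results = {tuple(r for part in order for r in part) for order in permutations(partials)}

print(len(set_results))   # 1: merging sets needs no coordination
print(len(list_results))  # 6: each arrival order gives a different list
```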

> I don't think it is h-a, but I think you might consider it such.

Ah, I see.

> > Classic example, C++
> > programmer who has never tried Java but is convinced that garbage
> > collection has nothing to offer him. He feels that manual memory
> > management serves him very well. Then he switches to Java for
> > some reason and discovers he was just reacting from ignorance.
> > (Note that the above story is mine, as well as thousands of others'.)
> > And Java isn't even really all that good compared to what's possible.
> >
> >
> > > They were built before
> > > discussions of relational algebra. I'd be curious what aspects of the
> > > interface to the database would be considered awful (lots I'm sure) by
> > > those trying to implement a mathematical theory.
> >
> > Corruption-prone many-to-many handling, and lack of declarative
> > integrity constraints, just to start. :-) How about we throw in lack
> > of data independence, just for fun?
>
> OK, so we have your starter list and my starter list (1NF and 3VL) and
> I'll throw in proprietary validation specifications (requiring
> duplication in front-end code) into mine.

Well, actually it's the positive list rather than the negative list that interests me more. If I logically negate the entries in your list, I get: nested structure and 2VL. These are on my list as well.

> > > I'd be interested in
> > > learning of and discussing those features which developers think are
> > > great, but database theorists say are wrong.
> >
> > If you find any, let us know.
>
> You started a list above.

There is no uniformity of opinion on the value of 3VL; many writers criticize it brutally. Similarly, 1NF is not universally thought of as perfect. The things you are mentioning are features of SQL, not features advocated by database theorists everywhere. There is a lot of difference between the two.
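For readers who haven't run into the criticism, here is a small Python sketch of the SQL-style three-valued logic at issue. `None` stands in for NULL/UNKNOWN, which is a modelling assumption, not how any DBMS implements it. A WHERE clause keeps only rows whose predicate is TRUE, so even the tautology `a = 30 OR NOT a = 30` silently drops the NULL row. Under 2VL it would keep every row.

```python
def eq3(a, b):
    """SQL-style '=': any comparison involving NULL (None) is UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b

def not3(p):
    """NOT UNKNOWN is still UNKNOWN."""
    return None if p is None else not p

def or3(p, q):
    """Kleene OR: TRUE wins, then UNKNOWN, else FALSE."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def where(rows, pred):
    """Keep a row only when the predicate is TRUE; UNKNOWN rows are dropped."""
    return [r for r in rows if pred(r) is True]

ages = [30, None, 41]
print(where(ages, lambda a: or3(eq3(a, 30), not3(eq3(a, 30)))))  # [30, 41]
print(where(ages, lambda a: eq3(a, a)))                           # [30, 41]
```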

> I agree I want a better solution than either RM or MV, but think we
> should start with the data model that already starts off with non-1NF
> and 2VL and move forward from there.

I completely agree that non-1NF and 2VL are must-have features.

> > > We are looking to improve the API for developers.
> >
> > By which I understand you to mean we are looking to improve
> > programming languages. "API" denotes library functions, which
> > I'm pretty sure is not what you mean.
>
> It actually is what I mean. A language is fine for command line work,
> but why not give the developer an application programming interface to
> work with the database? That is where the developer lives -- with
> languages and libraries.

The language designer puts the most important stuff in the language and everything else in the libraries. I consider this stuff really important; to me it's a language issue. If you don't consider it important, you put it in an API.

> This is OT and I might be out to lunch on this (too ;-) since I don't
> follow .NET, but I read something that made me think that MS decided
> against using XQuery in favor of using the features of each .NET
> programming language so the developers use the current language and
> don't have to build XQuery statements like people now build SQL
> statements. They could then execute the same query against data in
> memory or in a dbms. This is getting OT and perhaps I'm really out to
> lunch on it, but to answer your question, yes, I meant API.

If you're referring to Linq, I'd describe it as a language extension. It's not just more APIs. Anyway, I'm convinced that no API to the DBMS, however good, can ever do as good a job as a language; I and a thousand other developers have tried and failed. Those few API solutions that get any traction are the ones that embed a language inside them.

> > > Modeling with two
> > > levels of nesting (and possibly even one) gives benefits to those using
> > > the interface and is not overly restrictive.
> >
> > Sure. But it is *somewhat* restrictive, as well as arbitrary. These
> > should not be design goals.
>
> I'm addressing the "Ease of use" requirement with conceptual
> simplicity.

"Two levels of nesting" appears to me to be less conceptually simple than just "nesting." You have that 2 in there that you don't otherwise! Composition is restricted. The programmer may have to arbitrarily contort his design to fit into the two levels, when three was the exact right number.
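Here is a Python sketch of the composition argument, using frozensets as a stand-in for nested relation values. That choice is an assumption for illustration, and `depth`, `atoms`, and `fits_two_level_limit` are hypothetical names. The recursive functions need no special case for any particular depth. A two-level limit is an extra rule bolted on top, and it rejects the three-level value even though nothing else about it is unusual.

```python
def depth(value):
    """Nesting depth: an atom is 0; a set is 1 + the depth of its deepest member."""
    if isinstance(value, frozenset):
        return 1 + max((depth(v) for v in value), default=0)
    return 0

def atoms(value):
    """Collect every atom, however deeply it is nested."""
    if isinstance(value, frozenset):
        return frozenset().union(*(atoms(v) for v in value))
    return frozenset([value])

def fits_two_level_limit(value):
    """The arbitrary extra rule: reject anything nested more than two deep."""
    return depth(value) <= 2

two = frozenset({frozenset({"math", "philosophy"}), frozenset({"business"})})
three = frozenset({two, frozenset({"art"})})

print(depth(two), depth(three))     # 2 3
print(fits_two_level_limit(three))  # False
print(sorted(atoms(three)))         # ['art', 'business', 'math', 'philosophy']
```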

> > > Think how complex it
> > > would be for those working with the data to work with 50 or 500 levels
> > > of nesting.
> >
> > I don't see how that's relevant. If the data requires that many levels
> > of nesting, then it doesn't matter how complex it is for the
> > developers;
>
> smiling.

Smiling after you've snipped the strong part of my argument is not nice. :-)

Sometimes we are trying to solve hard, complex problems. Our code and data structures will reflect that difficulty. Think how hard it is for the Java programmer who has to deal with 500 classes. *Sure* it's hard, but it's the *problem* that's hard; that's why someone had to write 500 classes. Blaming Java won't help.

> > that's what the problem demands.
>
> As you can tell from the myriad of SQL implementations, there are many
> ways to squeeze the data into various formats. The data and related
> requirements do not come to us with their logical data model in tow.
> We design that. Two expert RM modelers will not come up with
> identical designs after talking to the same users.

I agree. But this is a communication and prioritization issue; it tells us nothing about whether our model is any good or not.

> My question is about the tools (API, language) developers have to work
> with. How clever vs straight-forward do they need to be?

Clever is an anti-goal. Straightforward is a goal. Powerful is also a goal, and it is in tension with straightforwardness. This gets resolved by looking at bang for the buck, or power-to-weight ratio, or whatever you care to call it.

This metric achieves its maximal value by using sets. :-)

> There is one
> type of simplicity with the RM (and please remember that I was a
> believer for quite some time), but the simplicity of aligning the
> propositions with language (in a sense) provides another type of
> conceptual simplicity that has advantages.

You say you were a believer, but how much SQL did you write?

You keep describing deficiencies in the RM based on your experience, but how do you know these weren't caused by design flaws in SQL?

> Yes. The developer will use the tools they have to figure out a
> solution. I want to give them tools that make them as productive as
> feasible for the long haul, addressing all other risks and quality
> requirements too, of course. Cheers! --dawn

Agreed!

Marshall

Received on Mon Feb 20 2006 - 01:20:55 CET
