Re: 3vl 2vl and NULL

From: dawn <dawnwolthuis_at_gmail.com>
Date: 17 Feb 2006 04:07:55 -0800
Message-ID: <1140178075.904221.157070_at_g44g2000cwa.googlegroups.com>

Marshall Spight wrote:
> I'm in an editor kind of mood, having been doing book reviews lately.
> I'm appointing myself editor of the post I'm replying to.

Having read it, I see you would like me to treat my conversation here more like a published work. It takes much longer to write something short and succinct. I do try, but I will give that more effort.

> dawn wrote:
> > Marshall Spight wrote:
> > >
> > > Wait-- are you saying you *advocate* denormalization? Can you clarify
> > > this please?
> >
> > Yes. I advocate data not being in 1NF (as indicated in the Is Codd
> > Dead? blog entry I've mentioned before). When you move from PICK to a
> > SQL-DBMS, you have to split out the data into 1NF.
>
> Okay, bad phrasing on your part then. "Normalization" is the entirety
> of a bunch of things, 1NF being the weakest of them.

Codd was rather clear on the original meaning of the term (as laid out in http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html ), and it relates ONLY to the issue of "repeating groups." Normal forms such as 2NF, 3NF, and BCNF all require first that the data be normalized (aka 1NF). If you wish to put a different definition of "normalized" on the table, feel free, but this one is useful.
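
To make "repeating groups" concrete, here is a minimal Python sketch of the split into 1NF described above for a move from PICK to a SQL-DBMS. The record layout, the field names, and the helper `to_1nf` are hypothetical, chosen only for illustration; nothing here is taken from a real PICK or SQL system.

```python
# Hypothetical non-1NF records: each person carries a repeating group
# (a list) of e-mail addresses as a single attribute value.
nested = [
    {"person_id": 1, "name": "Pat", "emails": ["pat@example.com", "pat@example.org"]},
    {"person_id": 2, "name": "Lee", "emails": []},
]


def to_1nf(records):
    """Split the repeating group out into a second flat relation."""
    people = [(r["person_id"], r["name"]) for r in records]
    emails = [(r["person_id"], e) for r in records for e in r["emails"]]
    return people, emails


people, emails = to_1nf(nested)
# people holds one row per person; emails holds one row per (person, address).
```

The nested form keeps each person in one record; the 1NF form needs a second relation and a join to put a person back together.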

> So to communicate
> this concept, you shouldn't say you're against normalization.

I am using the term as it was meant when it was coined for use with data. I can point to the appropriate references each time I use the term so that there is no misunderstanding. We have the same problem with "1NF," since Date's current definition of 1NF includes relation-valued attributes.
<snip>

> Okay, non-1NF is one thing. "Ordered lists" is redundant; just say
> "lists".

Yes, I could say "lists" but you are not right that it is redundant. Lists are ordered within the computer. Ordered lists are lists where the ordering has meaning.

[Example that I am not taking the time to edit: If a data entry form permits entry of all countries in which a person has lived, permitting the user to select them in any order (which might or might not be retained) but only specifying any given country one time, then that list as entered by the user has an order, but the order has no meaning. If this list as entered by the user is a list of the countries, in order, in which they have lived, such as Canada, USA, Canada, The Netherlands, then this is a list and the order has meaning. That is what I am referring to as an ordered list. The computer does not know the difference. The user does.]
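
The countries example can be sketched in a few lines of Python. This is only an illustration under the assumption that the same entered values are read two ways; the variable names are hypothetical.

```python
# The values as entered by the user, in entry order.
entered = ["Canada", "USA", "Canada", "The Netherlands"]

# Read as a residence history: order and repeats both carry meaning.
history = list(entered)

# Read as "countries lived in": neither order nor repeats carry meaning.
lived_in = set(entered)

# Reordering changes the history but not the set of countries.
reordered = ["USA", "Canada", "Canada", "The Netherlands"]
history_unchanged = history == reordered
countries_unchanged = lived_in == set(reordered)
```

Here `history_unchanged` is false and `countries_unchanged` is true. The stored values are identical in both readings; only the user's intent tells them apart.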

> Lists is a second thing.

From a practical standpoint, I am OK with combining these so that there are no relation-valued attributes, only lists (ordered or not) as attribute values. It might be better to have the option of RVA or list, but both XML and MV retain an ordering in nested elements and that works for me. Neither of them provides features for making clear whether the user cares about the ordering. That means we are not capturing the full semantics in the model.
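
One way to picture the missing semantics is a list-valued attribute that records whether its ordering is meaningful. The sketch below is hypothetical: the class `ListValue` and its `order_matters` flag are invented for illustration and do not correspond to any feature of XML or MV.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ListValue:
    """A list-valued attribute plus a flag stating whether order has meaning."""

    items: tuple
    order_matters: bool  # hypothetical flag, assumed for this sketch

    def same_as(self, other: "ListValue") -> bool:
        if self.order_matters != other.order_matters:
            return False
        if self.order_matters:
            # Ordered list: position is part of the value.
            return self.items == other.items
        # Unordered list: compare contents only, ignoring position.
        return Counter(self.items) == Counter(other.items)


a = ListValue(("Canada", "USA"), order_matters=False)
b = ListValue(("USA", "Canada"), order_matters=False)
c = ListValue(("Canada", "USA"), order_matters=True)
d = ListValue(("USA", "Canada"), order_matters=True)
```

With the flag, `a.same_as(b)` holds while `c.same_as(d)` does not, even though the stored sequences are the same pair in each case.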

> 2VL is a third thing. This is a good
> list; it is specific. But above you said "I'm trying to convince
> 'the industry' to adopt more flexible (dare I say 'agile') data models"
> but then you followed it up with an entirely unrelated list of desired
> features. Nothing about nested relations, lists, and 2VL says "agile".

They are not unrelated. It is my starter list, my top 2 (or 3 by your count). Since I'm starting from looking at what works today as an agile approach and trying to work backwards into what features might be accounting for this, I'm sure I will add to my list in the future. My example of e-mail addresses gives a hint at what types of flexibility having lists adds to the mix.

> Probably you should drop the "agile" and focus instead on the specific
> additional capabilities you want.
> This would be more concrete.

Yes, just saying you want maintainability, scalability, agility, or any other ility is typically understood by people, but at a higher level of generalization than any specific functional requirement.

> It
> would give your reader a clearer idea of what you are thinking.

Maybe the industry should have done that instead of asking Bill G for "security." Yes, any time an ility can be turned into clear functional requirements, that is good, but we do not want to purge the high level requirements from our discussion or we could lose sight of the goal.

> We get an entirely different idea about a person who says "I want
> a lot of money" than we get from a person who says "I want to
> go out and work hard, and I hope thereby to get a lot of money."
> That first guy is a lazy dreamer; that second guy is going to
> really make something out of himself one day.
>
> The specific features you want come first; the expected/desired
> benefits come second,

I would think the high level requirement would come first. From a requirement of maintainability/flexibility/agility, I have started to select functional requirements (non-1NF and 2VL).

<snip>
> > The two I'm starting with are 1NF and 3VL.
>
> That's a good sentence! That communicates a specific idea, and
> it has the benefit of being short.

Thank you, master. I recall that I took the time to make that sentence short.

>
> > > Note that I am not asking for "proof" of anything. Proof of
> > > cost/benefit is not possible
> >
> > agreed.
> >
> > > and I suggest you abandon that search.
> >
> > It isn't proof, but don't you think some empirical data would be
> > helpful?
>
> Sure it would. I expect for only ten million dollars or so, you
> could set up a halfway decent comparison. If you are actively
> soliciting a business school to do this, it is worth discussing
> in this post. Otherwise not. Again, I suggest this line of
> argument be dropped as non-productive.

The fact that no such empirical data were collected at the start of moving a large portion of the industry over to the relational model might have been a mistake. This seems like a newsgroup where someone might have a eureka moment on how to test a theory in this way. Should all of our tests assume our theory (e.g. mathematical proofs within a particular model)?

That said, OK, Marshall, I hear you that much of this audience might have no interest in testing the usefulness of theories in this way. I am often talking about theories or about applying theories rather than doing work within theories. Maybe I should be on comp.databases.about.employing.theories. If this newsgroup is comp.database.must.sit.inside.relational.theory, then I'm in the wrong place.

Cheers! --dawn

Received on Fri Feb 17 2006 - 13:07:55 CET
