Re: MV Keys

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Wed, 1 Mar 2006 12:05:28 +0100
Message-ID: <MPG.1e6f9e5a3f96c28d98976c_at_news.ntnu.no>


In article <1141180942.125742.288830_at_j33g2000cwa.googlegroups.com>, marshall.spight_at_gmail.com says...
> Jon Heggland wrote:
> > marshall.spight_at_gmail.com says...
> > >
> > > Instead of doing this, we allow attributes of list type.
> > > List! Special cased for character lists only, though,
> > > perhaps so we can better convince ourselves that
> > > we didn't just throw in a non-1NF construct. :-)
> >
> > What definition of 1NF do you use? I'm asking in earnest; there are
> > quite a few different definitions floating around.
>
> I guess I use the definition that says 1NF means no
> attributes of compound types. No list or set typed
> attributes, that is.

So compound == list or set? What about ordered pairs? Or fixed-size arrays? Where do you draw the line, and why? Is it the variable size that is important? Is VARCHAR compound (disregarding "convention"), but CHAR not?

> (Only conventionally we have to
> pretend that varchar is not a compound type if we
> want to use this definition, because, practically
> speaking, you can't live without strings.)

Yes. But are strings/varchar then completely unproblematic? And if so, what makes other kinds of lists (or sets) theoretically problematic?

> 1NF is not something that I think too much, though.

Fair enough. Perhaps I should ask Dawn instead. Dawn, what definition of 1NF do you use? Why do *you* think list attributes breaks the RM?

> Can anyone name any update anomalies that are
> solved by 1NF? It seems like most normal forms
> have associated update anomalies that disappear

> under the normal form.

1NF is really the odd man out of the common normal forms (2NF to 6NF, including BCNF). The others have formal definitions, based on functional dependencies. 1NF is mostly informally defined (except for Date's definition, which however renders 1NF pointless and irrelevant---but then, so is 2NF, and arguably 3NF), and there is little agreement even on the informal definition. I have searched about a dozen textbooks on data modelling for 1NF definitions, and there are basically three camps:

  1. "1NF means only atomic attributes." This of course begs the question: What does "atomic" mean, and why are strings and dates considered atomic? I have seen only one book with this definition that actually makes a point of this crucial question---and it has no good answer.
  2. "1NF means no repeating groups of columns." I.e. no tables like this: Order(OrderID, CustomerID, Item1, Amount1, Item2, Amount2, Item3, Amount3, Item4, Amount4). The possible anomalies here are apparent, but the definition is still informal: It depends on the requirements of the database. If an order always has exactly four items, the design above is not that bad. And what about FootballMatch(MatchID, ..., Team1, Team2)? This definition also has the unfortunate property that a relvar can be in 5NF, but not in 1NF.
  3. "1NF is subsumed in the definition of a relation." Date's position, which means it is really pointless to talk about 1NF in any case. If a "relation" is not in 1NF, it isn't a relation, and normalisation doesn't apply.

It is claimed by some that 1NF is required for the RM to "work". This is trivially the case if you use definition #3; it is obviously *not* the case if you use definition #2. It is more uncertain if you use definition #1, because nobody can define "atomic". My position is that there are no theoretical objections against "non-atomic" attributes (whatever it means), just pragmatical.

Codd in his 1970 paper presents definition #1; I don't know where #2 (which is surprisingly common!) comes from. Codd uses relation-valued attributes (RVAs) as examples of non-atomic attributes, but doesn't mention other kinds of non-atomic domains, as far as I can tell. Moreover, he presents normalisation [to 1NF] as something done for convenience and simplicity, since then a relation can be represented in storage by a 2D array. He does not claim that RVAs are theoretically unsound, afaik, though he says in a later paper that 1NF is "mandatory". If anyone has greater insight in Codd's opinions, I'd be glad to hear about it.

In conclusion, I think definitions #1 and #2 are good database design advice---but informal and situation-dependent, and thus not worthy of the "normalisation" label. To use lists, sets or relations as attributes introduces bias and asymmetry in the database design, as shown by Date in his writings on RVAs: Essentially, you create a hierarchical database if you use them (which, BTW, may help explain why some people think list attributes are a good idea:). This might be called 1NF anomalies, Marshall.

-- 
Jon
Received on Wed Mar 01 2006 - 12:05:28 CET

Original text of this message