Re: Base Normal Form

From: dawn <dawnwolthuis_at_gmail.com>
Date: 12 Jul 2005 04:09:20 -0700
Message-ID: <1121166560.512133.178300_at_f14g2000cwb.googlegroups.com>


Jan Hidders wrote:
> dawn wrote:
> > Jan Hidders wrote:
> >
> >>dawn wrote:
> >>
> >>>Jan Hidders wrote:
> >>>
> >>>>It's really not a good idea to confuse the concept of
> >>>>relation and function.
> >>>
> >>>I definitely agree!
> >>
> >>And yet that seems to be the consequence of what you are proposing. I
> >>could be wrong but I interpreted your suggestion as that every relation
> >>with a candidate key is a function.
> >
> > Yes, that is what I am saying, but I'll try to say it better.
>
> Ok. That might well solve it because basically my point was that your
> terminology was too sloppy for my taste.

I'll accept that. When applying a mathematical model within a field that has taken mathematical terms and given them new definitions is problematic. A candidate key of a relation could be modeled as the domain of a function -- gotta love language, eh?

> > Every database relation can be modeled as a mathematical relation.
>
> Agreed. From a mathematical point of view it doesn't make much sense to
> distinguish the two.

I suspect that what you mean here is that when you are doing any mathematics, such as set operations, with relations, you are using the mathematical definition of the relation. Could we possibly rephrase your statement as "when working with the mathematical model of the relation, we work only with the mathematical definition of a relation"?  We can ditch the database relation definition (we need not care about meaningful names for attributes and ensuring no ordering of such, for example) when doing set operations. So, when do we need to use the database definitions for relation? Could we use terms like "table", "folder" or even "file" (gasp) and stop using muddying things by using definitions of "relation" that both add to and subtract from the mathematical relation definition?

> > Every database relation that has a candidate key can be modeled as a
> > mathematical function as well.
>
> Sure. But let me emphasize once more that *every* database relation has
> at least one candidate key. This is not up for discussion, it's a simple
> mathematical fact.

I'll buy that it is a fact by definition. That is yet another way that the term "relation" has been butchered. If every database relation has at least one candidate key, then if we choose to model that relation as a mathematical relation, then our mathematical relation has candidate keys.

> I suspect that the case that you are overlooking is
> the one where that candidate key happens to be the set of all columns,
> in which case the function you would associate with it is probably the
> function that maps each tuple to itself.

I wasn't overlooking that case (since I'm working from an existing model) but there are two different functions that are used with it as in these examples:

Person("12345") = ("John", "Smith", "M") and
Person("12345") = ("12345", "John", "Smith", "M")

So in the case of a candidate key consisting of all attributes, both functions

f(CK) = CK
g(CK) = null set

are both useful. Sloppy notation (where is the tuple or set notation?) is on purpose

>
> By the way, which function do you chose? If I have R(A,B,C,D) with
> candidate key AB then do we take function AB->ABCD or AB->CD? You seem
> to prefer the first. Do you have a specific reason for that?

I guess I just answered that. Both are useful. I prefer AB->ABCD because the entire tuple, which could have candidate keys other than AB, is together, thereby getting the benefit of the function model and the relation.

> >>Since every relation has at least
> >>one candidate key
> >
> > I don't have a handy list of all definitions of "relation" handy, but I
> > thought that some such definitions did not require candidate keys,
> > although I suspect most modern ones do.
>
> It doesn't have to be required, it usually follows from the fact that
> the relation is a set.

You lost me here. How do you define a candidate key in mathematics so that every set must have one?

> > Did that clarify? Does it make sense? Do you accept it?
>
> I have no problem with statements like "I can model every relation with
> a candidate key CK as a function that maps a value for the CK to a tuple
> of that relation".

And that is what I'm saying with the additional statement that I find it useful to do so.

> But for me that is very different from saying that
> the two concepts are the same.

I'm not sure if/where I might have said that. My intent was to answer the question of what else people might call this idea of "base normal form" when teaching these concepts once you have a table and at least one candidate key. I use the term "function" and show how you have a function mapping key values to tuple values. This makes the concepts very simple, perhaps too simple for some.

> Such a formulation also does more justice
> to the fact that there might in fact be several candidate keys which all
> define different functions.

Absolutely. It makes for a good way to introduce multiple candidate keys into the discussion -- multiple mappings/functions.

> > In spite of the fact there are different definitions for "database
> > relation" and "mathematical relation" I still think it helps to teach
> > the basics of data modeling by starting with language,
> > deriving/creating appropriate predicates and corresponding tables, then
> > modeling these as functions. Functions seem easier for people to grasp
> > than relations, so I start with sets & functions.
>
> I've heard similar arguments before from people who think that when you
> explain normalization you should first explain things under the
> assumption that there is only one candidate key.

I don't think such an assumption is important, but I can see starting with a single key, one composed of a single attribute, to introduce the topic of candidate keys, then lead up to more than one. Functions are an easy way to illustrate that, going from one simple key to multiple composite candidate keys.

> That one-key assumption
> also seems to be hovering in the background of your terminology.

Yes, in that each function only models one of the candidate keys. But, a candidate key is then incorporated into that function and corresponding notation. You can see that you have a key when you are modeling your db relation as a function. The relation can be modeled as multiple functions, but some are more useful than others in any given situation.

> In my
> experience there is always a backlash once you get to the 3NF and BCNF
> because there you have to deal with more than one candidate key. That
> means that the student then has to make two big conceptual jumps at
> once: understanding >1 cand. keys, and BCNF.

The non-SQL-RDBMS implementation that I work with most often does fix a single candidate key as the "primary key" but I think of this approach as being implementation-independent, so I'll reflect on that again.

> Note by the way that the link between functional dependencies and
> interpreting relations as functions is very close. If the relation
> R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I
> wouldn't say it represents a function, I would say it represents three
> functions.

YES, YES, YES -- SO WOULD I!!! It's all about functions and it's all about data (so can we drag that "relations" word to the trash when introducing these concepts and make it simple?)

Cheers! --dawn

> -- Jan Hidders
Received on Tue Jul 12 2005 - 13:09:20 CEST

Original text of this message