Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Neo <neo55592_at_hotmail.com>
Date: 26 May 2004 16:36:38 -0700
Message-ID: <4b45d3ad.0405261536.4deac675_at_posting.google.com>


> > > Here are Date's formal definitions for the five normal forms ...
> >
> > The problem ... is that they are RM specific.
>
> Exactly how is that a "problem"?

The problem is, when I used the term normalized, I meant it in the more general sense which can be applied to any data model [replace redundant data with refs (system supported links that are unrelated to data ie IDs in RM) to the one and only original within db). But I can see how the term caused confusion.

> You didn't set a challenge to make a report using a model that is
> normalized according to XDb1's own definition of normalisation and then
> kludged in an RDBMS.

XDb1 doesn't define the general form of normalization. Neither does RM. But the general form of normalization applies to all data models. I want RM's solution to be normalized in the general sense of the word (which apparently is a superset of the 5/6 "normal forms"). Even C. J. Date states "the purpose of such reduction is to avoid redundancy" in his chapter titled "Further Normalization I: 1NF, 2NF, 3NF, BCNF".

> Table hierarchies
> thing otherThing hierarchy
> ------ ---------- ---------
> 'fido' 'john' 'leader'
> 'fido' 'mary' 'leader'
> 'fido' 'luke' 'leader'
> My solution had no redundancy ... but my point should already be clear:
> there is no redundancy. You seem to be thinking that
> storing the character strings 'fido' and 'leader' several times is bad.

Redundancy is plainly obvious to me, so we must be working from different definitions of normalization. Redundancy makes a solution less generic and more prone to problems over a border scope. For instance, the above table's design prevents it from handling two things with the same name or a thing with no name.

> for argument's sake assume ...
> That would allow me to remove all 'fido's and 'leader's from the
> hierarchies table. Instead, it would now look like this:
>
> Table hierarchies
> thingID otherThingID hierarchyID
> --------- ------------ -----------
> (snip some rows)
> 17 3 1
> 17 4 1
> 17 5 1
>
> And the table things would read:
> thingID thingName
> ------- ---------
> 3 'john'
> 4 'mary'
> 5 'luke'
> 17 'fido'

> But if I follow your suggestion of "improving", I end up with the
> same data in the existing tables and some extra relations that
> I could do without.

The above design is an improvement because it increases genericness. Two things can now have the same name. Also, the above design is a stepping stone to the next design which would allow a thing to be name less and not incur NULLs.

The above design is an improvement because it reduces redundant data. The duplicate 17 in T_Hierarchies and T_Thing is not considered redundant (with respect to dbs) because it is unrelated to data being modelled. Separating data from the mechanism that links things with a db, allows maximum flexibility under a broader scope (ie allowing unamed things). Received on Thu May 27 2004 - 01:36:38 CEST

Original text of this message